
LoRA / QLoRA


Fine-tune a billion-parameter model without renting a cluster.


Key Insight

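This project repeats supervised fine-tuning (SFT) twice: first with LoRA adapters, then with QLoRA, which additionally stores the frozen base model in 4-bit precision. It then compares output quality and memory use against a full fine-tune. Instead of updating every weight, you keep the original model frozen and train a small set of extra low-rank matrices whose product is added to selected weight matrices as a learned update.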
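The idea of a frozen weight plus a trainable low-rank update can be sketched in plain Python, with no ML framework. This is a minimal illustration assuming the common LoRA formulation: the adapted layer computes W x + (alpha / r) * B A x, where A starts with small random values and B starts at zero, so training begins from exactly the frozen model's behavior. The names `LoRALinear` and `param_counts` are hypothetical, and no training loop is shown; in practice a framework computes gradients for A and B only.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    """Element-wise sum of two same-shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[s * x for x in row] for row in a]

class LoRALinear:
    """Hypothetical sketch: frozen weight W (d_out x d_in) plus trainable update B @ A."""

    def __init__(self, weight, rank, alpha, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during training
        # A (rank x d_in) gets small random values; B (d_out x rank) starts at zero,
        # so B @ A == 0 and the adapted layer initially matches the frozen layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scaling = alpha / rank

    def forward(self, x):
        """x is a column vector given as a d_in x 1 matrix."""
        frozen = matmul(self.weight, x)
        update = matmul(self.B, matmul(self.A, x))  # low-rank path: d_in -> rank -> d_out
        return add(frozen, scale(update, self.scaling))

    def trainable_params(self):
        """Only A and B are trained; W contributes nothing here."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

def param_counts(d_out, d_in, rank):
    """Trainable parameters for one weight matrix: full fine-tune vs. LoRA."""
    return d_out * d_in, rank * (d_in + d_out)

if __name__ == "__main__":
    layer = LoRALinear([[1.0, 2.0], [3.0, 4.0]], rank=1, alpha=2.0)
    print(layer.forward([[1.0], [1.0]]))  # B is zero, so this equals W @ x
    print(param_counts(4096, 4096, 8))
```

For a single 4096 x 4096 projection with rank 8, a full fine-tune trains 16,777,216 values while LoRA trains 8 x (4096 + 4096) = 65,536, about 0.4% as many. Starting B at zero is the design choice that makes the adapter a no-op until training moves it.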

Why This Matters

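LoRA and QLoRA are what let you fine-tune a multi-billion-parameter model on a single consumer GPU. Only the small adapters need gradients and optimizer state, and QLoRA also shrinks the frozen base weights to 4 bits, so the memory bill drops from many gigabytes per billion parameters to a fraction of that. They turn customizing large models from a datacenter job into something anyone can do on one card.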
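A rough back-of-the-envelope estimate shows where the savings come from. This sketch assumes a 7-billion-parameter model, a hypothetical 20 million adapter parameters, and the common rule of thumb of about 16 bytes per trained parameter for mixed-precision Adam. It counts weights, gradients, and optimizer state only; it ignores activations, the small overhead of 4-bit quantization constants, and framework overhead, so real numbers will be higher.

```python
GIB = 2**30

def full_finetune_gib(num_params, bytes_per_param=16):
    # Rule-of-thumb assumption for mixed-precision Adam: 16-bit weights (2 bytes)
    # + 16-bit grads (2) + 32-bit master weights (4) + two 32-bit Adam moments (8).
    return num_params * bytes_per_param / GIB

def adapter_finetune_gib(num_params, adapter_params, base_bits, adapter_bytes_per_param=16):
    # Frozen base weights are only stored: no gradients or optimizer state.
    base = num_params * base_bits / 8
    # Adapters are trained, so they pay the full per-parameter training cost.
    adapters = adapter_params * adapter_bytes_per_param
    return (base + adapters) / GIB

if __name__ == "__main__":
    n, adapters = 7e9, 20e6  # hypothetical model and adapter sizes
    print(f"full fine-tune: {full_finetune_gib(n):.1f} GiB")
    print(f"LoRA, 16-bit base: {adapter_finetune_gib(n, adapters, base_bits=16):.1f} GiB")
    print(f"QLoRA, 4-bit base: {adapter_finetune_gib(n, adapters, base_bits=4):.1f} GiB")
```

Under these assumptions the estimates come out to roughly 104 GiB, 13 GiB, and 3.6 GiB respectively: freezing the base model removes most of the training overhead, and 4-bit storage then cuts the remaining weight memory by about a factor of four.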