Fine-Tuning an Open Source LLM in 2026: LoRA vs QLoRA vs Full Fine-Tune
Should you LoRA, QLoRA, or full fine-tune your open LLM? Honest tradeoffs, GPU requirements, and a decision tree.
Fine-tuning an LLM in 2026 is much easier than it was in 2024. There are three viable approaches, each with a clear "use this when" rule.
QLoRA (use this 80% of the time)
Quantize the base model to 4-bit, then train low-rank adapters on top. Memory: ~24GB for a 7B-model fine-tune, ~80GB for a 70B.
Best for: domain adaptation, style transfer, format learning, "make it talk like our brand," small datasets (<100K examples).
Tools: Unsloth (fastest, my default), Axolotl, LLaMA-Factory, Hugging Face PEFT.
Quality: ~95% of full fine-tune quality at ~10% of the cost. The 5% gap rarely matters in practice.
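The memory arithmetic is worth seeing once. Quantizing the frozen base to 4-bit shrinks its weight footprint roughly 4x versus fp16; the rest of a ~24GB budget goes to the (tiny) adapters, their optimizer states, activations, and CUDA overhead. A rough sketch, with the 7B parameter count and bytes-per-param figures as illustrative assumptions:

```python
# Rough GPU-memory arithmetic behind the "~24GB for a 7B QLoRA run" figure.
# Assumed (for illustration): 7B parameters, 4-bit NF4 at ~0.5 bytes/param,
# fp16 at 2 bytes/param. Real usage adds activations and framework overhead.

def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

n = 7e9
fp16_gb = weight_footprint_gb(n, 2.0)  # unquantized base: 14.0 GB
nf4_gb = weight_footprint_gb(n, 0.5)   # 4-bit quantized base: 3.5 GB
print(f"fp16 base: {fp16_gb:.1f} GB, 4-bit base: {nf4_gb:.1f} GB")
```

That ~10GB saving on the frozen weights alone is what lets a 7B QLoRA run fit on a single 24GB card.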
LoRA (use when you have the GPU memory to spare)
Same as QLoRA, but the base model stays in fp16. Memory: ~80GB for 7B, far more for 70B. Slightly better quality than QLoRA, much more expensive.
Best for: when you'll be hot-swapping many adapters into the same base model in production (LoRA adapters are tiny, ~50MB each, so you can serve hundreds of fine-tunes off one base).
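The "~50MB each" figure follows from how LoRA works: each adapted weight matrix of shape (d_in, d_out) gets two low-rank factors totaling r*(d_in + d_out) parameters. A back-of-envelope calculation for a Llama-7B-shaped model; the rank, target modules, and dimensions here are illustrative assumptions, and different choices move the number up or down:

```python
# Back-of-envelope LoRA adapter size for a Llama-7B-shaped model.
# Assumed config (hypothetical): rank r=8, adapters on every linear layer
# (q/k/v/o attention projections plus gate/up/down MLP), 32 layers,
# hidden size 4096, MLP intermediate size 11008, fp16 storage (2 bytes).

def lora_params(r: int, shapes: list[tuple[int, int]]) -> int:
    """A rank-r adapter on a (d_in, d_out) weight adds r*(d_in + d_out) params."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

hidden, inter, layers, r = 4096, 11008, 32, 8
per_layer_shapes = (
    [(hidden, hidden)] * 4   # q, k, v, o projections
    + [(hidden, inter)] * 2  # gate, up
    + [(inter, hidden)]      # down
)
total = layers * lora_params(r, per_layer_shapes)
size_mb = total * 2 / 1e6    # fp16 = 2 bytes/param
print(f"{total:,} adapter params ≈ {size_mb:.0f} MB")
```

That lands around 40MB at rank 8, consistent with the ~50MB figure above; doubling the rank roughly doubles the file.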
Full fine-tune (use when you really need it)
Train all weights of the base model. Memory: 8x H100 minimum for a 70B. Cloud cost: $5-20K for a real run.
Best for: very large datasets (1M+ examples), making the model fundamentally better at something (not just adapting style/format), reasoning model training (o1-style RL).
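The "8x H100 minimum" comes from optimizer state, not just weights. With standard mixed-precision Adam, each parameter costs roughly 2 bytes (bf16 weights) + 2 bytes (bf16 grads) + 4 bytes (fp32 master copy) + 8 bytes (fp32 Adam moments) = 16 bytes, before activations. A rough sketch under those assumptions; hitting the 8-GPU floor in practice requires memory-saving tricks on top (sharding, activation checkpointing, 8-bit optimizers):

```python
# Why full fine-tuning a 70B model needs a multi-GPU cluster.
# Assumed (for illustration): standard mixed-precision Adam at ~16 bytes of
# weight + optimizer state per parameter, 80 GB of HBM per H100.

def full_ft_state_gb(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Weight + gradient + optimizer state, in GB, ignoring activations."""
    return n_params * bytes_per_param / 1e9

need = full_ft_state_gb(70e9)  # ~1120 GB of state for a 70B model
h100_hbm = 80.0
print(f"~{need:.0f} GB of state, i.e. ~{need / h100_hbm:.0f} H100s of raw HBM")
```

Raw state alone exceeds 8x H100 (640GB), which is why real runs shard the optimizer and lean on offloading to make the 8-GPU floor work.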
Honestly: 95% of teams shouldn't full fine-tune. Pick a stronger base model, use QLoRA, and save the money.
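The decision tree promised above collapses to a few lines. This hypothetical helper just encodes the rules already stated: QLoRA by default, LoRA when you'll serve many adapters off one fp16 base, full fine-tune only for 1M+ examples or a fundamentally new capability:

```python
# The article's decision rules as one helper function (names are illustrative).

def choose_approach(
    n_examples: int,
    many_adapters_in_prod: bool = False,
    need_new_capability: bool = False,
) -> str:
    """Return the fine-tuning approach suggested by the rules above."""
    if n_examples >= 1_000_000 or need_new_capability:
        return "full fine-tune"
    if many_adapters_in_prod:
        return "LoRA"
    return "QLoRA"

print(choose_approach(50_000))                              # QLoRA
print(choose_approach(50_000, many_adapters_in_prod=True))  # LoRA
print(choose_approach(2_000_000))                           # full fine-tune
```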