Fine-Tuning an Open Source LLM in 2026: LoRA vs QLoRA vs Full Fine-Tune
Should you LoRA, QLoRA, or full fine-tune your open LLM? Honest tradeoffs, GPU requirements, and a decision tree.
Fine-tuning an LLM in 2026 is much easier than it was in 2024. There are three viable approaches, each with a clear "use this when" rule.
QLoRA (use this 80% of the time)
Quantize the base model to 4-bit, then train low-rank adapters on top. Memory: ~24GB for a 7B-model fine-tune, ~80GB for a 70B.
Best for: domain adaptation, style transfer, format learning, "make it talk like our brand," small datasets (<100K examples).
Tools: Unsloth (fastest, my default), Axolotl, LLaMA-Factory, Hugging Face PEFT.
Quality: ~95% of full fine-tune quality at ~10% of the cost. The 5% gap rarely matters in practice.
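The memory arithmetic is worth seeing once. Quantizing the frozen base to 4-bit shrinks its weight footprint roughly 4x versus fp16; the rest of a ~24GB budget goes to the (tiny) adapters, their optimizer states, activations, and CUDA overhead. A rough sketch, with the 7B parameter count and bytes-per-param figures as illustrative assumptions:

```python
# Rough GPU-memory arithmetic behind the "~24GB for a 7B QLoRA run" figure.
# Assumed (for illustration): 7B parameters, 4-bit NF4 at ~0.5 bytes/param,
# fp16 at 2 bytes/param. Real usage adds activations and framework overhead.

def weight_footprint_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

n = 7e9
fp16_gb = weight_footprint_gb(n, 2.0)  # unquantized base: 14.0 GB
nf4_gb = weight_footprint_gb(n, 0.5)   # 4-bit quantized base: 3.5 GB
print(f"fp16 base: {fp16_gb:.1f} GB, 4-bit base: {nf4_gb:.1f} GB")
```

That ~10GB saving on the frozen weights alone is what lets a 7B QLoRA run fit on a single 24GB card.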
LoRA (use when you have the GPU memory to spare)
Same as QLoRA, but the base model stays in fp16. Memory: ~80GB for 7B, far more for 70B. Slightly better quality than QLoRA, much more expensive.
Best for: when you'll be hot-swapping many adapters into the same base model in production (LoRA adapters are tiny, ~50MB each, so you can serve hundreds of fine-tunes off one base).
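The "~50MB each" figure follows from how LoRA works: each adapted weight matrix of shape (d_in, d_out) gets two low-rank factors totaling r*(d_in + d_out) parameters. A back-of-envelope calculation for a Llama-7B-shaped model; the rank, target modules, and dimensions here are illustrative assumptions, and different choices move the number up or down:

```python
# Back-of-envelope LoRA adapter size for a Llama-7B-shaped model.
# Assumed config (hypothetical): rank r=8, adapters on every linear layer
# (q/k/v/o attention projections plus gate/up/down MLP), 32 layers,
# hidden size 4096, MLP intermediate size 11008, fp16 storage (2 bytes).

def lora_params(r: int, shapes: list[tuple[int, int]]) -> int:
    """A rank-r adapter on a (d_in, d_out) weight adds r*(d_in + d_out) params."""
    return sum(r * (d_in + d_out) for d_in, d_out in shapes)

hidden, inter, layers, r = 4096, 11008, 32, 8
per_layer_shapes = (
    [(hidden, hidden)] * 4   # q, k, v, o projections
    + [(hidden, inter)] * 2  # gate, up
    + [(inter, hidden)]      # down
)
total = layers * lora_params(r, per_layer_shapes)
size_mb = total * 2 / 1e6    # fp16 = 2 bytes/param
print(f"{total:,} adapter params ≈ {size_mb:.0f} MB")
```

That lands around 40MB at rank 8, consistent with the ~50MB figure above; doubling the rank roughly doubles the file.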
Full fine-tune (use when you really need it)
Train all weights of the base model. Memory: 8x H100 minimum for a 70B. Cloud cost: $5-20K for a real run.
Best for: very large datasets (1M+ examples), making the model fundamentally better at something (not just adapting style/format), reasoning model training (o1-style RL).
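The "8x H100 minimum" comes from optimizer state, not just weights. With standard mixed-precision Adam, each parameter costs roughly 2 bytes (bf16 weights) + 2 bytes (bf16 grads) + 4 bytes (fp32 master copy) + 8 bytes (fp32 Adam moments) = 16 bytes, before activations. A rough sketch under those assumptions; hitting the 8-GPU floor in practice requires memory-saving tricks on top (sharding, activation checkpointing, 8-bit optimizers):

```python
# Why full fine-tuning a 70B model needs a multi-GPU cluster.
# Assumed (for illustration): standard mixed-precision Adam at ~16 bytes of
# weight + optimizer state per parameter, 80 GB of HBM per H100.

def full_ft_state_gb(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Weight + gradient + optimizer state, in GB, ignoring activations."""
    return n_params * bytes_per_param / 1e9

need = full_ft_state_gb(70e9)  # ~1120 GB of state for a 70B model
h100_hbm = 80.0
print(f"~{need:.0f} GB of state, i.e. ~{need / h100_hbm:.0f} H100s of raw HBM")
```

Raw state alone exceeds 8x H100 (640GB), which is why real runs shard the optimizer and lean on offloading to make the 8-GPU floor work.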
Honestly: 95% of teams shouldn't full fine-tune. Pick a stronger base model, use QLoRA, and save the money.
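The decision tree promised above collapses to a few lines. This hypothetical helper just encodes the rules already stated: QLoRA by default, LoRA when you'll serve many adapters off one fp16 base, full fine-tune only for 1M+ examples or a fundamentally new capability:

```python
# The article's decision rules as one helper function (names are illustrative).

def choose_approach(
    n_examples: int,
    many_adapters_in_prod: bool = False,
    need_new_capability: bool = False,
) -> str:
    """Return the fine-tuning approach suggested by the rules above."""
    if n_examples >= 1_000_000 or need_new_capability:
        return "full fine-tune"
    if many_adapters_in_prod:
        return "LoRA"
    return "QLoRA"

print(choose_approach(50_000))                              # QLoRA
print(choose_approach(50_000, many_adapters_in_prod=True))  # LoRA
print(choose_approach(2_000_000))                           # full fine-tune
```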