
LoRA / QLoRA

Low-Rank Adaptation: a fine-tuning technique that trains small adapter matrices on top of a frozen base model instead of updating every one of its parameters. The weight update is factored as the product of two low-rank matrices, so only a tiny fraction of the original parameter count is trained. QLoRA = LoRA on a quantized base model. Together they cut fine-tuning memory by roughly 10–50× and produce adapter files small enough to swap in seconds, which has made them the default fine-tuning approach as of 2026.
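Concretely, LoRA freezes the pretrained weight W and learns a rank-r update ΔW = B·A, so the adapted layer computes W·x + (α/r)·B·A·x. Below is a minimal PyTorch sketch of that idea; the class name LoRALinear and the hyperparameter values are illustrative, not any particular library's API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable rank-r update (illustrative sketch)."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # freeze the base model weight
        # Low-rank factors: A is Gaussian-initialized, B starts at zero so the
        # adapter contributes nothing until training updates it.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))
        self.scale = alpha / r  # standard LoRA scaling factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank path; only A and B receive gradients.
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(768, 768)
out = layer(torch.randn(2, 768))  # gradients flow to A and B only
```

In practice this pattern is usually applied through a library such as Hugging Face's peft. QLoRA additionally stores the frozen base weights in 4-bit (e.g. NF4 via bitsandbytes), dequantizing on the fly while keeping the adapters in higher precision; that quantization step is omitted from the sketch above.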
