Quantization

Compressing a model by storing its weights in lower-precision numbers (e.g. 4-bit integers instead of 16-bit floats). Since fp16 uses 2 bytes per parameter, a 70B model needs 140GB; at 4 bits (0.5 bytes per parameter), the same model needs ~35GB. Modern quantization methods (Q4_K_M, AWQ, GPTQ) typically lose less than 2% quality on most benchmarks. Quantization is what lets large LLMs run on consumer hardware.
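
A minimal sketch of the idea in Python (symmetric round-to-nearest with per-group scales; the function names, group size, and use of NumPy are illustrative, not any particular library's API, and real Q4 formats pack two 4-bit values per byte rather than storing int8):

```python
import numpy as np

def quantize_q4_symmetric(weights: np.ndarray, group_size: int = 32):
    """Quantize a 1-D float array to 4-bit integers, one scale per group.

    Each group of `group_size` weights shares a single fp16 scale;
    each weight is stored as a 4-bit integer in [-8, 7].
    """
    w = weights.reshape(-1, group_size)
    # One scale per group: the largest magnitude maps to the int4 extreme.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero groups
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # Reconstruction is just q * scale; the rounding error is the quality loss.
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q4_symmetric(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())

# The memory arithmetic for a 70B-parameter model:
params = 70e9
print(f"fp16: {params * 2 / 1e9:.0f} GB")    # 2 bytes/weight -> 140 GB
print(f"int4: {params * 0.5 / 1e9:.0f} GB")  # 0.5 bytes/weight -> ~35 GB, plus scales
```

Per-group scales are what keep the error small: a single outlier weight only inflates the scale of its own group of 32, not the whole tensor. Methods like Q4_K_M, AWQ, and GPTQ refine this basic scheme further.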

Related models