Meta · Released 2025-09

Llama 4 8B

The 8B variant of Llama 4, the most-downloaded local LLM. Runs on a 16GB GPU at fp16, or in under 5GB at Q4 quantization.

Llama · Commercial with caveats · small · general
Params (max): 8B
Variants: 8B
Context window: 128K tokens
MMLU: 73
HumanEval: 62.2
GSM8K: 85.3
Min VRAM (fp16, smallest variant): 16GB
Smallest Q4 GGUF: ~4.7GB
Languages supported: 12
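The VRAM figures above follow directly from the parameter count. A minimal sketch of the arithmetic (weights only; real runtimes add overhead for activations and the KV cache, and the ~4.7 effective bits per parameter for Q4 GGUF is an estimate that folds in quantization scales):

```python
def weight_size_gb(params_billion: float, bits_per_param: float) -> float:
    """Rough size of the model weights alone, in decimal GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# fp16 stores 16 bits per parameter -> 8B params need ~16 GB
print(weight_size_gb(8, 16))               # 16.0
# Q4 GGUF averages roughly 4.7 bits/param once scales are counted,
# which matches the ~4.7GB file size listed above
print(round(weight_size_gb(8, 4.7), 2))    # 4.7
```

This is why the card lists a 16GB GPU for fp16 but only ~5GB for the Q4 build.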
Pros
  • Universal deployment target
  • Long context for size
  • Huge fine-tune ecosystem
Cons
  • Llama license restrictions
  • Worse than Phi-4 at math/reasoning

Highlights

  • Most-downloaded local LLM
  • 128K context
  • Runs on a single laptop GPU
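The 128K window is not free: KV-cache memory grows linearly with sequence length. A rough sketch, assuming Llama-3-8B-style dimensions (32 layers, 8 KV heads, head dim 128, fp16 cache — the actual Llama 4 8B shapes may differ):

```python
def kv_cache_gb(seq_len: int, n_layers: int = 32, n_kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """Estimated KV-cache size in decimal GB; 2x covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

# A full 128K-token context costs on the order of the weights themselves
print(round(kv_cache_gb(131072), 1))  # 17.2
```

In practice runtimes quantize or page the cache, but this is why filling the whole 128K window takes far more memory than the Q4 weights alone.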

Where to download

Hugging Face: meta-llama/Meta-Llama-4-8B-Instruct
Or via Ollama (ollama pull llama-4-8b) or LM Studio's in-app browser.
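For reference, a typical fetch-and-chat session via Ollama might look like this (the tag is the one listed above; availability and the exact quantization served depend on the registry):

```shell
# Pull the default (Q4) build, then start an interactive chat session
ollama pull llama-4-8b
ollama run llama-4-8b
```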
