Meta · Released 2025-09
Llama 4 8B
The 8B variant of Llama 4, and the most popular local LLM by download count. Runs on a 16GB GPU at fp16, or in about 5GB at Q4.
Tags: Llama · Commercial with caveats · small · general
| Spec | Value |
| --- | --- |
| Params (max) | 8B |
| Variants | 8B |
| Context window | 128K tokens |
| MMLU | 73 |
| HumanEval | 62.2 |
| GSM8K | 85.3 |
| Min VRAM (fp16, smallest variant) | 16GB |
| Smallest Q4 GGUF | ~4.7GB |
| Languages supported | 12 |
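The memory rows follow directly from the parameter count: bytes per weight times number of weights, plus runtime overhead for activations and KV cache. Here is a rough back-of-envelope sketch in Python; the helper function is hypothetical, and real usage grows with context length.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone: params * bits / 8 bits-per-byte.
    Ignores activations and KV cache, which grow with context length."""
    return n_params_billion * bits_per_weight / 8

# fp16: 8B params * 16 bits ≈ 16 GB, matching the "Min VRAM" row above
print(f"fp16: ~{estimate_weight_memory_gb(8, 16):.0f} GB")

# Q4 GGUF builds land near 4.5-5 effective bits per weight once
# quantization scales and metadata are counted, which is how an 8B
# model fits in the ~4.7GB file listed above
print(f"Q4:   ~{estimate_weight_memory_gb(8, 4.7):.1f} GB")
```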
Pros
- ✓ Universal deployment target
- ✓ Long context for size
- ✓ Huge fine-tune ecosystem
Cons
- × Llama license restrictions
- × Worse than Phi-4 at math/reasoning
Highlights
- ● Most-downloaded local LLM
- ● 128K context
- ● Runs on a single laptop GPU
Where to download
Hugging Face: meta-llama/Meta-Llama-4-8B-Instruct
Also available via Ollama (`ollama pull llama-4-8b`) or LM Studio's in-app browser.
Homepage: https://llama.meta.com
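For the Hugging Face route, here is a minimal sketch of loading the Instruct checkpoint with the transformers library, assuming a standard causal-LM repo layout and an accepted Llama license on your account. Only the repo id comes from this page; the rest is generic transformers usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-4-8B-Instruct"  # repo id from the listing above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights: ~16GB VRAM for 8B params
    device_map="auto",          # requires `accelerate`; auto-places layers
)

messages = [{"role": "user", "content": "Explain what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For a quantized local run, the Ollama command above does the equivalent in one step, typically pulling a Q4-quantized build by default.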
Related reading
Running an LLM on Your Laptop in 2026: M-Series, Quantization, and What Actually Works
Step-by-step: pick a quantization, install Ollama or LM Studio, run a 7B-14B model on a MacBook or 16GB GPU, and not lose your sanity.
Small LLMs on Edge Devices: What Runs on Phones, Pis, and Browsers in 2026
Gemma 2B runs on a Pi 5. Phi-4 runs in a browser via WebGPU. Phones run Llama 3B. A practical guide to LLMs on tiny hardware.