Meta · Released 2025-09
Llama 4 8B
The 8B variant of Llama 4, and the most popular local LLM by download count. Runs on a 16GB GPU at fp16, or in about 5GB at Q4.
Tags: Llama · Commercial with caveats · small · general
| Spec | Value |
| --- | --- |
| Params (max) | 8B |
| Variants | 8B |
| Context window | 128K tokens |
| MMLU | 73 |
| HumanEval | 62.2 |
| GSM8K | 85.3 |
| Min VRAM (fp16, smallest variant) | 16GB |
| Smallest Q4 GGUF | ~4.7GB |
| Languages supported | 12 |
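The memory rows follow directly from the parameter count: bytes per weight times number of weights, plus runtime overhead for activations and KV cache. Here is a rough back-of-envelope sketch in Python; the helper function is hypothetical, and real usage grows with context length.

```python
def estimate_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone: params * bits / 8 bits-per-byte.
    Ignores activations and KV cache, which grow with context length."""
    return n_params_billion * bits_per_weight / 8

# fp16: 8B params * 16 bits ≈ 16 GB, matching the "Min VRAM" row above
print(f"fp16: ~{estimate_weight_memory_gb(8, 16):.0f} GB")

# Q4 GGUF builds land near 4.5-5 effective bits per weight once
# quantization scales and metadata are counted, which is how an 8B
# model fits in the ~4.7GB file listed above
print(f"Q4:   ~{estimate_weight_memory_gb(8, 4.7):.1f} GB")
```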
Pros
- ✓ Universal deployment target
- ✓ Long context for size
- ✓ Huge fine-tune ecosystem
Cons
- × Llama license restrictions
- × Worse than Phi-4 at math/reasoning
Highlights
- ● Most-downloaded local LLM
- ● 128K context
- ● Runs on a single laptop GPU
Where to download
Hugging Face: meta-llama/Meta-Llama-4-8B-Instruct
Also available via Ollama (`ollama pull llama-4-8b`) or LM Studio's in-app browser.
Homepage: https://llama.meta.com
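For the Hugging Face route, here is a minimal sketch of loading the Instruct checkpoint with the transformers library, assuming a standard causal-LM repo layout and an accepted Llama license on your account. Only the repo id comes from this page; the rest is generic transformers usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-4-8B-Instruct"  # repo id from the listing above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 weights: ~16GB VRAM for 8B params
    device_map="auto",          # requires `accelerate`; auto-places layers
)

messages = [{"role": "user", "content": "Explain what a context window is."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For a quantized local run, the Ollama command above does the equivalent in one step, typically pulling a Q4-quantized build by default.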
Related reading
Running an LLM on Your Laptop in 2026: M-Series, Quantization, and What Actually Works
Step-by-step: pick a quantization, install Ollama or LM Studio, run a 7B-14B model on a MacBook or 16GB GPU, and not lose your sanity.
Small LLMs on Edge Devices: What Runs on Phones, Pis, and Browsers in 2026
Gemma 2B runs on a Pi 5. Phi-4 runs in a browser via WebGPU. Phones run Llama 3B. A practical guide to LLMs on tiny hardware.