Glossary

Plain-English definitions for open-LLM jargon.

Open Weights

A model whose trained parameter values (weights) are publicly downloadable. Open weights does NOT mean open training data or open code — it just means you can run, fine-tune, and distribute the model yourself. Llama 4, DeepSeek V4, and Qwen3.6 are all open-weights but not fully open-source.

Mixture of Experts (MoE)

An architecture where the model has many specialist sub-networks (experts) but only routes each token to a few of them. A 685B-parameter MoE might only activate 22B params per forward pass. Result: training/inference cost like a small dense model, knowledge capacity like a giant one. DeepSeek V4 and Mixtral 8x22B are MoE models.
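The routing idea fits in a few lines. A toy NumPy sketch with made-up shapes and a plain linear router (real MoE layers learn the router jointly with the experts inside transformer blocks):

```python
import numpy as np

def moe_forward(x, experts, router_w, k=2):
    """Route one token vector x to the top-k experts by router score.

    Only k of len(experts) expert networks run, so compute scales with
    k, not the total expert count. Toy sketch: experts are plain
    functions, the router is a single linear layer.
    """
    scores = router_w @ x                 # one score per expert
    top = np.argsort(scores)[-k:]         # indices of the k best experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                  # softmax over the chosen k only
    return sum(g * experts[i](x) for g, i in zip(gates, top))

# 4 toy "experts" (simple linear maps), but only 2 run per token.
rng = np.random.default_rng(0)
experts = [(lambda W: (lambda x: W @ x))(rng.normal(size=(8, 8)))
           for _ in range(4)]
router_w = rng.normal(size=(4, 8))
token = rng.normal(size=8)
out = moe_forward(token, experts, router_w, k=2)
```

The key design point: the gate weights are re-normalized over just the selected experts, so the skipped experts cost nothing at all.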

Quantization

Compressing a model by storing weights in lower-precision numbers (e.g. 4-bit instead of 16-bit). A 70B model in fp16 needs 140GB; the same model in Q4 needs ~35GB. Modern quantization methods (Q4_K_M, AWQ, GPTQ) lose <2% quality on most benchmarks. Quantization is what lets large LLMs run on consumer hardware.
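The memory math is simple enough to check yourself: parameters times bits per weight, ignoring the small per-block overhead (scales, zero-points) that real formats add.

```python
def model_size_gb(n_params_billion, bits_per_weight):
    """Approximate weight-storage size: params * bits / 8 bytes each.

    Ignores quantization block overhead, which adds a few percent
    in real formats like Q4_K_M.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9   # decimal GB

fp16 = model_size_gb(70, 16)   # -> 140.0
q4   = model_size_gb(70, 4)    # -> 35.0
```

Same rule of thumb in reverse: divide the bit width by 4 and you get GB-per-billion-parameters (fp16 ≈ 2 GB/B-param, Q4 ≈ 0.5 GB/B-param).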

Context Window

The maximum number of tokens (input + output) a model can handle in a single inference. A 128K context model can process roughly 96K words at once. Longer context = ability to read whole books, codebases, or long conversations, but attention compute grows quadratically with context length (and KV-cache memory grows linearly), so long contexts get expensive fast. InternLM 3 has the longest open-model context at 200K.
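The linear growth of KV-cache memory is easy to estimate. A sketch with a hypothetical 70B-class layer shape (the layer count, head count, and head dim here are assumptions, not any real model's config):

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, context_len, bytes_per_val=2):
    """KV-cache memory grows linearly with context length:
    2 (K and V) * layers * kv_heads * head_dim * tokens * bytes.
    Hypothetical config roughly shaped like a 70B-class model."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_val / 1e9

short = kv_cache_gb(80, 8, 128, 8_000)     # ~2.6 GB at 8K context
long_ = kv_cache_gb(80, 8, 128, 128_000)   # 16x the tokens -> 16x the cache
```

16x the context means 16x the cache, on top of the weights themselves, which is why long-context inference eats VRAM even when the model fits.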

HumanEval

A benchmark of 164 hand-written Python programming problems, each with unit tests. Score = % of problems where the model's generated code passes the tests. The de facto benchmark for code-generation quality. DeepSeek V4 (92.1) and DeepSeek Coder V3 (89.4) currently top open models.
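The metric itself is just a pass rate. A toy sketch of the scoring (problem ids are made up; the real harness executes generated code against the unit tests in a sandbox):

```python
def humaneval_score(results):
    """Score = percentage of problems whose generated code passed all
    unit tests. `results` maps problem id -> bool (passed).
    Toy sketch of the metric, not the evaluation harness."""
    return 100 * sum(results.values()) / len(results)

# 3 of 4 toy problems pass -> 75.0
score = humaneval_score({"HE/0": True, "HE/1": True, "HE/2": False, "HE/3": True})
```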

MMLU

Massive Multitask Language Understanding — 57 subjects ranging from US history to college math. The standard general-knowledge benchmark for LLMs. Top open models score 84-89; Claude Opus 4.7 scores ~92. A high MMLU score is necessary but not sufficient for a great chat model.

LoRA / QLoRA

Low-Rank Adaptation: a fine-tuning technique that trains small adapter matrices on top of a frozen base model, instead of updating all billions of parameters. QLoRA = LoRA on a quantized base model. Together they reduce fine-tuning memory 10-50x and produce adapter files small enough to swap in seconds. The default fine-tuning approach in 2026.
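The core trick: keep the big weight matrix W frozen and train two skinny factors B and A. A minimal NumPy sketch (toy sizes; the alpha and rank values are illustrative, not recommendations):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512          # hypothetical layer width
r = 8            # LoRA rank: adapters hold d*r + r*d params, not d*d

W = rng.normal(size=(d, d))          # frozen base weight (never updated)
A = rng.normal(size=(r, d)) * 0.01   # trainable rank-r factor
B = np.zeros((d, r))                 # init B = 0 so training starts at W

def lora_forward(x, alpha=16):
    # Effective weight is W + (alpha/r) * B @ A, never materialized:
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d)
full = d * d               # params if we fine-tuned W directly
adapter = d * r + r * d    # params LoRA actually trains (32x fewer here)
```

The B = 0 init is why a fresh adapter is a no-op: until training moves B, the model behaves exactly like the base. Swapping adapters means swapping only the small A and B matrices.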

Function Calling / Tool Use

The model's ability to recognize when a user query needs an external tool (calculator, web search, database query, code execution) and emit a structured JSON request to call that tool. Required for building agents. Mistral Large 3 and Command R+ 2 have native tool use; Llama 4 70B and DeepSeek V4 have it via fine-tunes.
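The mechanics are plainer than they sound: the model emits JSON, host code parses it, runs the tool, and feeds the result back into the conversation. A toy dispatcher (the tool name and JSON shape here are assumptions; every framework defines its own schema):

```python
import json

# Hypothetical tool registry. Real frameworks differ, but the loop is
# the same: model emits a JSON call, host executes it, result goes back
# into the chat as a tool message.
TOOLS = {"calculator": lambda expression: eval(expression, {"__builtins__": {}})}

# What a tool-use-capable model might emit instead of a text answer:
model_output = '{"tool": "calculator", "arguments": {"expression": "19 * 23"}}'

call = json.loads(model_output)
result = TOOLS[call["tool"]](**call["arguments"])
```

In a real agent loop, `result` is serialized back into the prompt so the model can compose its final answer from it.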

Instruct vs Base Model

Base model: trained only on next-token prediction. Behaves like an autocomplete. Instruct (or Chat) model: same base, fine-tuned on instruction-following examples and human feedback to behave like a helpful assistant. Always pick the -Instruct or -Chat variant for chat use cases. Pick the base for fine-tuning your own assistant.

RAG (Retrieval Augmented Generation)

A pattern where documents relevant to the query are retrieved from a database (vector or keyword search) and appended to the prompt before the LLM answers. Lets you add fresh data, internal docs, or grounding without fine-tuning. Command R+ 2 is specifically tuned for RAG; long-context models like InternLM 3 are useful for stuffing many retrieved docs into one prompt.
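The whole pattern in miniature: retrieve, stuff, ask. A toy sketch using keyword overlap in place of vector search (the shape of the pipeline is the same either way; the documents are made up):

```python
def retrieve(query, docs, k=2):
    """Toy keyword retrieval: rank docs by word overlap with the query.
    Real RAG stacks use embedding similarity, but the pattern is
    identical: score, sort, take top-k."""
    q = set(query.lower().split())
    scored = sorted(docs,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

docs = [
    "GGUF is the standard file format for quantized local models.",
    "MoE models route each token to a few expert sub-networks.",
    "Quantization stores weights in lower-precision numbers.",
]
query = "what file format do local models use"
context = "\n".join(retrieve(query, docs))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The final `prompt` is what actually goes to the model; everything before it is ordinary retrieval code, which is why RAG needs no fine-tuning.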

GGUF Format

The standard file format for quantized LLMs used by llama.cpp, Ollama, LM Studio, and most local inference tools. A .gguf file contains the weights, tokenizer, and metadata in one cross-platform binary. The successor to the older GGML format. If you're running a local LLM in 2026, you're probably using GGUF.
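You can verify a file is GGUF from its header alone: the format begins with the 4-byte magic `GGUF` followed by a little-endian version number. A quick sanity check, not a parser (the version written into the demo header is illustrative):

```python
import struct

def looks_like_gguf(path):
    """GGUF files start with the 4-byte magic b'GGUF' followed by a
    little-endian uint32 format version. Sanity check only; the real
    metadata key-value section comes after this header."""
    with open(path, "rb") as f:
        magic = f.read(4)
        (version,) = struct.unpack("<I", f.read(4))
    return magic == b"GGUF", version

# Write a fake 8-byte header to demo the check (version is assumed).
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 3))

ok, ver = looks_like_gguf("demo.gguf")
```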

Chatbot Arena

A crowd-sourced LLM evaluation where humans compare blind side-by-side responses from two models and vote which is better. Produces an Elo rating per model. Currently the most-cited 'real-world quality' ranking because it captures preferences benchmarks miss. As of April 2026, Claude Opus 4.7 leads, with DeepSeek V4 the top open-weights model at 1342.
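An Elo update is a one-liner per vote. A sketch of the classic pairwise formula (Arena's published ratings come from a Bradley-Terry-style fit over all votes rather than sequential updates, but the intuition is the same; the ratings below are illustrative):

```python
def elo_update(r_winner, r_loser, k=32):
    """Standard Elo: expected win probability from the rating gap, then
    nudge both ratings by k * (actual - expected). Beating a stronger
    opponent moves ratings more than beating a weaker one."""
    expected_w = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_w)
    return r_winner + delta, r_loser - delta

# A 1342-rated model beats a 1300-rated model in one blind vote:
new_a, new_b = elo_update(1342, 1300)
```

Note the update is zero-sum: whatever the winner gains, the loser loses, so the rating pool's average stays fixed.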