MMLU

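Massive Multitask Language Understanding: a multiple-choice benchmark covering 57 subjects, ranging from US history to college math. It is the standard general-knowledge benchmark for LLMs. Scores are reported as percent accuracy: top open models score 84-89, and Claude Opus 4.7 scores ~92. A high MMLU score is necessary but not sufficient for being a great chat model, because it measures knowledge on multiple-choice questions rather than the quality of open-ended conversation.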
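
To make the numbers above concrete, an MMLU-style score is the percentage of multiple-choice questions answered correctly. The sketch below is a minimal illustration of that arithmetic, not the official evaluation harness. The sample records, the subject labels, and the function name mmlu_style_accuracy are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records: (subject, model_answer, correct_answer).
# Real MMLU items have four options labeled A-D; these rows are made up.
results = [
    ("us_history", "A", "A"),
    ("us_history", "C", "B"),
    ("college_mathematics", "D", "D"),
    ("college_mathematics", "B", "B"),
    ("college_mathematics", "A", "C"),
]


def mmlu_style_accuracy(rows):
    """Return (overall accuracy, per-subject accuracy) as percentages."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in rows:
        total[subject] += 1
        correct[subject] += int(predicted == gold)
    per_subject = {s: 100.0 * correct[s] / total[s] for s in total}
    overall = 100.0 * sum(correct.values()) / sum(total.values())
    return overall, per_subject


overall, per_subject = mmlu_style_accuracy(results)
print(f"overall: {overall:.1f}")
for subject, acc in sorted(per_subject.items()):
    print(f"{subject}: {acc:.1f}")
```

With four answer options per question, random guessing lands near 25, which is the floor against which a score should be read. The sketch reports a per-question average alongside per-subject accuracy; published results may instead average across subjects, so it is worth checking which convention a given number uses.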