MMLU
MMLU (Massive Multitask Language Understanding) is a broad-coverage knowledge and reasoning benchmark spanning 57 subjects across STEM, the humanities, the social sciences, law, medicine, and other professional domains. It contains 14,079 four-choice multiple-choice test items plus a 1,540-item dev/validation split used for few-shot example selection, with difficulty calibrated from elementary up to advanced-professional level. Released in 2020, it became the de facto industry standard for measuring an LLM's breadth of world knowledge and remains one of the most widely cited LLM benchmarks despite its age.
Source paper
Latest run: 2026-05-26
Caveats
Label noise is a meaningful and uneven concern. Gema et al. (2024, "Are We Done with MMLU?") manually re-annotated 5,700 questions and flagged roughly 6.5% as erroneous (wrong answer keys, ambiguous options, or duplicated stems), with the error rate varying sharply by subject: about 57% of the reviewed Virology questions were flagged. Fine-grained subject-level comparisons across models are therefore unreliable unless the flagged items are excluded or otherwise accounted for. The benchmark is also effectively saturated: frontier models now score above 90%, leaving little headroom to separate models near the top. Because the test set has been publicly available since 2020, it is widely present in modern training corpora, so high scores partially reflect memorization rather than capability. The four-choice format can be gamed through option-position bias and process-of-elimination heuristics that do not require true understanding, and the benchmark is English-only. Treat MMLU as a coarse breadth indicator rather than a discriminative measure of frontier capability.
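One way to account for flagged items in a subject-level comparison is to report accuracy twice: once over all items and once over only the items that were not flagged. The sketch below is illustrative only. The record fields (`subject`, `answer`, `prediction`, `flagged`) and the toy records are assumptions made for this example; they are not the schema of MMLU or of the Gema et al. annotations, and the numbers are not real results.

```python
from collections import defaultdict


def subject_accuracy(items, exclude_flagged=False):
    """Return {subject: accuracy}, optionally skipping items flagged as erroneous."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        if exclude_flagged and item["flagged"]:
            continue  # drop items whose label is suspected to be wrong
        total[item["subject"]] += 1
        correct[item["subject"]] += int(item["prediction"] == item["answer"])
    return {subject: correct[subject] / total[subject] for subject in total}


# Toy, made-up records (hypothetical schema; not real MMLU items or model outputs).
items = [
    {"subject": "virology", "answer": "A", "prediction": "B", "flagged": True},
    {"subject": "virology", "answer": "C", "prediction": "C", "flagged": False},
    {"subject": "virology", "answer": "D", "prediction": "A", "flagged": True},
    {"subject": "astronomy", "answer": "B", "prediction": "B", "flagged": False},
    {"subject": "astronomy", "answer": "A", "prediction": "C", "flagged": False},
]

raw = subject_accuracy(items)                          # all items
clean = subject_accuracy(items, exclude_flagged=True)  # unflagged items only

for subject in raw:
    print(f"{subject}: raw={raw[subject]:.2f} clean={clean[subject]:.2f}")
```

In a heavily flagged subject the two figures can diverge a lot while a clean subject is unchanged, which is why the raw per-subject number alone is a weak basis for comparing models.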
How to cite
Citation
FrozeBench. "MMLU." https://frozebench.com/benchmarks/mmlu. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_mmlu,
title = {MMLU},
howpublished = {\url{https://frozebench.com/benchmarks/mmlu}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/benchmarks/mmlu