FrozeBench

MMLU-Pro

mmlu_pro
Knowledge, Reasoning

MMLU-Pro is a harder successor to MMLU containing 12,032 questions across 14 disciplines (Biology, Business, Chemistry, Computer Science, Economics, Engineering, Health, History, Law, Math, Philosophy, Physics, Psychology, and Other). It expands the choice set from four to ten options to suppress guessing, removes trivial and noisy items from the original MMLU pool, and adds harder questions that call for multi-step chain-of-thought reasoning rather than pure recall. Sources include the original MMLU, problems collected from STEM websites, TheoremQA, and SciBench, with expert review applied throughout.
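To make the ten-option, chain-of-thought format concrete, here is a minimal sketch of how one item might be rendered as a prompt. The helper name `format_cot_prompt`, the instruction wording, and the "The answer is (X)" cue are illustrative assumptions, not FrozeBench's or the MMLU-Pro authors' exact template. Options are lettered A through J, and the helper accepts fewer than ten so it does not hard-code the option count.

```python
import string

# Hypothetical helper (not FrozeBench's template): renders one MMLU-Pro-style
# item as a zero-shot chain-of-thought prompt with options lettered A-J.
LETTERS = string.ascii_uppercase[:10]  # "ABCDEFGHIJ"


def format_cot_prompt(question: str, options: list, subject: str) -> str:
    """Build a CoT prompt; raises ValueError if there are 0 or >10 options."""
    if not 1 <= len(options) <= len(LETTERS):
        raise ValueError("expected between 1 and 10 options")
    lines = [
        f"The following is a multiple choice question about {subject}. "
        'Think step by step and finish with "The answer is (X)".',
        "",
        f"Question: {question}",
        "Options:",
    ]
    # Pair each option with its letter: A. ..., B. ..., up to J. ...
    lines += [f"{letter}. {text}" for letter, text in zip(LETTERS, options)]
    # The trailing cue is what makes this a chain-of-thought prompt.
    lines += ["", "Answer: Let's think step by step."]
    return "\n".join(lines)


prompt = format_cot_prompt("What is 6 * 7?", [str(n) for n in range(38, 48)], "math")
print(prompt)
```

Ending the prompt with an explicit reasoning cue and a fixed answer phrase makes the final letter easy to parse after the model's free-form reasoning.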

Source paper
Latest run: 2026-05-26

Benchmark results


28 models

Caveats

MMLU-Pro is explicitly chain-of-thought-dependent: scores drop sharply without CoT prompting and are not directly comparable to vanilla few-shot MMLU numbers, so any cross-benchmark comparison must control for prompting protocol. Because the question pool was assembled by filtering and augmenting several pre-existing datasets and adding expert curation, provenance and difficulty vary across disciplines, and per-discipline results should not be assumed to be calibrated against each other. Although MMLU-Pro was less saturated than MMLU at release, frontier models already exceed roughly 70 to 75%, which narrows the discriminative window at the top of the leaderboard. Expanding from four to ten options reduces but does not eliminate MCQ-format exploitation: position bias and elimination strategies still apply. The benchmark remains English-only.
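Two of these caveats can be made concrete in a few lines. The sketch below assumes, hypothetically, that completions end with "The answer is (X)" and parses the last such letter; FrozeBench's actual parser is not documented here. It also computes random-guess accuracy (1/4 = 25% for four options versus 1/10 = 10% for ten) and one common chance-corrected score, (acc − c) / (1 − c), which maps chance to 0 and a perfect score to 1. The 55% accuracy in the usage lines is an illustrative number, not a leaderboard result, and the correction does not account for position bias.

```python
import re
from typing import Optional

# Hypothetical parser for the "The answer is (X)" convention; not FrozeBench's.
# The trailing \b rejects words such as "Because" that merely start with A-J.
ANSWER_RE = re.compile(r"[Aa]nswer is:?\s*\(?([A-J])\b")


def extract_answer(completion: str) -> Optional[str]:
    """Return the last A-J letter after 'answer is', or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing."""
    return 1.0 / num_options


def chance_corrected(acc: float, num_options: int) -> float:
    """Rescale accuracy so random guessing scores 0 and perfect scores 1."""
    c = chance_accuracy(num_options)
    return (acc - c) / (1.0 - c)


print(extract_answer("Adding the terms gives 42, so the answer is (E)."))  # E
# Illustrative only: the same raw 55% means more above chance with ten options.
print(round(chance_corrected(0.55, 4), 3))   # 0.4
print(round(chance_corrected(0.55, 10), 3))  # 0.5
```

A completion that never produces the expected answer phrase returns None and is scored as wrong, which is one reason results depend so heavily on the prompting protocol.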

How to cite

Citation

FrozeBench. "MMLU-Pro." https://frozebench.com/benchmarks/mmlu-pro. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_mmlu_pro,
  title = {MMLU-Pro},
  howpublished = {\url{https://frozebench.com/benchmarks/mmlu-pro}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/benchmarks/mmlu-pro