FrozeBench

GPQA Main

gpqa_main
Knowledge · Science · Reasoning

GPQA Main consists of 448 multiple-choice questions in biology, physics, and chemistry, authored by PhD-level domain experts. It is the base set of the GPQA benchmark, filtered to questions where expert validators demonstrated high accuracy while skilled non-experts with web access failed.
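Here is a minimal scoring sketch. It assumes that each question has four answer options labeled A to D, and that a model's score is plain accuracy over the set. The helper name `accuracy` and the list-of-letters data layout are hypothetical and do not describe FrozeBench's actual evaluation harness. The sketch also shows the random-guessing baseline that any reported score should be compared against.

```python
from typing import Sequence

NUM_QUESTIONS = 448  # size of the GPQA Main set
NUM_OPTIONS = 4      # assumption: four answer choices per question

def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions answered correctly (hypothetical helper)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("answers must be non-empty")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Uniform random guessing over four options has an expected accuracy of 25%.
chance_baseline = 1 / NUM_OPTIONS

if __name__ == "__main__":
    answers = ["A", "C", "B", "D"]
    predictions = ["A", "C", "D", "D"]
    print(accuracy(predictions, answers))
    print(chance_baseline)
```

Under these assumptions, a model scoring near 0.25 is doing no better than chance.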

Source paper
Latest run: 2026-05-25

Benchmark results


28 models evaluated

Caveats

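The total set is small (N=448), so confidence intervals on per-model accuracy are wide, and models whose scores differ by only a few points may not be meaningfully different. The questions were written by a relatively small pool of domain experts, which may introduce stylistic and difficulty biases. Training-data contamination is a concern for models trained after the benchmark's public release in late 2023.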
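This sketch makes the "wide confidence intervals" caveat concrete. It computes a 95% Wilson score interval for a binomial proportion, which is one standard choice among several. FrozeBench may compute its intervals differently. With N=448 and an accuracy of 50%, the interval spans roughly ±4.6 percentage points.

```python
import math

def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% when z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= correct <= n:
        raise ValueError("correct must be between 0 and n")
    p_hat = correct / n
    denom = 1 + z**2 / n
    # Center is shifted toward 0.5; half-width includes a small-sample correction.
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

if __name__ == "__main__":
    n = 448
    for correct in (224, 336):  # 50% and 75% accuracy
        lo, hi = wilson_interval(correct, n)
        print(f"{correct}/{n}: [{lo:.3f}, {hi:.3f}]")
```

The Wilson interval was chosen over the simple normal approximation because it stays inside [0, 1] and behaves better for accuracies near 0 or 1.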

How to cite

Citation

FrozeBench. "GPQA Main." https://frozebench.com/benchmarks/gpqa-main. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_gpqa_main,
  title = {GPQA Main},
  howpublished = {\url{https://frozebench.com/benchmarks/gpqa-main}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/benchmarks/gpqa-main