GPQA Main
GPQA Main consists of 448 multiple-choice questions in biology, physics, and chemistry, authored by PhD-level domain experts. It is the base set of the GPQA benchmark, filtered to questions where expert validators demonstrated high accuracy while skilled non-experts with web access failed.
Latest run: 2026-05-25
Caveats
The total set is small (N=448), so confidence intervals on per-model accuracy are wide: roughly ±4.6 percentage points at the 95% level for a model scoring near 50%. The questions were authored by a relatively small pool of domain experts, which introduces stylistic and difficulty biases. Contamination is a concern for models trained after late 2023, when the dataset was publicly released.
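To make the sample-size caveat concrete, here is a minimal sketch that computes a 95% Wilson score interval for an accuracy measured on 448 questions. The function name `wilson_interval` and the example score of 224 correct are illustrative assumptions, not part of the benchmark's tooling or reported results, and the interval treats questions as independent trials.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    correct: number of questions answered correctly (hypothetical input)
    n:       total number of questions
    z:       normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


N = 448  # size of GPQA Main, as stated above

# Hypothetical model that answers exactly half of the questions correctly.
low, high = wilson_interval(224, N)
print(f"accuracy 50.0%, 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval spans about nine percentage points, so two models whose scores differ by only a few points on this set are hard to separate without further evidence.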
How to cite
Citation
FrozeBench. "GPQA Main." https://frozebench.com/benchmarks/gpqa-main. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_gpqa_main,
title = {GPQA Main},
howpublished = {\url{https://frozebench.com/benchmarks/gpqa-main}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/benchmarks/gpqa-main