openai/gpt-oss-120b

Weights: 65.3 GB · safetensors
Source: Hugging Face page
Commit: master

Evaluated across 13 benchmarks; it ranks in the top 3 on none of them. Strongest showing on EQ-Bench (78.6 eqbench, #11 of 28). Weakest on BBH (0.0% exact_match, #26 of 28).

Benchmark results

Benchmark	Metric	Score	Rank
EQ-Bench	eqbench	78.6±1.6	#11 of 28
IFEval	prompt_level_strict_acc	78.2%±1.8%	#16 of 28
GSM8K	exact_match	74.5%±1.2%	#15 of 28
MGSM	exact_match	62.2%±0.8%	#15 of 28
MMLU-Pro	exact_match	62.2%±0.4%	#13 of 28
GPQA Diamond	acc	28.8%±3.2%	#15 of 28
GPQA Extended	acc	28.8%±1.9%	#11 of 28
GPQA Main	acc	26.1%±2.1%	#18 of 28
MMLU	acc	24.7%±0.4%	#28 of 28
LongBench	aggregate	14.3%±0.3%	#12 of 17
MBPP	pass@1	9.8%±1.3%	#18 of 28
MBPP(Instruct)	pass@1	2.4%±0.7%	#4 of 24
BBH	exact_match	0.0%±0.0%	#26 of 28
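
The summary figures above can be rechecked from the table rows. The following is a minimal sketch, not the site's own method: it assumes that "strongest" and "weakest" refer to the highest and lowest raw score, even though the scores use different metrics and are not strictly comparable. The names `ROWS`, `parse_row`, and `summarize` are hypothetical helpers introduced for illustration.

```python
import re

# Rows copied from the table above: benchmark, metric, score, rank (tab-separated).
ROWS = """\
EQ-Bench\teqbench\t78.6±1.6\t#11 of 28
IFEval\tprompt_level_strict_acc\t78.2%±1.8%\t#16 of 28
GSM8K\texact_match\t74.5%±1.2%\t#15 of 28
MGSM\texact_match\t62.2%±0.8%\t#15 of 28
MMLU-Pro\texact_match\t62.2%±0.4%\t#13 of 28
GPQA Diamond\tacc\t28.8%±3.2%\t#15 of 28
GPQA Extended\tacc\t28.8%±1.9%\t#11 of 28
GPQA Main\tacc\t26.1%±2.1%\t#18 of 28
MMLU\tacc\t24.7%±0.4%\t#28 of 28
LongBench\taggregate\t14.3%±0.3%\t#12 of 17
MBPP\tpass@1\t9.8%±1.3%\t#18 of 28
MBPP(Instruct)\tpass@1\t2.4%±0.7%\t#4 of 24
BBH\texact_match\t0.0%±0.0%\t#26 of 28
"""


def parse_row(line):
    """Split one table row into name, metric, score, error, and rank fields."""
    name, metric, score, rank = line.split("\t")
    # Scores look like "78.2%±1.8%" or "78.6±1.6"; drop the optional percent sign.
    value, error = (float(part.rstrip("%")) for part in score.split("±"))
    match = re.fullmatch(r"#(\d+) of (\d+)", rank)
    return {
        "name": name,
        "metric": metric,
        "value": value,
        "error": error,
        "position": int(match.group(1)),
        "total": int(match.group(2)),
    }


def summarize(rows):
    """Recompute the headline counts (assumes strongest/weakest mean raw score)."""
    return {
        "count": len(rows),
        "top3": sum(1 for r in rows if r["position"] <= 3),
        "strongest": max(rows, key=lambda r: r["value"])["name"],
        "weakest": min(rows, key=lambda r: r["value"])["name"],
    }


rows = [parse_row(line) for line in ROWS.splitlines()]
summary = summarize(rows)
print(summary)
```

Ranking by position instead of raw score would give a different picture: the best relative rank in the table is MBPP(Instruct) at #4 of 24, and the worst is MMLU at #28 of 28.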

Last evaluated: 7 months ago

How to cite

Citation

FrozeBench. "openai/gpt-oss-120b." https://frozebench.com/models/openai-gpt-oss-120b. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_openai_gpt_oss_120b,
  title = {openai/gpt-oss-120b},
  howpublished = {\url{https://frozebench.com/models/openai-gpt-oss-120b}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/openai-gpt-oss-120b