openai/gpt-oss-20b

Weights: 13.8 GB · safetensors
Source: Hugging Face page
Commit: master

Evaluated across 13 benchmarks. Ranks in the top 3 on 1 of 13: MBPP(Instruct) (52.8% pass@1, #3 of 24). Highest score on EQ-Bench (73.9 eqbench, #20 of 28). Weakest on MBPP (0.0% pass@1, #25 of 28).

Benchmark results

Benchmark	Metric	Score	Rank
EQ-Bench	eqbench	73.9±1.9	#20 of 28
IFEval	prompt_level_strict_acc	60.4%±2.1%	#24 of 28
MGSM	exact_match	58.3%±0.8%	#19 of 28
MMLU	acc	56.4%±0.4%	#21 of 28
MBPP(Instruct)	pass@1	52.8%±2.2%	#3 of 24
GSM8K	exact_match	39.3%±1.3%	#21 of 28
MMLU-Pro	exact_match	32.6%±0.4%	#23 of 28
GPQA Extended	acc	27.7%±1.9%	#15 of 28
GPQA Main	acc	26.8%±2.1%	#17 of 28
GPQA Diamond	acc	26.8%±3.2%	#23 of 28
LongBench	aggregate	13.5%±0.3%	#13 of 17
BBH	exact_match	1.5%±0.1%	#22 of 28
MBPP	pass@1	0.0%±0.0%	#25 of 28
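
The summary counts can be rechecked from the Rank column. The following is a minimal sketch, assuming the ranks are copied by hand from the table; the `RANKS` mapping and the `top_k_count` and `best_by_rank` helpers are illustrative names, not part of any FrozeBench tooling.

```python
# Ranks copied by hand from the table: benchmark -> (rank, models ranked).
# This mapping is an illustrative assumption, not a FrozeBench data format.
RANKS = {
    "EQ-Bench": (20, 28),
    "IFEval": (24, 28),
    "MGSM": (19, 28),
    "MMLU": (21, 28),
    "MBPP(Instruct)": (3, 24),
    "GSM8K": (21, 28),
    "MMLU-Pro": (23, 28),
    "GPQA Extended": (15, 28),
    "GPQA Main": (17, 28),
    "GPQA Diamond": (23, 28),
    "LongBench": (13, 17),
    "BBH": (22, 28),
    "MBPP": (25, 28),
}


def top_k_count(ranks, k=3):
    """Count benchmarks where the model's rank is k or better."""
    return sum(1 for rank, _ in ranks.values() if rank <= k)


def best_by_rank(ranks):
    """Return the benchmark name with the lowest (best) rank number."""
    return min(ranks, key=lambda name: ranks[name][0])


print(len(RANKS), top_k_count(RANKS), best_by_rank(RANKS))
# prints: 13 1 MBPP(Instruct)
```

Ranking by position rather than by score avoids comparing metrics that sit on different scales, such as eqbench points and pass@1 percentages.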

Last evaluated: 1 month ago

How to cite

Citation

FrozeBench. "openai/gpt-oss-20b." https://frozebench.com/models/openai-gpt-oss-20b. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_openai_gpt_oss_20b,
  title = {openai/gpt-oss-20b},
  howpublished = {\url{https://frozebench.com/models/openai-gpt-oss-20b}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/openai-gpt-oss-20b