Evaluated across 13 benchmarks. Ranks in the top 3 on 1 of 13: MBPP (Instruct), at #3 of 24. Highest raw score on EQ-Bench (73.9 eqbench, #20 of 28); lowest on MBPP (0.0% pass@1, #25 of 28).
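The headline figures can be recomputed from the results table. The Python sketch below is illustrative and is not FrozeBench's own code. It assumes that "top 3" means rank #1 to #3, and that "highest" and "lowest" compare the displayed scores directly, even though EQ-Bench's eqbench score is not a percentage. `Result` and `summarize` are hypothetical names. Because the number of ranked models varies across benchmarks (17, 24, or 28), the sketch also divides each rank by that field size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    benchmark: str
    metric: str
    score: float  # value as displayed: eqbench points for EQ-Bench, percent otherwise
    rank: int     # this model's rank on the benchmark
    total: int    # number of ranked models on the benchmark


# Rows transcribed from the results table (± values omitted).
RESULTS = [
    Result("EQ-Bench", "eqbench", 73.9, 20, 28),
    Result("IFEval", "prompt_level_strict_acc", 60.4, 24, 28),
    Result("MGSM", "exact_match", 58.3, 19, 28),
    Result("MMLU", "acc", 56.4, 21, 28),
    Result("MBPP (Instruct)", "pass@1", 52.8, 3, 24),
    Result("GSM8K", "exact_match", 39.3, 21, 28),
    Result("MMLU-Pro", "exact_match", 32.6, 23, 28),
    Result("GPQA Extended", "acc", 27.7, 15, 28),
    Result("GPQA Main", "acc", 26.8, 17, 28),
    Result("GPQA Diamond", "acc", 26.8, 23, 28),
    Result("LongBench", "aggregate", 13.5, 13, 17),
    Result("BBH", "exact_match", 1.5, 22, 28),
    Result("MBPP", "pass@1", 0.0, 25, 28),
]


def summarize(results: list[Result]) -> dict[str, object]:
    """Recompute the headline summary from per-benchmark rows."""
    return {
        "count": len(results),
        # Benchmarks where the model ranks #1 to #3.
        "top3": [r.benchmark for r in results if r.rank <= 3],
        # Raw-score extremes; this compares numbers on different scales.
        "highest_score": max(results, key=lambda r: r.score).benchmark,
        "lowest_score": min(results, key=lambda r: r.score).benchmark,
        # Rank divided by field size, so #3 of 24 and #3 of 28 are comparable.
        "best_relative_rank": min(results, key=lambda r: r.rank / r.total).benchmark,
        "worst_relative_rank": max(results, key=lambda r: r.rank / r.total).benchmark,
    }


for key, value in summarize(RESULTS).items():
    print(f"{key}: {value}")
```

When run as written, the sketch reports 13 benchmarks and one top-3 placement, MBPP (Instruct). EQ-Bench has the highest raw score. MBPP has both the lowest raw score and the worst relative rank. By relative rank, the best result is MBPP (Instruct), not EQ-Bench.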
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| EQ-Bench | eqbench | 73.9 ± 1.9 | #20 of 28 |
| IFEval | prompt_level_strict_acc | 60.4% ± 2.1% | #24 of 28 |
| MGSM | exact_match | 58.3% ± 0.8% | #19 of 28 |
| MMLU | acc | 56.4% ± 0.4% | #21 of 28 |
| MBPP (Instruct) | pass@1 | 52.8% ± 2.2% | #3 of 24 |
| GSM8K | exact_match | 39.3% ± 1.3% | #21 of 28 |
| MMLU-Pro | exact_match | 32.6% ± 0.4% | #23 of 28 |
| GPQA Extended | acc | 27.7% ± 1.9% | #15 of 28 |
| GPQA Main | acc | 26.8% ± 2.1% | #17 of 28 |
| GPQA Diamond | acc | 26.8% ± 3.2% | #23 of 28 |
| LongBench | aggregate | 13.5% ± 0.3% | #13 of 17 |
| BBH | exact_match | 1.5% ± 0.1% | #22 of 28 |
| MBPP | pass@1 | 0.0% ± 0.0% | #25 of 28 |
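This page does not define its ± figures. One plausible reading, which is an assumption here, is that each one is the standard error of the mean over per-item 0/1 scores, sqrt(p(1 - p)/(n - 1)). The Python sketch below checks that reading for the accuracy-style benchmarks whose public evaluation sets have well-known sizes. The page does not say how many items were scored, so the sizes in the hypothetical `ASSUMED_N` table are also assumptions: 541 IFEval prompts, 500 MBPP test problems, 1,319 GSM8K test problems, and 546, 448, and 198 questions for GPQA Extended, Main, and Diamond. The sketch also prints an approximate 95% interval, p ± 1.96 standard errors.

```python
import math


def binary_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n per-item 0/1 scores with mean p,
    using the sample (n - 1) variance."""
    if n < 2:
        raise ValueError("need at least two items")
    return math.sqrt(p * (1.0 - p) / (n - 1))


# ASSUMPTION: standard public split sizes; the page does not state n.
ASSUMED_N = {
    "IFEval": 541,
    "MBPP (Instruct)": 500,
    "GSM8K": 1319,
    "GPQA Extended": 546,
    "GPQA Main": 448,
    "GPQA Diamond": 198,
}

# Displayed scores from the results table, as fractions.
SCORES = {
    "IFEval": 0.604,
    "MBPP (Instruct)": 0.528,
    "GSM8K": 0.393,
    "GPQA Extended": 0.277,
    "GPQA Main": 0.268,
    "GPQA Diamond": 0.268,
}

for name, p in SCORES.items():
    se = binary_stderr(p, ASSUMED_N[name])
    low, high = p - 1.96 * se, p + 1.96 * se  # normal-approximation 95% interval
    print(f"{name}: ±{100 * se:.1f}% (approx. 95% CI {100 * low:.1f}% to {100 * high:.1f}%)")
```

Under these assumptions, each computed value rounds to the ± shown in the table. For example, p = 0.528 with n = 500 gives about 2.2%. That agreement is consistent with the standard-error reading but does not confirm it. The intervals are wide at these sample sizes: GPQA Main spans about 22.7% to 30.9% and GPQA Extended about 23.9% to 31.5%, so they overlap heavily, and small score gaps between neighboring rows are within noise.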
Citation
FrozeBench. "openai/gpt-oss-20b." https://frozebench.com/models/openai-gpt-oss-20b. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_openai_gpt_oss_20b,
title = {openai/gpt-oss-20b},
howpublished = {\url{https://frozebench.com/models/openai-gpt-oss-20b}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/openai-gpt-oss-20b