GSM8K (exact_match): 94.0%±0.7%, #5 of 28
Qwen/Qwen3-32B was evaluated across 12 benchmarks and does not rank in the top 3 on any of them. Its strongest showing is GSM8K (94.0% exact_match, #5 of 28); its weakest is LongBench (7.6% aggregate, #17 of 17).
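The figures in this summary can be recomputed from the per-benchmark results. The following is a minimal sketch, not the site's own code: the rows are hard-coded from the results table, the helper name `summarize` is hypothetical, and "strongest" and "weakest" are assumed to mean highest and lowest score (EQ-Bench is compared by its raw number even though it is not a percentage).

```python
# Rows copied from the results table:
# (benchmark, metric, score, rank, number of ranked models).
ROWS = [
    ("GSM8K", "exact_match", 94.0, 5, 28),
    ("MMLU", "acc", 80.8, 7, 28),
    ("EQ-Bench", "eqbench", 80.3, 8, 28),
    ("IFEval", "prompt_level_strict_acc", 79.5, 14, 28),
    ("MMLU-Pro", "exact_match", 70.0, 9, 28),
    ("MGSM", "exact_match", 61.1, 16, 28),
    ("MBPP", "pass@1", 47.4, 15, 28),
    ("BBH", "exact_match", 47.0, 10, 28),
    ("GPQA Diamond", "acc", 29.3, 13, 28),
    ("GPQA Main", "acc", 28.8, 13, 28),
    ("GPQA Extended", "acc", 28.6, 12, 28),
    ("LongBench", "aggregate", 7.6, 17, 17),
]


def summarize(rows, top_k=3):
    """Hypothetical helper: derive the headline summary from the rows."""
    # Count benchmarks where the rank is within the top_k positions.
    top_count = sum(1 for _, _, _, rank, _ in rows if rank <= top_k)
    # Assumption: strongest/weakest are picked by score, not by rank.
    strongest = max(rows, key=lambda r: r[2])
    weakest = min(rows, key=lambda r: r[2])
    return {
        "benchmarks": len(rows),
        "top_k_count": top_count,
        "strongest": strongest[0],
        "weakest": weakest[0],
    }


summary = summarize(ROWS)
print(summary)
```

Picking by rank instead of score gives the same two benchmarks here, since GSM8K also has the best rank (#5) and LongBench is last in its field.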
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| GSM8K | exact_match | 94.0%±0.7% | #5 of 28 |
| MMLU | acc | 80.8%±0.3% | #7 of 28 |
| EQ-Bench | eqbench | 80.3±1.9 | #8 of 28 |
| IFEval | prompt_level_strict_acc | 79.5%±1.7% | #14 of 28 |
| MMLU-Pro | exact_match | 70.0%±0.4% | #9 of 28 |
| MGSM | exact_match | 61.1%±0.8% | #16 of 28 |
| MBPP | pass@1 | 47.4%±2.2% | #15 of 28 |
| BBH | exact_match | 47.0%±0.4% | #10 of 28 |
| GPQA Diamond | acc | 29.3%±3.2% | #13 of 28 |
| GPQA Main | acc | 28.8%±2.1% | #13 of 28 |
| GPQA Extended | acc | 28.6%±1.9% | #12 of 28 |
| LongBench | aggregate | 7.6%±0.1% | #17 of 17 |
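The page does not say what the ± values represent. One plausible reading, offered only as an assumption, is the binomial standard error of an accuracy, sqrt(p(1-p)/n). The sketch below checks that reading for four rows, using the sizes of the commonly used public test splits (GSM8K 1319, GPQA Diamond 198, GPQA Main 448, GPQA Extended 546). Those sizes are also assumptions: the page does not state how many items were evaluated.

```python
import math


def binomial_se(acc, n):
    """Standard error of a proportion `acc` measured over `n` items."""
    return math.sqrt(acc * (1.0 - acc) / n)


# (reported accuracy, ASSUMED number of test items, reported ± in points).
# The item counts are not given on the page; they are the usual public
# split sizes and are used here purely for illustration.
CHECKS = {
    "GSM8K": (0.940, 1319, 0.7),
    "GPQA Diamond": (0.293, 198, 3.2),
    "GPQA Main": (0.288, 448, 2.1),
    "GPQA Extended": (0.286, 546, 1.9),
}

results = {}
for name, (acc, n, reported) in CHECKS.items():
    se_points = round(100.0 * binomial_se(acc, n), 1)
    results[name] = se_points
    print(f"{name}: computed ±{se_points}, reported ±{reported}")
```

Under these assumptions the computed values agree with the reported margins for all four rows after rounding to one decimal place. That is consistent with the ± being a standard error rather than a confidence interval, but it does not confirm it.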
Citation
FrozeBench. "Qwen/Qwen3-32B." https://frozebench.com/models/qwen-qwen3-32b. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_Qwen_Qwen3_32B,
title = {Qwen/Qwen3-32B},
howpublished = {\url{https://frozebench.com/models/qwen-qwen3-32b}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/qwen-qwen3-32b