MMLU (acc): 84.7%±0.3%, #1 of 28
Evaluated across 13 benchmarks. Ranks in the top 3 on 1 of 13. Strongest showing on MMLU (84.7% acc, #1 of 28). Weakest on MBPP(Instruct) (0.0% pass@1, #24 of 24).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| MMLU | acc | 84.7%±0.3% | #1 of 28 |
| GSM8K | exact_match | 80.7%±1.1% | #12 of 28 |
| IFEval | prompt_level_strict_acc | 77.6%±1.8% | #17 of 28 |
| EQ-Bench | eqbench | 76.3±2.2 | #16 of 28 |
| MBPP | pass@1 | 65.8%±2.1% | #7 of 28 |
| MGSM | exact_match | 49.3%±0.8% | #24 of 28 |
| MMLU-Pro | exact_match | 44.7%±0.4% | #22 of 28 |
| GPQA Main | acc | 29.9%±2.2% | #11 of 28 |
| BBH | exact_match | 27.8%±0.4% | #14 of 28 |
| GPQA Extended | acc | 27.3%±1.9% | #16 of 28 |
| GPQA Diamond | acc | 25.3%±3.1% | #27 of 28 |
| LongBench | aggregate | 17.7%±0.3% | #9 of 17 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #24 of 24 |
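The summary sentence above the table (benchmark count, top-3 count, strongest and weakest result) can be re-derived from the rows. The sketch below is one way to do that in Python, not FrozeBench's own code: the `ROWS` list and the `summarize` helper are hypothetical names, and picking "strongest" and "weakest" by raw score is an assumption, since the page does not state its selection rule. On this table, choosing by relative rank would give the same two benchmarks.

```python
# Rows transcribed from the table: (benchmark, metric, score, rank, models ranked).
# Scores are the point estimates only; the ± margins are left out.
ROWS = [
    ("MMLU", "acc", 84.7, 1, 28),
    ("GSM8K", "exact_match", 80.7, 12, 28),
    ("IFEval", "prompt_level_strict_acc", 77.6, 17, 28),
    ("EQ-Bench", "eqbench", 76.3, 16, 28),
    ("MBPP", "pass@1", 65.8, 7, 28),
    ("MGSM", "exact_match", 49.3, 24, 28),
    ("MMLU-Pro", "exact_match", 44.7, 22, 28),
    ("GPQA Main", "acc", 29.9, 11, 28),
    ("BBH", "exact_match", 27.8, 14, 28),
    ("GPQA Extended", "acc", 27.3, 16, 28),
    ("GPQA Diamond", "acc", 25.3, 27, 28),
    ("LongBench", "aggregate", 17.7, 9, 17),
    ("MBPP(Instruct)", "pass@1", 0.0, 24, 24),
]


def summarize(rows, top_k=3):
    """Recompute the headline figures from the per-benchmark rows.

    Assumption: strongest/weakest are chosen by raw score.
    """
    strongest = max(rows, key=lambda r: r[2])
    weakest = min(rows, key=lambda r: r[2])
    return {
        "benchmarks": len(rows),
        "top_k_count": sum(1 for r in rows if r[3] <= top_k),
        "strongest": strongest[0],
        "weakest": weakest[0],
    }


summary = summarize(ROWS)
print(summary)
```

Scores are on different scales across metrics (EQ-Bench is not a percentage), so the raw-score comparison is only a rough check against the page's summary, not a meaningful cross-benchmark ranking.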
Citation
FrozeBench. "Qwen/Qwen3-235B-A22B-Thinking-AWQ-2507." https://frozebench.com/models/qwen-qwen3-235b-a22b-thinking-awq-2507. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_Qwen_Qwen3_235B_A22B_Thinking_AWQ_2507,
title = {Qwen/Qwen3-235B-A22B-Thinking-AWQ-2507},
howpublished = {\url{https://frozebench.com/models/qwen-qwen3-235b-a22b-thinking-awq-2507}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/qwen-qwen3-235b-a22b-thinking-awq-2507