GSM8K (exact_match): 94.2%±0.6%, #4 of 28
Evaluated across 13 benchmarks. Ranks in the top 3 on 2 of 13. Strongest showing on GSM8K (94.2% exact_match, #4 of 28). Weakest on MBPP(Instruct) (0.0% pass@1, #16 of 24).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| GSM8K | exact_match | 94.2%±0.6% | #4 of 28 |
| IFEval | prompt_level_strict_acc | 82.1%±1.7% | #10 of 28 |
| MBPP | pass@1 | 78.6%±1.8% | #3 of 28 |
| BBH | exact_match | 75.8%±0.4% | #3 of 28 |
| EQ-Bench | eqbench | 75.6±1.8 | #18 of 28 |
| MMLU-Pro | exact_match | 70.3%±0.4% | #8 of 28 |
| MMLU | acc | 68.9%±0.4% | #14 of 28 |
| MGSM | exact_match | 67.4%±0.8% | #11 of 28 |
| LongBench | score | 32.4%±0.4% | #8 of 10 |
| GPQA Diamond | acc | 30.8%±3.3% | #10 of 28 |
| GPQA Extended | acc | 28.9%±1.9% | #10 of 28 |
| GPQA Main | acc | 28.3%±2.1% | #16 of 28 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #16 of 24 |
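
The summary line above the table can be cross-checked against the rows. The sketch below is not FrozeBench's own code: the `ROWS` list and the `summarize` helper are hypothetical names introduced here, and it assumes that "strongest" and "weakest" refer to the highest and lowest raw score rather than to rank (by rank, MBPP and BBH at #3 would come out ahead of GSM8K at #4).

```python
# Rows copied from the table above: (benchmark, metric, score, rank, field_size).
# Scores are the point estimates only; the ± intervals are omitted.
ROWS = [
    ("GSM8K", "exact_match", 94.2, 4, 28),
    ("IFEval", "prompt_level_strict_acc", 82.1, 10, 28),
    ("MBPP", "pass@1", 78.6, 3, 28),
    ("BBH", "exact_match", 75.8, 3, 28),
    ("EQ-Bench", "eqbench", 75.6, 18, 28),
    ("MMLU-Pro", "exact_match", 70.3, 8, 28),
    ("MMLU", "acc", 68.9, 14, 28),
    ("MGSM", "exact_match", 67.4, 11, 28),
    ("LongBench", "score", 32.4, 8, 10),
    ("GPQA Diamond", "acc", 30.8, 10, 28),
    ("GPQA Extended", "acc", 28.9, 10, 28),
    ("GPQA Main", "acc", 28.3, 16, 28),
    ("MBPP(Instruct)", "pass@1", 0.0, 16, 24),
]


def summarize(rows, top_k=3):
    """Recompute the headline figures from the table rows.

    Assumption: strongest/weakest are chosen by raw score, not by rank.
    """
    top = [name for name, _, _, rank, _ in rows if rank <= top_k]
    strongest = max(rows, key=lambda r: r[2])
    weakest = min(rows, key=lambda r: r[2])
    return {
        "benchmarks": len(rows),
        "top_k": top,
        "strongest": strongest[0],
        "weakest": weakest[0],
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Under that assumption the result agrees with the summary: 13 benchmarks, two top-3 placements (MBPP and BBH), GSM8K as the strongest score, and MBPP(Instruct) as the weakest.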
Citation
FrozeBench. "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8." https://frozebench.com/models/qwen-qwen3-coder-30b-a3b-instruct-fp8. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_Qwen_Qwen3_Coder_30B_A3B_Instruct_FP8,
title = {Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8},
howpublished = {\url{https://frozebench.com/models/qwen-qwen3-coder-30b-a3b-instruct-fp8}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/qwen-qwen3-coder-30b-a3b-instruct-fp8