Evaluated across 13 benchmarks. Ranks in the top 3 on none of them. Strongest showing is on IFEval (79.3% prompt_level_strict_acc, #15 of 28). Weakest scores are tied at 0.0% on BBH and MBPP(Instruct).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| IFEval | prompt_level_strict_acc | 79.3%±1.7% | #15 of 28 |
| EQ-Bench | eqbench | 74.1±1.9 | #19 of 28 |
| GSM8K | exact_match | 71.8%±1.2% | #17 of 28 |
| MMLU | acc | 68.4%±0.4% | #15 of 28 |
| MGSM | exact_match | 56.0%±0.8% | #20 of 28 |
| MMLU-Pro | exact_match | 54.3%±0.4% | #20 of 28 |
| GPQA Diamond | acc | 26.8%±3.2% | #22 of 28 |
| GPQA Main | acc | 25.2%±2.1% | #25 of 28 |
| GPQA Extended | acc | 23.1%±1.8% | #28 of 28 |
| LongBench | aggregate | 14.6%±0.2% | #11 of 17 |
| MBPP | pass@1 | 0.2%±0.2% | #24 of 28 |
| BBH | exact_match | 0.0%±0.0% | #28 of 28 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #6 of 24 |
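The summary figures (13 benchmarks, 0 top-3 placements, strongest and weakest benchmarks) can be checked against the table rows. The following is a minimal sketch, not the site's actual method: it assumes "strongest" and "weakest" mean highest and lowest raw score rather than best or worst rank, and it compares scores directly even though EQ-Bench is reported on its own scale rather than as a percentage. The names `ROWS` and `summarize` are hypothetical and chosen for illustration only.

```python
# Rows copied from the results table: (benchmark, score, rank, field size).
# The ± uncertainty values are omitted because the summary does not use them.
ROWS = [
    ("IFEval", 79.3, 15, 28),
    ("EQ-Bench", 74.1, 19, 28),
    ("GSM8K", 71.8, 17, 28),
    ("MMLU", 68.4, 15, 28),
    ("MGSM", 56.0, 20, 28),
    ("MMLU-Pro", 54.3, 20, 28),
    ("GPQA Diamond", 26.8, 22, 28),
    ("GPQA Main", 25.2, 25, 28),
    ("GPQA Extended", 23.1, 28, 28),
    ("LongBench", 14.6, 11, 17),
    ("MBPP", 0.2, 24, 28),
    ("BBH", 0.0, 28, 28),
    ("MBPP(Instruct)", 0.0, 6, 24),
]


def summarize(rows, top_n=3):
    """Recompute the headline summary from table rows.

    Assumption: strongest/weakest are decided by raw score, with ties
    reported together, which matches the wording of the page summary.
    """
    scores = [score for _, score, _, _ in rows]
    best, worst = max(scores), min(scores)
    return {
        "benchmarks": len(rows),
        "top_n_count": sum(1 for _, _, rank, _ in rows if rank <= top_n),
        "strongest": [name for name, score, _, _ in rows if score == best],
        "weakest": [name for name, score, _, _ in rows if score == worst],
    }


summary = summarize(ROWS)
print(summary)
```

Ranking by raw score and ranking by leaderboard position give different pictures here: MBPP(Instruct) ties for the lowest score at 0.0% yet holds the best relative rank in the table (#6 of 24), so the two readings should not be conflated.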
Citation
FrozeBench. "Qwen/Qwen3-4B." https://frozebench.com/models/qwen-qwen3-4b. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_Qwen_Qwen3_4B,
title = {Qwen/Qwen3-4B},
howpublished = {\url{https://frozebench.com/models/qwen-qwen3-4b}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/qwen-qwen3-4b