MMLU (acc): 57.3%±0.4%, #20 of 28
Evaluated across 13 benchmarks. Ranks in the top 3 on 0 of 13. Strongest showing on MMLU (57.3% acc, #20 of 28). Weakest scores are tied at 0.2% on MBPP and MBPP(Instruct).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| MMLU | acc | 57.3%±0.4% | #20 of 28 |
| MGSM | exact_match | 52.8%±0.8% | #22 of 28 |
| EQ-Bench | eqbench | 48.7±3.0 | #25 of 28 |
| IFEval | prompt_level_strict_acc | 41.8%±2.1% | #25 of 28 |
| GPQA Diamond | acc | 25.8%±3.1% | #26 of 28 |
| GPQA Main | acc | 25.4%±2.1% | #24 of 28 |
| GPQA Extended | acc | 24.9%±1.9% | #24 of 28 |
| LongBench | aggregate | 14.8%±0.3% | #10 of 17 |
| MMLU-Pro | exact_match | 7.3%±0.2% | #26 of 28 |
| GSM8K | exact_match | 4.3%±0.6% | #26 of 28 |
| BBH | exact_match | 2.9%±0.2% | #19 of 28 |
| MBPP | pass@1 | 0.2%±0.2% | #23 of 28 |
| MBPP(Instruct) | pass@1 | 0.2%±0.2% | #5 of 24 |
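The summary figures quoted above the table (benchmark count, top-3 count, strongest and weakest results) can be re-derived from the table rows. The following is a minimal sketch, not FrozeBench's own code: the `ROWS` list and the `summarize` helper are hypothetical names introduced here, and it assumes "strongest" and "weakest" mean the highest and lowest raw score, even though the metrics differ across benchmarks (EQ-Bench, for instance, is not reported as a percentage).

```python
# Rows transcribed from the results table: (benchmark, score, rank, field size).
# Scores are the reported point estimates; the ± margins are ignored here.
ROWS = [
    ("MMLU", 57.3, 20, 28),
    ("MGSM", 52.8, 22, 28),
    ("EQ-Bench", 48.7, 25, 28),
    ("IFEval", 41.8, 25, 28),
    ("GPQA Diamond", 25.8, 26, 28),
    ("GPQA Main", 25.4, 24, 28),
    ("GPQA Extended", 24.9, 24, 28),
    ("LongBench", 14.8, 10, 17),
    ("MMLU-Pro", 7.3, 26, 28),
    ("GSM8K", 4.3, 26, 28),
    ("BBH", 2.9, 19, 28),
    ("MBPP", 0.2, 23, 28),
    ("MBPP(Instruct)", 0.2, 5, 24),
]


def summarize(rows, top_k=3):
    """Recompute the headline summary (hypothetical helper, assumed definitions)."""
    # Count benchmarks where the model's rank is within the top_k places.
    top_count = sum(1 for _, _, rank, _ in rows if rank <= top_k)
    # Assumption: "strongest" is the row with the highest raw score.
    strongest = max(rows, key=lambda row: row[1])
    # Assumption: "weakest" is every row tied at the lowest raw score.
    lowest = min(row[1] for row in rows)
    weakest = [name for name, score, _, _ in rows if score == lowest]
    return {
        "benchmarks": len(rows),
        "top_k_count": top_count,
        "strongest": strongest[0],
        "weakest": weakest,
        "weakest_score": lowest,
    }


summary = summarize(ROWS)
print(summary)
```

Under these assumptions the result agrees with the summary line: 13 benchmarks, none in the top 3, MMLU as the strongest result, and MBPP and MBPP(Instruct) tied as the weakest.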
Citation
FrozeBench. "microsoft/phi-4-mini-reasoning." https://frozebench.com/models/microsoft-phi-4-mini-reasoning. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_microsoft_phi_4_mini_reasoning,
title = {microsoft/phi-4-mini-reasoning},
howpublished = {\url{https://frozebench.com/models/microsoft-phi-4-mini-reasoning}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/microsoft-phi-4-mini-reasoning