Evaluated across 13 benchmarks, with a top-3 rank on 1 of the 13: LongBench (40.9% aggregate, #2 of 17). The highest raw score is on IFEval (69.3% prompt_level_strict_acc), although that places only #22 of 28. The lowest score is on MBPP(Instruct) (0.0% pass@1, #19 of 24).
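The summary mixes two views of the same results: a rank-based one (top-3 placements) and a score-based one (highest and lowest raw score). The following minimal Python sketch re-derives both from the per-benchmark table, with the rows transcribed by hand. The `Result` type and `summarize` helper are hypothetical and not part of FrozeBench. The normalized "relative rank" is our own assumption, used so that fields of 17, 24 and 28 models can be compared.

```python
from typing import NamedTuple


class Result(NamedTuple):
    benchmark: str
    metric: str
    score: float  # as displayed; EQ-Bench uses its own scale, the rest are percentages
    rank: int     # 1 = best
    field: int    # number of models ranked on this benchmark


# Rows transcribed by hand from the table on this page.
RESULTS = [
    Result("IFEval", "prompt_level_strict_acc", 69.3, 22, 28),
    Result("EQ-Bench", "eqbench", 67.5, 22, 28),
    Result("MMLU", "acc", 66.3, 17, 28),
    Result("MBPP", "pass@1", 54.8, 13, 28),
    Result("MGSM", "exact_match", 51.9, 23, 28),
    Result("MMLU-Pro", "exact_match", 51.2, 21, 28),
    Result("LongBench", "aggregate", 40.9, 2, 17),
    Result("BBH", "exact_match", 40.0, 12, 28),
    Result("GSM8K", "exact_match", 37.1, 22, 28),
    Result("GPQA Diamond", "acc", 33.8, 8, 28),
    Result("GPQA Main", "acc", 33.7, 7, 28),
    Result("GPQA Extended", "acc", 31.3, 9, 28),
    Result("MBPP(Instruct)", "pass@1", 0.0, 19, 24),
]


def summarize(results: list[Result]) -> dict[str, object]:
    # Rank-based view: placements in the top 3 of each benchmark's field.
    top3 = [r.benchmark for r in results if r.rank <= 3]
    # Score-based view: raw scores, which mix different metrics and scales.
    highest = max(results, key=lambda r: r.score)
    lowest = min(results, key=lambda r: r.score)
    # Hypothetical relative rank in [0, 1] (0 = first, 1 = last) so fields of different sizes compare.
    best_relative = min(results, key=lambda r: (r.rank - 1) / (r.field - 1))
    return {
        "benchmarks": len(results),
        "top3": top3,
        "highest_score": highest.benchmark,
        "lowest_score": lowest.benchmark,
        "best_relative_rank": best_relative.benchmark,
    }


print(summarize(RESULTS))
```

Running it prints `{'benchmarks': 13, 'top3': ['LongBench'], 'highest_score': 'IFEval', 'lowest_score': 'MBPP(Instruct)', 'best_relative_rank': 'LongBench'}`. Keeping the two views apart matters here: IFEval has the highest raw score but a middling rank, while LongBench has a lower score but the best placement.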
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| IFEval | prompt_level_strict_acc | 69.3%±2.0% | #22 of 28 |
| EQ-Bench | eqbench | 67.5±2.5 | #22 of 28 |
| MMLU | acc | 66.3%±0.4% | #17 of 28 |
| MBPP | pass@1 | 54.8%±2.2% | #13 of 28 |
| MGSM | exact_match | 51.9%±0.8% | #23 of 28 |
| MMLU-Pro | exact_match | 51.2%±0.4% | #21 of 28 |
| LongBench | aggregate | 40.9%±0.5% | #2 of 17 |
| BBH | exact_match | 40.0%±0.5% | #12 of 28 |
| GSM8K | exact_match | 37.1%±1.3% | #22 of 28 |
| GPQA Diamond | acc | 33.8%±3.4% | #8 of 28 |
| GPQA Main | acc | 33.7%±2.2% | #7 of 28 |
| GPQA Extended | acc | 31.3%±2.0% | #9 of 28 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #19 of 24 |
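The page does not say what the ± values are. One reading consistent with several rows is the standard error of a mean of pass/fail items, SE = sqrt(p(1 − p)/n). A minimal Python sketch of that check follows. The item counts `n` are assumptions (the usual evaluation-split sizes) and are not listed on this page, and `binomial_stderr_pct` is a hypothetical helper.

```python
import math

# ASSUMED item counts (typical evaluation-split sizes); not stated on this page.
ITEM_COUNTS = {
    "IFEval": 541,
    "GSM8K": 1319,
    "MBPP": 500,
    "GPQA Diamond": 198,
    "GPQA Main": 448,
    "GPQA Extended": 546,
}

# Score and "±" as displayed in the table, both in percent.
REPORTED = {
    "IFEval": (69.3, 2.0),
    "GSM8K": (37.1, 1.3),
    "MBPP": (54.8, 2.2),
    "GPQA Diamond": (33.8, 3.4),
    "GPQA Main": (33.7, 2.2),
    "GPQA Extended": (31.3, 2.0),
}


def binomial_stderr_pct(score_pct: float, n: int) -> float:
    """Standard error of the mean of n pass/fail items, sqrt(p(1-p)/n), in percent."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


for name, (score, reported_se) in REPORTED.items():
    se = binomial_stderr_pct(score, ITEM_COUNTS[name])
    print(f"{name:14s} score={score:5.1f}%  reported ±{reported_se:.1f}  recomputed ±{se:.1f}")
```

Under these assumed counts, the recomputed value rounds to the displayed ± for all six rows, and a 0.0% score gives a standard error of exactly 0, matching the MBPP(Instruct) row. If the ± is a standard error, a 95% interval is roughly twice as wide, so adjacent ranks separated by a point or two (for example, the three GPQA variants) may not differ meaningfully.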
Citation
FrozeBench. "microsoft/phi-4-mini-instruct." https://frozebench.com/models/microsoft-phi-4-mini-instruct. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_microsoft_phi_4_mini_instruct,
title = {microsoft/phi-4-mini-instruct},
howpublished = {\url{https://frozebench.com/models/microsoft-phi-4-mini-instruct}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/microsoft-phi-4-mini-instruct