Evaluated across 12 benchmarks, with no top-3 rank on any of them. Highest raw score: EQ-Bench (77.6 eqbench, #14 of 28). Lowest raw score: BBH (0.4% exact_match, #23 of 28). Best rank: MBPP (#4 of 28).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| EQ-Bench | eqbench | 77.6±1.6 | #14 of 28 |
| MMLU | acc | 76.3%±0.3% | #10 of 28 |
| MBPP | pass@1 | 72.2%±2.0% | #4 of 28 |
| MMLU-Pro | exact_match | 71.8%±0.4% | #7 of 28 |
| IFEval | prompt_level_strict_acc | 62.7%±2.1% | #23 of 28 |
| GPQA Diamond | acc | 35.9%±3.4% | #6 of 28 |
| GPQA Extended | acc | 34.2%±2.0% | #6 of 28 |
| GPQA Main | acc | 33.5%±2.2% | #8 of 28 |
| LongBench | aggregate | 31.7%±0.4% | #6 of 17 |
| MGSM | exact_match | 23.8%±0.7% | #27 of 28 |
| GSM8K | exact_match | 9.7%±0.8% | #24 of 28 |
| BBH | exact_match | 0.4%±0.1% | #23 of 28 |
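The summary figures can be recomputed from the table. The sketch below is only an illustration: the `RESULTS` list and `summarize` helper are hypothetical names, the rows are copied by hand from the table, and it assumes that "strongest" and "weakest" mean highest and lowest raw score. That assumption compares the EQ-Bench score (not a percentage) directly with percentage metrics, so the best relative rank is reported separately.

```python
# Rows copied from the table above: (benchmark, score, rank, models ranked).
# Scores are raw values; EQ-Bench is on its own scale, the rest are percentages.
RESULTS = [
    ("EQ-Bench", 77.6, 14, 28),
    ("MMLU", 76.3, 10, 28),
    ("MBPP", 72.2, 4, 28),
    ("MMLU-Pro", 71.8, 7, 28),
    ("IFEval", 62.7, 23, 28),
    ("GPQA Diamond", 35.9, 6, 28),
    ("GPQA Extended", 34.2, 6, 28),
    ("GPQA Main", 33.5, 8, 28),
    ("LongBench", 31.7, 6, 17),
    ("MGSM", 23.8, 27, 28),
    ("GSM8K", 9.7, 24, 28),
    ("BBH", 0.4, 23, 28),
]


def summarize(results):
    """Derive the headline figures from the per-benchmark rows."""
    # Count benchmarks where the model placed in the top 3.
    top3 = sum(1 for _, _, rank, _ in results if rank <= 3)
    # Assumption: "strongest"/"weakest" are picked by raw score.
    strongest = max(results, key=lambda r: r[1])[0]
    weakest = min(results, key=lambda r: r[1])[0]
    # Best relative placement: smallest rank as a fraction of the field size.
    best_rank = min(results, key=lambda r: r[2] / r[3])[0]
    return {
        "benchmarks": len(results),
        "top3": top3,
        "strongest": strongest,
        "weakest": weakest,
        "best_rank": best_rank,
    }


summary = summarize(RESULTS)
print(summary)
```

Ranking by relative placement rather than raw score singles out MBPP (#4 of 28) instead of EQ-Bench, which shows how much the headline depends on the selection rule.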
Citation
FrozeBench. "microsoft/phi-4." https://frozebench.com/models/microsoft-phi-4. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_microsoft_phi_4,
title = {microsoft/phi-4},
howpublished = {\url{https://frozebench.com/models/microsoft-phi-4}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/microsoft-phi-4