microsoft/phi-4-mini-reasoning

Weights: 7.7 GB · safetensors
Source: Hugging Face page
Commit: master

Evaluated across 13 benchmarks; does not rank in the top 3 on any of them. Highest score on MMLU (57.3% acc, #20 of 28). Lowest scores tied at 0.2% on MBPP and MBPP(Instruct).
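
The summary line can be re-derived from the benchmark table that follows. The sketch below is a minimal illustration, not the site's own code: the scores are hand-transcribed from the table, the names `RESULTS` and `summarize` are hypothetical, and reading "highest" and "lowest" as raw scores (rather than ranks) is an assumption that matches the figures quoted, since MMLU has the highest score but not the best rank.

```python
# Hand-transcribed from the benchmark table: (benchmark, score, rank, field size).
# Scores are percentages except EQ-Bench, which uses its own scale, so the
# "highest" and "lowest" comparisons below mix metrics just as the summary does.
RESULTS = [
    ("MMLU", 57.3, 20, 28),
    ("MGSM", 52.8, 22, 28),
    ("EQ-Bench", 48.7, 25, 28),
    ("IFEval", 41.8, 25, 28),
    ("GPQA Diamond", 25.8, 26, 28),
    ("GPQA Main", 25.4, 24, 28),
    ("GPQA Extended", 24.9, 24, 28),
    ("LongBench", 14.8, 10, 17),
    ("MMLU-Pro", 7.3, 26, 28),
    ("GSM8K", 4.3, 26, 28),
    ("BBH", 2.9, 19, 28),
    ("MBPP", 0.2, 23, 28),
    ("MBPP(Instruct)", 0.2, 5, 24),
]


def summarize(results):
    """Return (top-3 count, highest-scoring benchmark, lowest-scoring benchmarks,
    benchmark with the best rank relative to field size)."""
    top3 = sum(1 for _, _, rank, _ in results if rank <= 3)
    highest = max(results, key=lambda r: r[1])[0]
    low = min(score for _, score, _, _ in results)
    lowest = [name for name, score, _, _ in results if score == low]
    # Rank 1 is best, so the smallest rank/field-size ratio is the best placement.
    best_placed = min(results, key=lambda r: r[2] / r[3])[0]
    return top3, highest, lowest, best_placed


top3, highest, lowest, best_placed = summarize(RESULTS)
print(f"Top 3 on {top3} of {len(RESULTS)} benchmarks")
print("Highest score:", highest)
print("Lowest scores:", " and ".join(lowest))
print("Best relative rank:", best_placed)
```

Ranking by placement instead of raw score gives a different picture: relative to field size, the best placement is MBPP(Instruct) at #5 of 24, despite its 0.2% score.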

Benchmark results

Benchmark	Metric	Score	Rank
MMLU	acc	57.3%±0.4%	#20 of 28
MGSM	exact_match	52.8%±0.8%	#22 of 28
EQ-Bench	eqbench	48.7±3.0	#25 of 28
IFEval	prompt_level_strict_acc	41.8%±2.1%	#25 of 28
GPQA Diamond	acc	25.8%±3.1%	#26 of 28
GPQA Main	acc	25.4%±2.1%	#24 of 28
GPQA Extended	acc	24.9%±1.9%	#24 of 28
LongBench	aggregate	14.8%±0.3%	#10 of 17
MMLU-Pro	exact_match	7.3%±0.2%	#26 of 28
GSM8K	exact_match	4.3%±0.6%	#26 of 28
BBH	exact_match	2.9%±0.2%	#19 of 28
MBPP	pass@1	0.2%±0.2%	#23 of 28
MBPP(Instruct)	pass@1	0.2%±0.2%	#5 of 24
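
The page does not say what the ± figures represent. They are consistent with the standard error of a mean of per-item 0/1 scores, sqrt(p(1 - p) / n). The sketch below checks that reading; both the interpretation and the item counts (taken from the commonly used public test splits) are assumptions, not figures reported on this page, and `binomial_stderr` and `ITEM_COUNTS` are hypothetical names.

```python
import math


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) item scores with mean p."""
    return math.sqrt(p * (1.0 - p) / n)


# (accuracy from the table, assumed item count of the usual public test split).
# The item counts are assumptions; the page does not report them.
ITEM_COUNTS = {
    "MMLU": (0.573, 14042),
    "IFEval": (0.418, 541),
    "GPQA Diamond": (0.258, 198),
    "GPQA Main": (0.254, 448),
    "GPQA Extended": (0.249, 546),
    "GSM8K": (0.043, 1319),
    "MBPP": (0.002, 500),
}

for name, (p, n) in ITEM_COUNTS.items():
    se = binomial_stderr(p, n)
    print(f"{name}: ±{se * 100:.1f}% (n={n})")
```

Under these assumptions each computed value rounds to the ± figure shown in the table. Rows without a simple per-item accuracy (EQ-Bench, the LongBench aggregate) are left out.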

Last evaluated: 2 weeks ago

How to cite

Citation

FrozeBench. "microsoft/phi-4-mini-reasoning." https://frozebench.com/models/microsoft-phi-4-mini-reasoning. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_microsoft_phi_4_mini_reasoning,
  title = {microsoft/phi-4-mini-reasoning},
  howpublished = {\url{https://frozebench.com/models/microsoft-phi-4-mini-reasoning}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/microsoft-phi-4-mini-reasoning