Qwen/Qwen3-4B

Weights: 8.1 GB · safetensors
Source: Hugging Face page
Commit: master

Evaluated across 13 benchmarks. Ranks in the top 3 on 0 of 13. Highest score on IFEval (79.3% prompt_level_strict_acc, #15 of 28). Lowest scores tied at 0.0% on BBH and MBPP(Instruct).

Benchmark results

Benchmark	Metric	Score	Rank
IFEval	prompt_level_strict_acc	79.3%±1.7%	#15 of 28
EQ-Bench	eqbench	74.1±1.9	#19 of 28
GSM8K	exact_match	71.8%±1.2%	#17 of 28
MMLU	acc	68.4%±0.4%	#15 of 28
MGSM	exact_match	56.0%±0.8%	#20 of 28
MMLU-Pro	exact_match	54.3%±0.4%	#20 of 28
GPQA Diamond	acc	26.8%±3.2%	#22 of 28
GPQA Main	acc	25.2%±2.1%	#25 of 28
GPQA Extended	acc	23.1%±1.8%	#28 of 28
LongBench	aggregate	14.6%±0.2%	#11 of 17
MBPP	pass@1	0.2%±0.2%	#24 of 28
BBH	exact_match	0.0%±0.0%	#28 of 28
MBPP(Instruct)	pass@1	0.0%±0.0%	#6 of 24
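
The summary line near the top of this page can be reproduced from the table rows. The following is a minimal sketch, not FrozeBench's own code: it assumes "strongest" and "weakest" refer to the raw score rather than the rank, and that the unitless EQ-Bench score is compared on the same 0 to 100 scale as the percentages. The names `ROWS` and `summarize` are hypothetical.

```python
# Rows copied from the results table above:
# (benchmark, metric, score, rank, number of ranked models).
ROWS = [
    ("IFEval", "prompt_level_strict_acc", 79.3, 15, 28),
    ("EQ-Bench", "eqbench", 74.1, 19, 28),
    ("GSM8K", "exact_match", 71.8, 17, 28),
    ("MMLU", "acc", 68.4, 15, 28),
    ("MGSM", "exact_match", 56.0, 20, 28),
    ("MMLU-Pro", "exact_match", 54.3, 20, 28),
    ("GPQA Diamond", "acc", 26.8, 22, 28),
    ("GPQA Main", "acc", 25.2, 25, 28),
    ("GPQA Extended", "acc", 23.1, 28, 28),
    ("LongBench", "aggregate", 14.6, 11, 17),
    ("MBPP", "pass@1", 0.2, 24, 28),
    ("BBH", "exact_match", 0.0, 28, 28),
    ("MBPP(Instruct)", "pass@1", 0.0, 6, 24),
]


def summarize(rows):
    """Derive the headline figures, assuming score-based strongest/weakest."""
    top3 = sum(1 for _, _, _, rank, _ in rows if rank <= 3)
    strongest = max(rows, key=lambda r: r[2])
    lowest = min(r[2] for r in rows)
    # Keep every benchmark that shares the lowest score, in table order.
    weakest = [name for name, _, score, _, _ in rows if score == lowest]
    return {
        "benchmarks": len(rows),
        "top3": top3,
        "strongest": strongest[0],
        "weakest": weakest,
    }


summary = summarize(ROWS)
print(summary)
```

Score and rank can point in different directions here: MBPP(Instruct) holds the best rank in the table (#6 of 24) while sharing the lowest score (0.0%), so a rank-based reading of "strongest" would give a different answer.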

Last evaluated: 2 weeks ago

How to cite

Citation

FrozeBench. "Qwen/Qwen3-4B." https://frozebench.com/models/qwen-qwen3-4b. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_Qwen_Qwen3_4B,
  title = {Qwen/Qwen3-4B},
  howpublished = {\url{https://frozebench.com/models/qwen-qwen3-4b}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/qwen-qwen3-4b