Qwen/Qwen3-14B

Weights: 29.6 GB · safetensors
Source: Hugging Face page
Commit: 40c0698

Evaluated across 13 benchmarks; ranks in the top 3 on none of them. Highest score is on GSM8K (87.6% exact_match, #7 of 28), while the best rank is on MBPP (71.4% pass@1, #5 of 28). Lowest score is on MBPP(Instruct) (0.0% pass@1, #23 of 24).

Benchmark results

Benchmark	Metric	Score	Rank
GSM8K	exact_match	87.6% ± 0.9%	#7 of 28
IFEval	prompt_level_strict_acc	80.2% ± 1.7%	#13 of 28
EQ-Bench	eqbench	79.3 ± 1.8	#9 of 28
MMLU	acc	77.2% ± 0.3%	#9 of 28
MBPP	pass@1	71.4% ± 2.0%	#5 of 28
MGSM	exact_match	66.2% ± 0.8%	#13 of 28
MMLU-Pro	exact_match	65.3% ± 0.4%	#11 of 28
BBH	exact_match	41.8% ± 0.5%	#11 of 28
GPQA Diamond	acc	29.3% ± 3.2%	#14 of 28
GPQA Extended	acc	26.2% ± 1.9%	#20 of 28
GPQA Main	acc	26.1% ± 2.1%	#19 of 28
LongBench	aggregate	8.1% ± 0.1%	#16 of 17
MBPP(Instruct)	pass@1	0.0% ± 0.0%	#23 of 24
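
The summary figures for this model can be re-derived from the table above. The following is a minimal sketch, assuming the rows are transcribed by hand into a list of tuples; the names `ROWS` and `summarize` are hypothetical and not part of FrozeBench. It treats every score as a plain number, which is a simplification, since EQ-Bench is not a percentage and the metrics are not directly comparable.

```python
# Rows transcribed by hand from the table above:
# (benchmark, score, rank, number of models ranked).
ROWS = [
    ("GSM8K", 87.6, 7, 28),
    ("IFEval", 80.2, 13, 28),
    ("EQ-Bench", 79.3, 9, 28),
    ("MMLU", 77.2, 9, 28),
    ("MBPP", 71.4, 5, 28),
    ("MGSM", 66.2, 13, 28),
    ("MMLU-Pro", 65.3, 11, 28),
    ("BBH", 41.8, 11, 28),
    ("GPQA Diamond", 29.3, 14, 28),
    ("GPQA Extended", 26.2, 20, 28),
    ("GPQA Main", 26.1, 19, 28),
    ("LongBench", 8.1, 16, 17),
    ("MBPP(Instruct)", 0.0, 23, 24),
]


def summarize(rows):
    """Return headline statistics for a list of benchmark rows."""
    return {
        "benchmarks": len(rows),
        # Count of benchmarks where the model placed 1st, 2nd, or 3rd.
        "top3": sum(1 for _, _, rank, _ in rows if rank <= 3),
        # Highest and lowest raw score, ignoring differences between metrics.
        "highest_score": max(rows, key=lambda r: r[1])[0],
        "lowest_score": min(rows, key=lambda r: r[1])[0],
        # Best placement relative to the number of models ranked.
        "best_rank": min(rows, key=lambda r: r[2] / r[3])[0],
    }


summary = summarize(ROWS)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Picking by raw score and picking by relative rank give different answers here (GSM8K versus MBPP), so it is worth stating which one a "strongest" claim refers to.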

Last evaluated: 1 month ago

How to cite

Citation

FrozeBench. "Qwen/Qwen3-14B." https://frozebench.com/models/qwen-qwen3-14b. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_Qwen_Qwen3_14B,
  title = {Qwen/Qwen3-14B},
  howpublished = {\url{https://frozebench.com/models/qwen-qwen3-14b}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/qwen-qwen3-14b