Qwen/Qwen3-8B

Weights: 16.4 GB · safetensors
Source: Hugging Face page
Commit: b968826

Evaluated across 13 benchmarks; the model does not rank in the top 3 on any of them. Strongest showing: GSM8K (84.9% exact_match, #10 of 28). Weakest: MBPP(Instruct) (0.0% pass@1, #8 of 24).

Benchmark results

Benchmark	Metric	Score	Rank
GSM8K	exact_match	84.9%±1.0%	#10 of 28
IFEval	prompt_level_strict_acc	81.7%±1.7%	#11 of 28
EQ-Bench	eqbench	75.8±1.9	#17 of 28
MMLU	acc	73.0%±0.4%	#12 of 28
MBPP	pass@1	65.6%±2.1%	#8 of 28
MMLU-Pro	exact_match	57.7%±0.4%	#17 of 28
MGSM	exact_match	47.9%±0.8%	#26 of 28
LongBench	aggregate	32.1%±0.4%	#4 of 17
GPQA Diamond	acc	27.8%±3.2%	#20 of 28
GPQA Main	acc	25.4%±2.1%	#23 of 28
GPQA Extended	acc	23.8%±1.8%	#26 of 28
BBH	exact_match	13.2%±0.3%	#17 of 28
MBPP(Instruct)	pass@1	0.0%±0.0%	#8 of 24
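
The summary line above the table can be rechecked from the table itself. The following is a minimal sketch, not FrozeBench's own tooling: the helper names `parse_score`, `parse_rank`, and `summarize` are hypothetical, and it assumes that "strongest" and "weakest" are chosen by comparing the raw score numbers across benchmarks (including EQ-Bench, which is not a percentage). The page does not say what the ± value represents, so the code only carries it along as an unspecified spread.

```python
import re

# Rows copied from the benchmark table: (benchmark, metric, score, rank).
ROWS = [
    ("GSM8K", "exact_match", "84.9%±1.0%", "#10 of 28"),
    ("IFEval", "prompt_level_strict_acc", "81.7%±1.7%", "#11 of 28"),
    ("EQ-Bench", "eqbench", "75.8±1.9", "#17 of 28"),
    ("MMLU", "acc", "73.0%±0.4%", "#12 of 28"),
    ("MBPP", "pass@1", "65.6%±2.1%", "#8 of 28"),
    ("MMLU-Pro", "exact_match", "57.7%±0.4%", "#17 of 28"),
    ("MGSM", "exact_match", "47.9%±0.8%", "#26 of 28"),
    ("LongBench", "aggregate", "32.1%±0.4%", "#4 of 17"),
    ("GPQA Diamond", "acc", "27.8%±3.2%", "#20 of 28"),
    ("GPQA Main", "acc", "25.4%±2.1%", "#23 of 28"),
    ("GPQA Extended", "acc", "23.8%±1.8%", "#26 of 28"),
    ("BBH", "exact_match", "13.2%±0.3%", "#17 of 28"),
    ("MBPP(Instruct)", "pass@1", "0.0%±0.0%", "#8 of 24"),
]


def parse_score(text):
    """Split a cell such as '84.9%±1.0%' into (value, spread) as floats."""
    value, spread = text.replace("%", "").split("±")
    return float(value), float(spread)


def parse_rank(text):
    """Split a cell such as '#10 of 28' into (position, field_size)."""
    match = re.fullmatch(r"#(\d+) of (\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized rank cell: {text!r}")
    return int(match.group(1)), int(match.group(2))


def summarize(rows):
    """Recompute the headline facts: count, top-3 finishes, best and worst."""
    # Assumption: raw score values are compared directly across benchmarks.
    scores = {row[0]: parse_score(row[2])[0] for row in rows}
    top3 = sum(1 for row in rows if parse_rank(row[3])[0] <= 3)
    return {
        "benchmarks": len(rows),
        "top3": top3,
        "strongest": max(scores, key=scores.get),
        "weakest": min(scores, key=scores.get),
    }


if __name__ == "__main__":
    print(summarize(ROWS))
```

Under those assumptions the result agrees with the summary: 13 benchmarks, no top-3 finishes, GSM8K strongest, MBPP(Instruct) weakest.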

Last evaluated: 2 weeks ago

How to cite

Citation

FrozeBench. "Qwen/Qwen3-8B." https://frozebench.com/models/qwen-qwen3-8b. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_Qwen_Qwen3_8B,
  title = {Qwen/Qwen3-8B},
  howpublished = {\url{https://frozebench.com/models/qwen-qwen3-8b}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/models/qwen-qwen3-8b