Evaluated on 13 benchmarks, with no top-3 rankings (0 of 13). The highest score is on GSM8K (80.0% exact_match, #13 of 28). The lowest scores are tied at 0.0% on BBH, MBPP, and MBPP(Instruct).
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| GSM8K | exact_match | 80.0%±1.1% | #13 of 28 |
| EQ-Bench | eqbench | 77.9±1.9 | #13 of 28 |
| IFEval | prompt_level_strict_acc | 76.7%±1.8% | #19 of 28 |
| MGSM | exact_match | 60.8%±0.8% | #17 of 28 |
| MMLU | acc | 57.5%±0.4% | #19 of 28 |
| MMLU-Pro | exact_match | 56.8%±0.4% | #18 of 28 |
| GPQA Extended | acc | 26.7%±1.9% | #18 of 28 |
| GPQA Main | acc | 25.7%±2.1% | #22 of 28 |
| GPQA Diamond | acc | 24.2%±3.1% | #28 of 28 |
| LongBench | aggregate | 8.3%±0.1% | #15 of 17 |
| BBH | exact_match | 0.0%±0.0% | #27 of 28 |
| MBPP | pass@1 | 0.0%±0.0% | #26 of 28 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #12 of 24 |
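The summary figures (benchmark count, top-3 count, highest and lowest scores) can be re-derived from the table rows. The following is a minimal sketch, not FrozeBench's own code: the `ROWS` list is transcribed by hand from the table, and `summarize` is a hypothetical helper name. It assumes scores are compared by raw number, even though EQ-Bench is not reported as a percentage.

```python
# Rows transcribed by hand from the results table:
# (benchmark, score, rank, number_of_ranked_models).
# Assumption: scores are compared as raw numbers; EQ-Bench uses its own scale.
ROWS = [
    ("GSM8K", 80.0, 13, 28),
    ("EQ-Bench", 77.9, 13, 28),
    ("IFEval", 76.7, 19, 28),
    ("MGSM", 60.8, 17, 28),
    ("MMLU", 57.5, 19, 28),
    ("MMLU-Pro", 56.8, 18, 28),
    ("GPQA Extended", 26.7, 18, 28),
    ("GPQA Main", 25.7, 22, 28),
    ("GPQA Diamond", 24.2, 28, 28),
    ("LongBench", 8.3, 15, 17),
    ("BBH", 0.0, 27, 28),
    ("MBPP", 0.0, 26, 28),
    ("MBPP(Instruct)", 0.0, 12, 24),
]


def summarize(rows, top_k=3):
    """Hypothetical helper: recompute the headline summary from table rows."""
    # Count benchmarks where the model's rank is within the top k.
    top_k_count = sum(1 for _, _, rank, _ in rows if rank <= top_k)
    # The strongest benchmark is the one with the highest raw score.
    best_name, best_score, best_rank, best_field = max(rows, key=lambda r: r[1])
    # Collect every benchmark tied at the lowest score, in table order.
    lowest = min(score for _, score, _, _ in rows)
    weakest = [name for name, score, _, _ in rows if score == lowest]
    return {
        "benchmarks": len(rows),
        "top_k_count": top_k_count,
        "strongest": (best_name, best_score, best_rank, best_field),
        "lowest_score": lowest,
        "weakest": weakest,
    }


summary = summarize(ROWS)
print(summary)
```

Ranking by raw score is one possible reading of "strongest". Ranking by leaderboard position would instead pick MBPP(Instruct) (#12 of 24), despite its 0.0% score.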
Citation
FrozeBench. "zai-org/GLM-4.5-Air-FP8." https://frozebench.com/models/zai-org-glm-4-5-air-fp8. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_zai_org_GLM_4_5_Air_FP8,
title = {zai-org/GLM-4.5-Air-FP8},
howpublished = {\url{https://frozebench.com/models/zai-org-glm-4-5-air-fp8}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/zai-org-glm-4-5-air-fp8