Evaluated across 13 benchmarks; ranks in the top 3 on 9 of 13. Strongest showing on GSM8K (97.0% exact_match, #1 of 28). Lowest score on MBPP(Instruct) (0.0% pass@1, #13 of 24).
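The top-3 tally in the summary can be checked mechanically against the per-benchmark ranks in the table. The sketch below transcribes the table by hand (the `RESULTS` list and `top_k_count` helper are illustrative, not part of any FrozeBench API). It also suggests that the summary picks its strongest and weakest benchmarks by raw score. By relative rank, MBPP (#19 of 28) sits lower than MBPP(Instruct) (#13 of 24).

```python
# Hand-transcribed from the results table: (benchmark, rank, field size, score).
# Scores are percentages, except EQ-Bench, which uses its own 0-100 scale.
RESULTS = [
    ("GSM8K", 1, 28, 97.0),
    ("IFEval", 2, 28, 90.8),
    ("EQ-Bench", 1, 28, 86.2),
    ("MMLU-Pro", 2, 28, 84.3),
    ("MMLU", 4, 28, 82.1),
    ("BBH", 1, 28, 79.7),
    ("MGSM", 4, 28, 76.6),
    ("GPQA Diamond", 1, 28, 52.5),
    ("GPQA Main", 1, 28, 50.7),
    ("GPQA Extended", 2, 28, 49.3),
    ("LongBench", 2, 10, 42.5),
    ("MBPP", 19, 28, 7.2),
    ("MBPP(Instruct)", 13, 24, 0.0),
]


def top_k_count(results, k=3):
    """Count benchmarks on which the model ranks k-th or better."""
    return sum(1 for _, rank, _, _ in results if rank <= k)


top3 = top_k_count(RESULTS)
# Extremes by raw score (the third field of each tuple is the score).
best = max(RESULTS, key=lambda r: r[3])
worst = min(RESULTS, key=lambda r: r[3])

print(f"Top 3 on {top3} of {len(RESULTS)}")
print(f"Highest score: {best[0]}; lowest score: {worst[0]}")
```

The count breaks down as five #1 finishes and four #2 finishes. The two #4 finishes (MMLU, MGSM) and both MBPP variants fall outside the top 3.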
| Benchmark | Metric | Score | Rank |
|---|---|---|---|
| GSM8K | exact_match | 97.0%±0.5% | #1 of 28 |
| IFEval | prompt_level_strict_acc | 90.8%±1.2% | #2 of 28 |
| EQ-Bench | eqbench | 86.2±1.2 | #1 of 28 |
| MMLU-Pro | exact_match | 84.3%±0.3% | #2 of 28 |
| MMLU | acc | 82.1%±0.3% | #4 of 28 |
| BBH | exact_match | 79.7%±0.4% | #1 of 28 |
| MGSM | exact_match | 76.6%±0.7% | #4 of 28 |
| GPQA Diamond | acc | 52.5%±3.6% | #1 of 28 |
| GPQA Main | acc | 50.7%±2.4% | #1 of 28 |
| GPQA Extended | acc | 49.3%±2.1% | #2 of 28 |
| LongBench | score | 42.5%±0.5% | #2 of 10 |
| MBPP | pass@1 | 7.2%±1.2% | #19 of 28 |
| MBPP(Instruct) | pass@1 | 0.0%±0.0% | #13 of 24 |
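The page does not state what the ± values denote. One plausible reading is the standard error of the mean over per-question 0/1 scores, computed with the sample (n − 1) variance. This is a common convention in evaluation harnesses, but it is an assumption here. Under that reading, SE = sqrt(p(1 − p)/(n − 1)). The sketch below also assumes each benchmark was run on its full public test set: 1,319 GSM8K test problems and 198/448/546 GPQA Diamond/Main/Extended questions. The `binomial_stderr` helper is hypothetical.

```python
import math


def binomial_stderr(p, n):
    """Standard error of the mean of n pass/fail scores with accuracy p,
    using the sample (n - 1) variance: sqrt(p * (1 - p) / (n - 1))."""
    return math.sqrt(p * (1 - p) / (n - 1))


# (accuracy from the table, assumed question count, reported ± from the table)
CHECKS = {
    "GSM8K": (0.970, 1319, 0.005),
    "GPQA Diamond": (0.525, 198, 0.036),
    "GPQA Main": (0.507, 448, 0.024),
    "GPQA Extended": (0.493, 546, 0.021),
}

for name, (p, n, reported) in CHECKS.items():
    se = binomial_stderr(p, n)
    print(f"{name}: computed ±{se:.1%}, reported ±{reported:.1%}")
```

With the table's rounded accuracies, each computed value rounds to the reported ± for these four rows, which is consistent with (though not proof of) this reading. If the reading is right, each ± is one standard error. An approximate 95% interval would then be about 1.96 times as wide.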
Citation
FrozeBench. "google/Gemma-4-31B-IT-NVFP4." https://frozebench.com/models/google-gemma-4-31b-it-nvfp4. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_google_Gemma_4_31B_IT_NVFP4,
title = {google/Gemma-4-31B-IT-NVFP4},
howpublished = {\url{https://frozebench.com/models/google-gemma-4-31b-it-nvfp4}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/models/google-gemma-4-31b-it-nvfp4