FrozeBench

Leaderboard

Open-weight models scored across every benchmark we've run end-to-end.

Models scored across 13 benchmarks. Empty cells are omitted, so some rows list only 12 scores.
Model | Scores
google/gemma-3-12b-it | 62.3% ± 0.4% | 72.7 ± 1.9 | 35.9% ± 3.4% | 33.3% ± 2.0% | 35.7% ± 2.3% | 86.3% ± 0.9% | 81.0% ± 1.7% | 46.2% ± 0.5% | 65.4% ± 2.1% | 70.8% ± 2.0% | 68.4% ± 0.8% | 70.7% ± 0.4% | 59.7% ± 0.4%
google/gemma-3-27b-it | 62.5% ± 0.4% | 80.7 ± 1.4 | 39.4% ± 3.5% | 36.6% ± 2.1% | 38.4% ± 2.3% | 88.5% ± 0.9% | 82.3% ± 1.6% | 37.0% ± 0.5% | 71.4% ± 2.0% | 76.0% ± 1.9% | 60.8% ± 0.8% | 74.1% ± 0.4% | 66.7% ± 0.4%
google/gemma-4-26B-A4B-it | 48.1% ± 0.4% | 84.2 ± 1.2 | 41.4% ± 3.5% | 44.3% ± 2.1% | 41.7% ± 2.3% | 95.6% ± 0.6% | 89.5% ± 1.3% | 40.2% ± 0.5% | 24.4% ± 1.9% | 0.0% ± 0.0% | 74.5% ± 0.7% | 47.9% ± 0.4% | 82.3% ± 0.3%
google/gemma-4-31B-it | 78.9% ± 0.3% | 85.8 ± 1.2 | 52.0% ± 3.6% | 51.6% ± 2.1% | 50.4% ± 2.4% | 96.6% ± 0.5% | 91.3% ± 1.2% | 37.0% ± 0.4% | 0.0% ± 0.0% | 0.0% ± 0.0% | 76.6% ± 0.7% | 82.7% ± 0.3% | 84.9% ± 0.3%
google/Gemma-4-31B-IT-NVFP4 | 79.7% ± 0.4% | 86.2 ± 1.2 | 52.5% ± 3.6% | 49.3% ± 2.1% | 50.7% ± 2.4% | 97.0% ± 0.5% | 90.8% ± 1.2% | 42.5% ± 0.5% | 7.2% ± 1.2% | 0.0% ± 0.0% | 76.6% ± 0.7% | 82.1% ± 0.3% | 84.3% ± 0.3%
microsoft/phi-4 | 0.4% ± 0.1% | 77.6 ± 1.6 | 35.9% ± 3.4% | 34.2% ± 2.0% | 33.5% ± 2.2% | 9.7% ± 0.8% | 62.7% ± 2.1% | 31.7% ± 0.4% | 72.2% ± 2.0% | 23.8% ± 0.7% | 76.3% ± 0.3% | 71.8% ± 0.4%
microsoft/phi-4-mini-instruct | 40.0% ± 0.5% | 67.5 ± 2.5 | 33.8% ± 3.4% | 31.3% ± 2.0% | 33.7% ± 2.2% | 37.1% ± 1.3% | 69.3% ± 2.0% | 40.9% ± 0.5% | 54.8% ± 2.2% | 0.0% ± 0.0% | 51.9% ± 0.8% | 66.3% ± 0.4% | 51.2% ± 0.4%
microsoft/phi-4-mini-reasoning | 2.9% ± 0.2% | 48.7 ± 3.0 | 25.8% ± 3.1% | 24.9% ± 1.9% | 25.4% ± 2.1% | 4.3% ± 0.6% | 41.8% ± 2.1% | 14.8% ± 0.3% | 0.2% ± 0.2% | 0.2% ± 0.2% | 52.8% ± 0.8% | 57.3% ± 0.4% | 7.3% ± 0.2%
microsoft/phi-4-reasoning-plus | 3.3% ± 0.2% | 3.6 ± 1.4 | 27.8% ± 3.2% | 26.9% ± 1.9% | 26.1% ± 2.1% | 9.2% ± 0.8% | 23.3% ± 1.8% | 10.2% ± 0.2% | 0.4% ± 0.3% | 0.0% ± 0.0% | 23.5% ± 0.6% | 77.6% ± 0.3% | 29.4% ± 0.4%
MiniMax/MiniMax-M2-AWQ | 0.1% ± 0.0% | 0.5 ± 0.5 | 25.8% ± 3.1% | 24.9% ± 1.9% | 21.4% ± 1.9% | 28.8% ± 1.2% | 40.3% ± 2.1% | 19.0% ± 0.3% | 1.2% ± 0.5% | 63.3% ± 0.8% | 81.6% ± 0.3% | 59.3% ± 0.4%
MiniMax/MiniMax-M2.1-AWQ | 0.1% ± 0.0% | 52.5 ± 3.1 | 28.8% ± 3.2% | 23.4% ± 1.8% | 24.3% ± 2.0% | 70.4% ± 1.3% | 35.3% ± 2.1% | 60.4% ± 2.2% | 0.0% ± 0.0% | 48.2% ± 0.8% | 25.5% ± 0.4% | 75.6% ± 0.4%
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 | 18.1% ± 0.4% | 78.3 ± 1.8 | 29.8% ± 3.3% | 26.6% ± 1.9% | 29.5% ± 2.2% | 86.1% ± 1.0% | 88.4% ± 1.4% | 35.2% ± 0.4% | 94.0% ± 1.1% | 0.0% ± 0.0% | 79.4% ± 0.7% | 36.2% ± 0.4% | 80.5% ± 0.3%
openai/gpt-oss-120b | 0.0% ± 0.0% | 78.6 ± 1.6 | 28.8% ± 3.2% | 28.8% ± 1.9% | 26.1% ± 2.1% | 74.5% ± 1.2% | 78.2% ± 1.8% | 14.3% ± 0.3% | 9.8% ± 1.3% | 2.4% ± 0.7% | 62.2% ± 0.8% | 24.7% ± 0.4% | 62.2% ± 0.4%
openai/gpt-oss-20b | 1.5% ± 0.1% | 73.9 ± 1.9 | 26.8% ± 3.2% | 27.7% ± 1.9% | 26.8% ± 2.1% | 39.3% ± 1.3% | 60.4% ± 2.1% | 13.5% ± 0.3% | 0.0% ± 0.0% | 52.8% ± 2.2% | 58.3% ± 0.8% | 56.4% ± 0.4% | 32.6% ± 0.4%
Qwen/Qwen3-14B | 41.8% ± 0.5% | 79.3 ± 1.8 | 29.3% ± 3.2% | 26.2% ± 1.9% | 26.1% ± 2.1% | 87.6% ± 0.9% | 80.2% ± 1.7% | 8.1% ± 0.1% | 71.4% ± 2.0% | 0.0% ± 0.0% | 66.2% ± 0.8% | 77.2% ± 0.3% | 65.3% ± 0.4%
Qwen/Qwen3-235B-A22B-Thinking-AWQ-2507 | 27.8% ± 0.4% | 76.3 ± 2.2 | 25.3% ± 3.1% | 27.3% ± 1.9% | 29.9% ± 2.2% | 80.7% ± 1.1% | 77.6% ± 1.8% | 17.7% ± 0.3% | 65.8% ± 2.1% | 0.0% ± 0.0% | 49.3% ± 0.8% | 84.7% ± 0.3% | 44.7% ± 0.4%
Qwen/Qwen3-32B | 47.0% ± 0.4% | 80.3 ± 1.9 | 29.3% ± 3.2% | 28.6% ± 1.9% | 28.8% ± 2.1% | 94.0% ± 0.7% | 79.5% ± 1.7% | 7.6% ± 0.1% | 47.4% ± 2.2% | 61.1% ± 0.8% | 80.8% ± 0.3% | 70.0% ± 0.4%
Qwen/Qwen3-4B | 0.0% ± 0.0% | 74.1 ± 1.9 | 26.8% ± 3.2% | 23.1% ± 1.8% | 25.2% ± 2.1% | 71.8% ± 1.2% | 79.3% ± 1.7% | 14.6% ± 0.2% | 0.2% ± 0.2% | 0.0% ± 0.0% | 56.0% ± 0.8% | 68.4% ± 0.4% | 54.3% ± 0.4%
Qwen/Qwen3-4B-AWQ | 72.7% ± 0.5% | 65.0 ± 2.7 | 32.3% ± 3.3% | 31.7% ± 2.0% | 33.0% ± 2.2% | 62.4% ± 1.3% | 76.5% ± 1.8% | 31.9% ± 0.4% | 58.0% ± 2.2% | 0.0% ± 0.0% | 55.5% ± 0.8% | 67.2% ± 0.4% | 55.7% ± 0.4%
Qwen/Qwen3-8B | 13.2% ± 0.3% | 75.8 ± 1.9 | 27.8% ± 3.2% | 23.8% ± 1.8% | 25.4% ± 2.1% | 84.9% ± 1.0% | 81.7% ± 1.7% | 32.1% ± 0.4% | 65.6% ± 2.1% | 0.0% ± 0.0% | 47.9% ± 0.8% | 73.0% ± 0.4% | 57.7% ± 0.4%
Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8 | 75.8% ± 0.4% | 75.6 ± 1.8 | 30.8% ± 3.3% | 28.9% ± 1.9% | 28.3% ± 2.1% | 94.2% ± 0.6% | 82.1% ± 1.7% | 32.4% ± 0.4% | 78.6% ± 1.8% | 0.0% ± 0.0% | 67.4% ± 0.8% | 68.9% ± 0.4% | 70.3% ± 0.4%
Qwen/Qwen3-Next-80B-A3B-Instruct | 2.3% ± 0.2% | 76.8 ± 1.7 | 38.9% ± 3.5% | 40.8% ± 2.1% | 42.4% ± 2.3% | 73.1% ± 1.2% | 87.1% ± 1.4% | 49.2% ± 0.5% | 79.6% ± 1.8% | 0.0% ± 0.0% | 83.2% ± 0.7% | 64.0% ± 0.4% | 80.8% ± 0.3%
Qwen/Qwen3.5-122B-A10B-NVFP4 | 49.7% ± 0.5% | 80.8 ± 1.8 | 28.3% ± 3.2% | 25.8% ± 1.9% | 26.1% ± 2.1% | 81.5% ± 1.1% | 85.4% ± 1.5% | 35.5% ± 0.4% | 48.6% ± 2.2% | 0.0% ± 0.0% | 74.9% ± 0.7% | 27.1% ± 0.4% | 60.4% ± 0.4%
Qwen/Qwen3.5-35B-A3B | 49.6% ± 0.5% | 79.0 ± 1.9 | 29.8% ± 3.3% | 27.7% ± 1.9% | 28.3% ± 2.1% | 79.5% ± 1.1% | 77.6% ± 1.8% | 31.9% ± 0.4% | 55.2% ± 2.2% | 0.0% ± 0.0% | 74.4% ± 0.7% | 37.9% ± 0.4% | 63.6% ± 0.4%
Qwen/Qwen3.6-27B | 31.2% ± 0.5% | 82.0 ± 1.6 | 28.8% ± 3.2% | 25.5% ± 1.9% | 28.8% ± 2.1% | 69.1% ± 1.3% | 88.9% ± 1.4% | 36.6% ± 0.4% | 26.8% ± 2.0% | 0.0% ± 0.0% | 69.4% ± 0.7% | 84.5% ± 0.3% | 26.2% ± 0.3%
Qwen/Qwen3.6-35B-A3B | 24.1% ± 0.4% | 84.6 ± 1.3 | 26.3% ± 3.1% | 27.8% ± 1.9% | 29.9% ± 2.2% | 1.1% ± 0.3% | 89.6% ± 1.3% | 31.5% ± 0.4% | 3.4% ± 0.8% | 0.0% ± 0.0% | 67.1% ± 0.8% | 37.9% ± 0.4% | 7.2% ± 0.2%
zai-org/GLM-4.5-Air-FP8 | 0.0% ± 0.0% | 77.9 ± 1.9 | 24.2% ± 3.1% | 26.7% ± 1.9% | 25.7% ± 2.1% | 80.0% ± 1.1% | 76.7% ± 1.8% | 8.3% ± 0.1% | 0.0% ± 0.0% | 0.0% ± 0.0% | 60.8% ± 0.8% | 57.5% ± 0.4% | 56.8% ± 0.4%
zai-org/GLM-4.5V-FP8 | 2.7% ± 0.2% | 1.1 ± 0.8 | 28.3% ± 3.2% | 23.8% ± 1.8% | 24.3% ± 2.0% | 3.5% ± 0.5% | 69.9% ± 2.0% | 27.3% ± 0.4% | 0.0% ± 0.0% | 73.0% ± 0.7% | 81.0% ± 0.3% | 6.0% ± 0.2%
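The leaderboard reports each score as a value followed by "±" and an uncertainty, but it does not say how that uncertainty is computed. As a rough reading aid, here is a minimal Python sketch, assuming each ± is one binomial standard error, se = sqrt(p(1 - p)/n), over independent pass/fail samples. It parses a single score cell, backs out the sample count that assumption would imply, and expresses the gap between two scores in units of their combined standard error. The helper names (`parse_score`, `implied_sample_count`, `z_difference`) are hypothetical, and the standard-error interpretation is an assumption rather than something the leaderboard documents.

```python
import math
import re

# One leaderboard cell, e.g. "62.3%±0.4%" or "72.7±1.9" (the % sign is optional).
# Assumption: the number before "±" is the score and the one after is its
# reported uncertainty, in the same units.
SCORE_RE = re.compile(r"(\d+(?:\.\d+)?)%?\s*±\s*(\d+(?:\.\d+)?)%?")


def parse_score(cell: str) -> tuple[float, float]:
    """Parse a single cell into (score, uncertainty)."""
    match = SCORE_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"not a score cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def implied_sample_count(pct: float, se_pct: float) -> float:
    """Rough sample count for a percentage score.

    ASSUMES se = sqrt(p * (1 - p) / n), i.e. the ± is one binomial
    standard error over n independent pass/fail samples. Returns
    math.inf when the reported uncertainty is zero, because a rounded
    zero pins down no finite n.
    """
    p = pct / 100.0
    se = se_pct / 100.0
    if se == 0.0:
        return math.inf
    return p * (1.0 - p) / se**2


def z_difference(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Gap between two scores in units of their combined uncertainty.

    ASSUMES both ± values are independent standard errors, so the
    standard error of the difference is sqrt(se_a**2 + se_b**2).
    """
    (mean_a, se_a), (mean_b, se_b) = a, b
    combined = math.hypot(se_a, se_b)
    if combined == 0.0:
        raise ValueError("both uncertainties are zero; z is undefined")
    return (mean_a - mean_b) / combined


if __name__ == "__main__":
    newer = parse_score("78.9%±0.3%")  # google/gemma-4-31B-it, first column
    older = parse_score("62.5%±0.4%")  # google/gemma-3-27b-it, first column
    print(f"z = {z_difference(newer, older):.1f}")
    print(f"n ≈ {implied_sample_count(62.3, 0.4):.0f}")
```

For the two first-column scores above, the gap is (78.9 - 62.5) / sqrt(0.3² + 0.4²) = 16.4 / 0.5 = 32.8 combined standard errors, and 62.3% ± 0.4% would correspond to roughly 14,700 samples under the binomial assumption. Because the ± values are rounded to one decimal place, treat these figures as order-of-magnitude estimates only. `implied_sample_count` applies only to percentage columns; the second column is reported without a % sign and may not be a proportion.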