About FrozeBench

FrozeBench is an independent, evidence-first project for reproducible open-weight LLM evaluation.

Mission

Developed in Canada, FrozeBench exists to make benchmark results inspectable and reproducible. Every published score should trace back to the run, configuration, and sample artifacts that produced it.

Independence

FrozeBench publishes evaluation runs that it produces itself, rather than copying numbers from model cards, vendor pages, or other leaderboards. The project is built to keep evidence separate from marketing claims.

Reproducibility

Runs are recorded with explicit model, task, and generation settings, including frozen defaults such as temperature-zero generation where applicable. The methodology page explains the run pipeline and what each run captures.
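To make this concrete, here is a minimal sketch in Python of how a run record could pin its model, task, and generation settings so that a published score can point back to one exact configuration. The class names, field names, default values, and the `fingerprint` helper are hypothetical illustrations, not FrozeBench's actual schema; the methodology page remains the authoritative description.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class GenerationSettings:
    # Hypothetical frozen defaults; temperature 0.0 corresponds to greedy decoding.
    temperature: float = 0.0
    max_new_tokens: int = 512
    seed: int = 0


@dataclass(frozen=True)
class RunRecord:
    model: str  # hypothetical model identifier
    task: str   # hypothetical benchmark task name
    generation: GenerationSettings = field(default_factory=GenerationSettings)

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical settings always produce the same hash.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


run = RunRecord(model="example-org/example-7b", task="example-task")
print(run.fingerprint())
```

Because the records are frozen and the hash covers every recorded setting, changing any value (for example, raising the temperature) yields a different fingerprint, so a score tied to one fingerprint cannot silently drift to another configuration.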

Scope

FrozeBench focuses on open-weight language models and public benchmark results. The site is read-only, built so that researchers, engineers, and technical writers can browse, compare, cite, and inspect the evidence behind published scores.

Contact

For questions, corrections, or other inquiries, contact FrozeBench through GitHub.