About FrozeBench
FrozeBench is an independent, evidence-first project for reproducible open-weight LLM evaluation.
Mission
Developed in Canada, FrozeBench exists to make benchmark results inspectable and reproducible. Every score should trace back to the run, configuration, and sample artifacts that produced it.
Independence
FrozeBench publishes only evaluation runs it produces itself, rather than copying numbers from model cards, vendor pages, or other leaderboards. The project is built to separate evidence from marketing claims.
Reproducibility
Runs are recorded with explicit model, task, and generation settings, including frozen defaults such as temperature-zero generation where applicable. The methodology page explains the run pipeline and what each run captures.
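As an illustration only, a run record with explicit model, task, and generation settings might be sketched as below. This is not FrozeBench's actual schema: the class names, field names, and every default other than temperature zero are assumptions made for the example.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical generation settings; not FrozeBench's real schema."""
    temperature: float = 0.0    # temperature-zero generation as the frozen default
    max_new_tokens: int = 512   # assumed value, for illustration only
    seed: int = 0               # assumed value, for illustration only


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record tying a score back to the settings that produced it."""
    model: str
    task: str
    generation: GenerationSettings = field(default_factory=GenerationSettings)

    def fingerprint(self) -> str:
        # Serialize with sorted keys so identical settings always hash identically.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder identifiers, not real models or tasks.
first = RunRecord(model="example-org/example-model", task="example-task")
second = RunRecord(model="example-org/example-model", task="example-task")
print(first.fingerprint() == second.fingerprint())
```

Making both classes frozen means a record cannot be altered after it is created, and the fingerprint gives a compact way to check that two runs used the same recorded settings.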
Scope
FrozeBench focuses on open-weight language models and public benchmark results. The site is read-only, so researchers, engineers, and technical writers can browse, compare, cite, and inspect the evidence behind published scores.
Contact
For questions, corrections, or other project inquiries, reach FrozeBench through GitHub.