FrozeBench

MGSM

mgsm_direct
Math, Reasoning, Multilingual

MGSM (Multilingual Grade School Math) is a 250-problem subset of GSM8K hand-translated into ten typologically diverse languages (Spanish, French, German, Russian, Chinese, Japanese, Thai, Swahili, Bengali, and Telugu), yielding 2,500 problem-language pairs. It was built to probe whether chain-of-thought (CoT) reasoning transfers across languages, including low-resource ones. The original paper showed that writing the CoT in English for a translated problem often outperformed writing it in the problem's own language. The "direct" variant evaluated here (mgsm_direct) uses no chain-of-thought scaffolding: the model is prompted to give the final answer immediately. This makes it substantially harder than the CoT setting, so its scores are best read as a lower bound on a model's multilingual arithmetic capability.
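To make the direct setting concrete, here is a minimal Python sketch of direct-style prompting and exact-match scoring. The prompt templates and the `direct_prompt`, `cot_prompt`, `extract_answer`, and `is_correct` helpers are hypothetical illustrations, not FrozeBench's or any evaluation harness's actual implementation. The extractor takes the last number in the completion, which is one common heuristic among several.

```python
import re
from typing import Optional

# Hypothetical templates; the exact wording used by a given harness may differ.
def direct_prompt(question: str) -> str:
    # Direct variant: no reasoning scaffold, the model is asked for the answer only.
    return f"Question: {question}\nAnswer:"

def cot_prompt(question: str) -> str:
    # CoT variant, shown only for contrast: the model is invited to reason first.
    return f"Question: {question}\nStep-by-step answer:"

# An optional minus sign, a digit, more digits or commas, and an optional decimal part.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_answer(completion: str) -> Optional[str]:
    """Return the last number in the completion with commas removed, or None."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")

def is_correct(completion: str, gold: int) -> bool:
    """Exact match between the extracted number and the gold integer answer."""
    pred = extract_answer(completion)
    if pred is None:
        return False  # nothing numeric to compare
    try:
        return float(pred) == float(gold)
    except ValueError:
        return False  # defensive: unparseable extraction counts as wrong
```

With this heuristic, `is_correct("16 - 3 - 4 = 9 eggs, so $18", 18)` returns `True`, whereas a stricter extractor that accepts only a bare number would reject the same completion. That gap is one reason direct-answer scores depend heavily on the extraction rule.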

Source paper
Latest run: 2026-05-26

Benchmark results


28 models

Caveats

Each language split contains only 250 problems, so per-language estimates have high variance, and small differences between models on a single language should not be over-interpreted. Translation quality also varies, particularly for low-resource languages such as Swahili, Bengali, and Telugu, where translators may have had limited domain vocabulary. Observed cross-language gaps can therefore reflect translation quality as well as model capability, and the two cannot be cleanly separated. Because MGSM inherits its problems from GSM8K, it also inherits GSM8K's saturation and contamination concerns: top models exceed 90% on the high-resource splits (Spanish, French, Chinese), and the underlying problem set is publicly available. The mgsm_direct (no-CoT) variant in particular is harder and more sensitive to answer-extraction filtering, so its scores are not comparable to CoT-based mgsm numbers, and the two should never be mixed in a single comparison. Finally, some published evaluations elicit English chain-of-thought on non-English inputs, which conflates language-switching ability with reasoning and muddies the interpretation of multilingual capability.
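The sample-size caveat can be quantified with a back-of-the-envelope calculation. The Python sketch below assumes each of the 250 problems in a language split is an independent Bernoulli trial and uses the normal (Wald) approximation; the helper names `accuracy_ci` and `two_proportion_z` are illustrative. At 80% accuracy, the 95% interval half-width is about 5 percentage points, and a 4-point gap between two models (200 vs. 190 correct) gives a z statistic of roughly 1.08, well short of the conventional 1.96 threshold.

```python
import math
from typing import Tuple

def accuracy_ci(correct: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (accuracy, lower, upper) using the normal (Wald) approximation."""
    p = correct / n
    se = math.sqrt(p * (1 - p) / n)  # binomial standard error of the accuracy
    # Clip the interval to [0, 1], since accuracy cannot leave that range.
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

def two_proportion_z(correct_a: int, correct_b: int, n: int) -> float:
    """Unpaired two-proportion z statistic with a pooled standard error.

    Simplification: treats the two models' results as independent samples,
    even though on MGSM both models answer the same n problems.
    """
    p_a, p_b = correct_a / n, correct_b / n
    pooled = (correct_a + correct_b) / (2 * n)
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    return (p_a - p_b) / se if se > 0 else 0.0

if __name__ == "__main__":
    n = 250  # problems per language split
    acc, lo, hi = accuracy_ci(200, n)
    print(f"accuracy {acc:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
    print(f"z for 200 vs 190 correct: {two_proportion_z(200, 190, n):.2f}")
```

The unpaired test ignores that both models answer the same items. When per-item results are available, a paired test such as McNemar's is more appropriate and can be more sensitive when the two models' errors are correlated.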

How to cite

Citation

FrozeBench. "MGSM." https://frozebench.com/benchmarks/mgsm-direct. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_mgsm_direct,
  title = {MGSM},
  howpublished = {\url{https://frozebench.com/benchmarks/mgsm-direct}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/benchmarks/mgsm-direct