FrozeBench

MBPP

Coding

MBPP (Mostly Basic Python Problems) is a dataset of 974 crowd-sourced entry-level Python programming tasks, each pairing a short natural-language description with a reference solution and three unit tests that automatically score model output. Problems target programmers with up to a year of Python experience and cover basic data structures, string manipulation, and simple algorithms. MBPP and its sanitized 427-problem subset became one of the standard probes for code-generation capability in pre-trained language models, and remain widely reported despite the field having moved toward more discriminative coding benchmarks.
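The scoring loop described above can be sketched as follows. This is a minimal illustration, not the official harness: the task shown is hypothetical (written in the MBPP style rather than taken from the dataset), the field names `text` and `test_list` are assumed to follow the public release, and `passes_all_tests` is a helper invented for this example. Real evaluation harnesses run candidates in a sandbox with timeouts, which this sketch omits.

```python
def passes_all_tests(candidate_source: str, tests: list[str]) -> bool:
    """Return True if the candidate code runs and every assert statement holds."""
    namespace: dict = {}
    try:
        # Define the candidate function in a fresh namespace.
        exec(candidate_source, namespace)
        # Each test is a standalone assert statement, run in the same namespace.
        for test in tests:
            exec(test, namespace)
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as failure.
        return False
    return True


# Hypothetical MBPP-style task (assumed field names, illustrative content).
task = {
    "text": "Write a function to return the sum of the squares of a list of numbers.",
    "test_list": [
        "assert sum_of_squares([1, 2, 3]) == 14",
        "assert sum_of_squares([]) == 0",
        "assert sum_of_squares([-2]) == 4",
    ],
}

# Two stand-ins for model output: one correct, one that squares the sum instead.
candidate = "def sum_of_squares(nums):\n    return sum(n * n for n in nums)\n"
wrong = "def sum_of_squares(nums):\n    return sum(nums) ** 2\n"

print(passes_all_tests(candidate, task["test_list"]))  # True
print(passes_all_tests(wrong, task["test_list"]))      # False
```

Because the tests call the function by name, a model must also match the expected function name for its output to be scored as passing. Running untrusted generated code with `exec` is unsafe outside an isolated environment.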

Latest run: 2026-05-25

Benchmark results

28 models

Caveats

The most consequential limitation is unit-test coverage. With only three tests per problem, false positives (incorrect solutions that pass all three tests) are common, and false negatives also occur when the tests encode one particular reading of a task and reject a reasonable alternative. The EvalPlus project addresses the coverage gap with MBPP+, which augments MBPP problems with automatically generated additional tests; the score drops observed on MBPP+ relative to vanilla MBPP indicate that headline pass@1 numbers overstate true correctness rates by several points. MBPP is also saturated for frontier models, whose pass@1 scores routinely reach 80-90% or higher, leaving little discriminative headroom. The dataset has been public since 2021 and its problems appear widely in modern training corpora, so high scores partially reflect memorization. Coverage is restricted to Python, so MBPP says nothing about multi-language coding ability or about more realistic software-engineering scenarios involving multiple files, dependencies, or iterative debugging. Crowd-sourced problem statements are also of uneven quality, with some descriptions ambiguous enough that "correct" behavior is open to interpretation.
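The coverage problem can be illustrated with a toy example. The sketch below is not drawn from MBPP or MBPP+: the task, the tests, and the four sampled solutions are all hypothetical, and the helpers `passes` and `pass_at_1` are invented for illustration. With one attempt counted per sample, pass@1 is simply the fraction of samples that pass every test. The gap shown here is deliberately exaggerated to make the effect visible.

```python
def passes(candidate_source, tests):
    """Return True if the candidate defines cleanly and every assert holds."""
    namespace = {}
    try:
        exec(candidate_source, namespace)
        for test in tests:
            exec(test, namespace)
    except Exception:
        return False
    return True


def pass_at_1(samples, tests):
    """Fraction of sampled solutions that pass all tests."""
    return sum(passes(s, tests) for s in samples) / len(samples)


# Hypothetical task: "Write a function to check whether a number is prime."
base_tests = [
    "assert is_prime(7) == True",
    "assert is_prime(10) == False",
    "assert is_prime(13) == True",
]
# Hypothetical extra tests covering edge cases the base tests miss.
extra_tests = [
    "assert is_prime(1) == False",
    "assert is_prime(2) == True",
    "assert is_prime(9) == False",
]

samples = [
    # Correct.
    "def is_prime(n):\n"
    "    if n < 2:\n"
    "        return False\n"
    "    return all(n % d for d in range(2, int(n ** 0.5) + 1))\n",
    # Wrong: treats 0 and 1 as prime.
    "def is_prime(n):\n"
    "    return all(n % d for d in range(2, int(n ** 0.5) + 1))\n",
    # Wrong: only checks parity, so 1 and 9 are reported as prime.
    "def is_prime(n):\n"
    "    return n == 2 or n % 2 == 1\n",
    # Wrong: always returns False.
    "def is_prime(n):\n"
    "    return False\n",
]

print(pass_at_1(samples, base_tests))                # 0.75
print(pass_at_1(samples, base_tests + extra_tests))  # 0.25
```

Two of the three solutions that pass the base tests are incorrect, and only the added edge-case tests expose them. This is the same mechanism by which augmented test suites lower reported scores.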

How to cite

Citation

FrozeBench. "MBPP." https://frozebench.com/benchmarks/mbpp. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_mbpp,
  title = {MBPP},
  howpublished = {\url{https://frozebench.com/benchmarks/mbpp}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/benchmarks/mbpp