Skip to main content
FrozeBench

MBPP(Instruct)

mbpp_instruct
CodingInstruction Following

MBPP(Instruct) is the same MBPP problem set re-templated as explicit instruction prompts in the form "Write a Python function that …" so that the input format matches the chat or instruction-tuned model interface. The underlying problems, reference solutions, and unit tests are identical to base MBPP — only the prompt scaffolding and the answer-extraction filter differ, since instruction-tuned models typically wrap code in markdown fences or prose that must be parsed out before testing. The variant exists so that models trained for conversational use are not penalized for prompt-format mismatch on what is otherwise the same evaluation.

Source paperLatest run: 2026-05-25

Benchmark results

Switch between the canonical ranking, release-date performance view, and score-size tradeoff.

24 models

Caveats

Because the problems and tests are unchanged, all base-MBPP caveats carry over: the three-tests-per-problem coverage gap, saturation by frontier models, widespread contamination of the public set, Python- only scope, and uneven crowd-sourced problem-statement quality. Beyond inherited issues, the instruct variant introduces a new dependency on the answer-extraction filter — typically extract_code or equivalent — which must reliably pull a function definition out of arbitrary chat-formatted output. Format mismatches between the model's output style and the extractor's expectations can suppress scores for capable models for reasons that have nothing to do with coding ability. Score gaps between MBPP and MBPP(Instruct) for the same model can therefore reflect prompt-template and parsing differences more than capability differences, and the two variants should not be mixed in a single comparison.

How to cite

Citation

FrozeBench. "MBPP(Instruct)." https://frozebench.com/benchmarks/mbpp-instruct. Retrieved 2026-06-04.

BibTeX

@misc{frozebench_mbpp_instruct,
  title = {MBPP(Instruct)},
  howpublished = {\url{https://frozebench.com/benchmarks/mbpp-instruct}},
  year = {2026},
  note = {FrozeBench. Retrieved 2026-06-04.}
}

URL

https://frozebench.com/benchmarks/mbpp-instruct