Sample #12 — mmlu_formal_logic
What am I looking at?
Each token in the model's output is colored by how confident the model was when generating it, that is, by the probability it assigned to that token. Deeper blue means lower confidence: the model gave more weight to alternative tokens. Click any token to see its exact probability and the top alternative tokens the model weighed.
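As a rough illustration of the coloring described above, here is a minimal sketch in Python. It assumes that confidence is simply the token's probability and that shade varies linearly from white (probability 1) to a deep blue (probability 0). The function names, the specific blue, and the linear scale are all hypothetical; the viewer's actual mapping may differ.

```python
import math


def confidence_to_hex(prob: float) -> str:
    """Map a token probability to a hex color; lower probability gives deeper blue.

    Assumption: a linear blend between white and one fixed deep blue.
    """
    p = min(max(prob, 0.0), 1.0)  # clamp to [0, 1]
    deep_blue = (8, 48, 107)      # hypothetical color for probability 0
    white = (255, 255, 255)       # color for probability 1
    r, g, b = (round(d + (w - d) * p) for d, w in zip(deep_blue, white))
    return f"#{r:02x}{g:02x}{b:02x}"


def top_alternatives(logprobs: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most probable candidate tokens as (token, probability) pairs.

    Assumption: candidates arrive as natural-log probabilities keyed by token text.
    """
    ranked = sorted(logprobs.items(), key=lambda item: item[1], reverse=True)
    return [(token, math.exp(lp)) for token, lp in ranked[:k]]
```

With this sketch, a token generated at probability 1.0 would render white, and one near 0.0 would render in the deepest blue; `top_alternatives` corresponds to the list shown when a token is clicked.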