Sample #12 — gsm8k
What am I looking at?
Each token in the model's output is colored by how confident the model was when generating it. Deeper blue means lower confidence: at that position the model spread more of its probability across other candidate tokens. Click any token to see its exact probability and the top alternative tokens the model weighed.
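The page does not spell out how confidence is computed. A common choice, and the one assumed here, is the softmax probability the model gave to the token it actually emitted. The sketch below illustrates that idea. The functions `token_confidence` and `blue_shade`, the example logits, and the linear white-to-blue color scale are all hypothetical and are not the page's actual implementation.

```python
import math


def softmax(logits):
    """Convert raw logits (token -> score) into probabilities, numerically stable."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def token_confidence(logits, chosen, k=3):
    """Return the chosen token's probability and up to k most likely alternatives."""
    probs = softmax(logits)
    alternatives = sorted(
        ((tok, p) for tok, p in probs.items() if tok != chosen),
        key=lambda item: item[1],
        reverse=True,
    )[:k]
    return probs[chosen], alternatives


def blue_shade(confidence):
    """Hypothetical linear scale: confidence 1.0 -> white, 0.0 -> pure blue."""
    level = round(255 * confidence)
    return f"#{level:02x}{level:02x}ff"


# Hypothetical logits for one position in a gsm8k answer; "18" was emitted.
logits = {"18": 2.0, "16": 1.0, "20": 0.5}
p, alts = token_confidence(logits, "18")
print(f"p={p:.3f}", alts, blue_shade(p))
```

A linear scale keeps the sketch simple. A real viewer might instead shade by log-probability or entropy so that small differences near 1.0 stay visible.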