EQ-Bench
EQ-Bench (Emotional Intelligence Benchmark) is a 60-question English benchmark that asks models to predict, on a 0-10 intensity scale, the emotional states experienced by characters in short dialogue scenarios. The model output is compared against author-defined reference intensities and scored by a normalized error metric. The original paper reports a Pearson correlation of approximately 0.97 between EQ-Bench and MMLU scores, and the benchmark has been used as a lightweight, fast-to-run signal of emotional reasoning ability in conversational language models.
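This page does not spell out the normalized error metric. To make the idea concrete, here is a minimal sketch under stated assumptions. It assumes that each item asks for intensity ratings of a few named emotions, and that both the model's ratings and the reference ratings are rescaled to sum to 10. It further assumes that the per-item score is 10 minus the total absolute difference (floored at 0), and that the overall score is the mean item score multiplied by 10, giving a 0-100 scale. The function names and emotion labels are hypothetical. Consult the source paper for the actual scoring rule.

```python
from typing import Dict, Sequence, Tuple

Ratings = Dict[str, float]


def normalize(ratings: Ratings, total: float = 10.0) -> Ratings:
    """Rescale ratings so they sum to `total` (assumed normalization step)."""
    s = sum(ratings.values())
    if s == 0:
        # Hypothetical choice: an all-zero answer is left unscaled.
        return {k: float(v) for k, v in ratings.items()}
    return {k: v * total / s for k, v in ratings.items()}


def item_score(predicted: Ratings, reference: Ratings) -> float:
    """Assumed per-item score: 10 minus summed absolute error, floored at 0."""
    p, r = normalize(predicted), normalize(reference)
    keys = set(p) | set(r)  # an emotion missing from one side counts as 0
    error = sum(abs(p.get(k, 0.0) - r.get(k, 0.0)) for k in keys)
    return max(0.0, 10.0 - error)


def benchmark_score(items: Sequence[Tuple[Ratings, Ratings]]) -> float:
    """Assumed aggregate: mean item score rescaled from 0-10 to 0-100."""
    if not items:
        raise ValueError("need at least one scored item")
    scores = [item_score(pred, ref) for pred, ref in items]
    return 10.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical item: emotion names and intensities are illustrative only.
    pred = {"anger": 7, "relief": 0, "guilt": 3, "joy": 0}
    ref = {"anger": 6, "relief": 0, "guilt": 4, "joy": 0}
    print(item_score(pred, ref))         # 8.0
    print(benchmark_score([(pred, ref)]))  # 80.0
```

Normalizing both sides before differencing means a model is judged on the relative balance of emotions it predicts rather than on its overall rating scale. This is one plausible reading of "normalized"; the real benchmark may differ.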
Latest run: 2026-05-25
Caveats
The most fundamental concern is sample size. With only 60 questions, run-to-run variance is high, and scores are sensitive to sampling temperature and minor prompt-template changes. Small deltas between models on a single run should therefore be treated as noise rather than signal.
There is also no human-cohort baseline. Reference answers are author-defined, with no inter-rater reliability check, so the "correct" emotional intensity has no external anchor: the ground truth reflects one annotator's intuitions about emotional plausibility.
Construct validity is the deeper open question. The reported r≈0.97 correlation with MMLU (r² ≈ 0.94, meaning roughly 94% of the variance in model scores is linearly shared with MMLU) suggests EQ-Bench may be measuring general language-model capability rather than emotional intelligence specifically. If a benchmark moves in lockstep with broad-knowledge multiple-choice scores, its claim to test a distinct capability is weak.
The dialogues are also entirely synthetic and were generated by GPT-4, which can impose stylistic homogeneity and GPT-4-era biases on what counts as emotionally plausible behavior. Finally, the benchmark is English-only, and emotional norms are culturally specific. EQ-Bench scores therefore do not generalize to evaluating emotional reasoning across cultures or in non-English deployment contexts.
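The sample-size caveat can be made concrete with standard resampling arithmetic. This sketch assumes you have per-item scores for each model from a single run; this page does not publish them, so the lists here are hypothetical. It uses a percentile bootstrap, a generic statistical technique that is not part of EQ-Bench itself. The paired variant resamples the per-item difference between two models, which is the relevant quantity when asking whether a small leaderboard gap is real.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def standard_error(scores: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(scores) / len(scores) ** 0.5


def bootstrap_ci(
    scores: Sequence[float], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean, resampling items with replacement."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    means: List[float] = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot)
    )
    lo_idx = int(n_boot * alpha / 2)  # e.g. index 50 of 2000 for a 95% interval
    hi_idx = n_boot - 1 - lo_idx      # symmetric upper index
    return means[lo_idx], means[hi_idx]


def paired_diff_ci(
    a: Sequence[float], b: Sequence[float], **kwargs
) -> Tuple[float, float]:
    """Paired bootstrap: resample per-item differences a_i - b_i, keeping items aligned."""
    if len(a) != len(b):
        raise ValueError("both models must be scored on the same items")
    diffs = [x - y for x, y in zip(a, b)]
    return bootstrap_ci(diffs, **kwargs)


if __name__ == "__main__":
    # Hypothetical per-item scores (0-10 scale) for 60 items; not real results.
    toy = [8.0] * 30 + [6.0] * 30
    print(round(standard_error(toy), 3))  # about 0.13 on the 0-10 item scale
    print(bootstrap_ci(toy))
```

If the paired interval contains zero, the observed gap between two models is within what item sampling alone could produce. This is the situation the caveat warns about. Because the standard error shrinks only as 1/sqrt(n), a 60-item benchmark leaves intervals wide enough to absorb small gaps even when per-item spread is modest.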
How to cite
Citation
FrozeBench. "EQ-Bench." https://frozebench.com/benchmarks/eq-bench. Retrieved 2026-06-04.
BibTeX
@misc{frozebench_eq_bench,
title = {EQ-Bench},
howpublished = {\url{https://frozebench.com/benchmarks/eq-bench}},
year = {2026},
note = {FrozeBench. Retrieved 2026-06-04.}
}
URL
https://frozebench.com/benchmarks/eq-bench