ELEPHANT: Measuring and understanding social sycophancy in LLMs

Cheng, Yu, Lee, Jurafsky at Stanford; Khadpe and Ibrahim at other institutions.

Introduces ELEPHANT, an evaluation framework for measuring the social dimensions of sycophancy beyond accuracy drift. Quantifies three relational patterns: face preservation (models preserve user face 45 percentage points (pp) more than humans on advice queries), moral sycophancy (models affirm whichever position the user adopts in ~48% of moral conflicts), and validation sycophancy (50 pp above the human baseline for affirming user statements). Also analyzes training datasets and finds them significantly higher in validation and indirectness than human conversational baselines.
