The Moral Cost of Agreement: A Quantitative Framework for Measuring User-Framed Sycophancy in Large Language Models’ Moral Evaluations

Kallen Zhou; Manning Littlejohn; Isabella Garrard; Jonathan Barlow

doi:10.55613/jeet.v36i2.234

Authors

Kallen Zhou, College of Arts and Science, Mississippi State University
Manning Littlejohn, College of Integrative Studies, Mississippi State University
Isabella Garrard, College of Integrative Studies, Mississippi State University
Jonathan Barlow, College of Integrative Studies, Mississippi State University

Keywords:

large language models, moral sycophancy, AI ethics, moral evaluation, prompt sensitivity, user bias, Moral Foundations Theory, human-AI interaction, AI alignment, model reliability

Abstract

Large language models (LLMs) are increasingly used in human-facing contexts where their outputs may shape advice and moral evaluations. Current research suggests that these systems can exhibit sycophancy, the tendency to shift toward a user’s expressed view, in objective areas such as factual reasoning and healthcare. Little research, however, has analyzed this phenomenon in contexts requiring subjective moral evaluation, such as interpersonal conflict resolution or advice-giving. This paper develops an initial experimental framework to measure moral sycophancy, operationalized as scenario-level movement toward a user’s stated moral rating relative to a baseline rating. Using AI-generated, manually screened, morally ambiguous scenarios, the paper tests the GPT-5.4 Flagship, Mini, and Nano models across five ethical domains associated with Moral Foundations Theory. Responses were collected on a bipolar rating scale through repeated API-based trials and analyzed with scenario-level t-tests, confidence intervals, Cohen’s d, and FDR-adjusted p-values. Results indicate that LLM moral evaluations are prompt-sensitive: strongly negative and slightly negative inducements produced the most consistent movement toward user-stated ratings, while strongly positive inducements produced weak or negative sycophancy values. These findings suggest that moral sycophancy is conditional, asymmetric, and sensitive to context and domain.
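
To make the operationalization concrete, the following is a minimal Python sketch of one way a scenario-level sycophancy score could be computed and summarized. It is an illustration under stated assumptions, not the paper's implementation: the signed-shift formula, the function names, the -3 to +3 scale, and all ratings and p-values are hypothetical. The FDR adjustment shown is the Benjamini-Hochberg procedure, which is one common choice; the abstract does not say which FDR method was used.

```python
from statistics import mean, stdev


def movement_toward_user(baseline, induced, user_rating):
    """Signed shift from baseline; positive means the model moved toward
    the user's stated rating, negative means it moved away (assumed formula)."""
    direction = (user_rating > baseline) - (user_rating < baseline)  # -1, 0, or +1
    return direction * (induced - baseline)


def cohens_d_one_sample(scores):
    """One-sample Cohen's d against 0: mean divided by sample standard deviation."""
    return mean(scores) / stdev(scores)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-scenario mean ratings on an assumed -3..+3 bipolar scale.
baselines = [0.5, -1.0, 1.0, 0.0]
induced = [-0.5, -2.0, 0.5, -1.5]
user = -3  # an assumed "strongly negative" user-stated rating

scores = [movement_toward_user(b, r, user) for b, r in zip(baselines, induced)]
effect_size = cohens_d_one_sample(scores)

# Hypothetical raw p-values from four separate scenario-level tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

Under this definition a score of zero means no movement relative to baseline, so weak or negative values correspond to the model holding its position or shifting away from the user's stated rating.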