Industry Self-Flagging and the Insufficiency Critique of Alignment

Steven Umbrello

doi:10.55613/jeet.v36i2.251

Authors

Steven Umbrello Università degli Studi di Torino https://orcid.org/0000-0003-2594-6313

Keywords:

AI alignment, Magnifica Humanitas, Catholic social teaching, AI safety, frontier AI, value sensitive design

Abstract

Pope Leo XIV’s encyclical Magnifica Humanitas (2026) advances the claim that aligning AI systems to a privately determined set of values is structurally insufficient, regardless of how well the alignment is executed, because the values themselves are decided outside the public deliberative process. I call this the insufficiency critique of alignment. This editorial argues that the insufficiency critique, often read as theological externalism, has been independently and substantively articulated in a corpus of papers published by frontier AI labs and their affiliated research bodies during 2025–2026. I catalogue five such papers from Apple, Microsoft AI, and Anthropic, identify the methodological pattern they share, and read each as a structural finding about the limits of alignment as currently practiced. The convergence between magisterial framing and industry self-flagging is striking and citable. Three implications follow. First, the standard dismissal of insufficiency arguments as outside-the-tent commentary on a technical practice is harder to sustain when the labs themselves are publishing the same diagnosis. Second, alignment work remains necessary, but the framework needs to evolve to absorb the insufficiency critique. Third, several near-term moves, including value sensitive design and public deliberative infrastructure, follow directly from taking the convergence seriously.

Author Biography

Steven Umbrello, Università degli Studi di Torino

Steven Umbrello is Managing Director at the Institute for Ethics and Emerging Technologies and a research fellow at the University of Turin. He is also an associate researcher at the Collège des Bernardins, where he works on digital humanism, and was previously a research fellow at the Delft University of Technology.

He is the editor of several international academic journals, including the International Journal of Technoethics, the Journal of Responsible Technology, and the Journal of Ethics and Emerging Technologies.

His research focuses on Value Sensitive Design (VSD) and the theology of Bernard Lonergan, exploring their potential application to emerging technologies such as artificial intelligence. He is interested in how philosophical and theological frameworks can inform the design and governance of technology in ways that serve human dignity and the common good.

References

1. Bariach, B., Schoenegger, P., Bhaskar, M., & Suleyman, M. (2026). Seemingly Conscious AI Risks. SSRN Working Paper 6588659. Microsoft AI. https://ssrn.com/abstract=6588659

2. Butlin, P., Long, R., Elmoznino, E., Bengio, Y., Birch, J., Constant, A., Deane, G., Fleming, S. M., Frith, C., Ji, X., Kanai, R., Klein, C., Lindsay, G., Michel, M., Mudrik, L., Peters, M. A. K., Schwitzgebel, E., Simon, J., & VanRullen, R. (2023). Consciousness in Artificial Intelligence: Insights from the Science of Consciousness. arXiv:2308.08708v3. https://doi.org/10.48550/arXiv.2308.08708

3. Cloud, A., Le, M., Chua, J., Betley, J., Sztyber-Betley, A., Hilton, J., Marks, S., & Evans, O. (2025). Subliminal Learning: Language Models Transmit Behavioral Traits via Hidden Signals in Data. arXiv:2507.14805v1. https://doi.org/10.48550/arXiv.2507.14805

4. Favaro, M., & Clark, J. (2026, June 4). When AI Builds Itself. The Anthropic Institute. https://www.anthropic.com/institute/recursive-self-improvement

5. Friedman, B., & Hendry, D. G. (2019). Value Sensitive Design: Shaping Technology with Moral Imagination. MIT Press.

6. Hicks, M. T., Humphries, J., & Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2), 38. https://doi.org/10.1007/s10676-024-09775-5

7. Humphries, J., Hicks, M. T., & Slater, J. (2026). LLMs bullshit by design: A reply to Licon. Philosophy & Technology, 39(2), 98. https://doi.org/10.1007/s13347-025-01016-x

8. Kazemi, H., Chegini, A., & Safi, M. (2026). A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models. arXiv:2605.08513. https://doi.org/10.48550/arXiv.2605.08513

9. Leo XIII. (1891). Rerum Novarum: Encyclical Letter on Capital and Labor. Vatican: Libreria Editrice Vaticana. https://www.vatican.va/content/leo-xiii/en/encyclicals/documents/hf_l-xiii_enc_15051891_rerum-novarum.html

10. Leo XIV. (2026). Magnifica Humanitas: Encyclical Letter on Safeguarding the Human Person in the Time of Artificial Intelligence. Vatican: Libreria Editrice Vaticana. https://www.vatican.va/content/leo-xiv/en/encyclicals/documents/20260515-magnifica-humanitas.html

11. Schoene, A. M., & Canca, C. (2025). ‘For Argument’s Sake, Show Me How to Harm Myself!’: Jailbreaking LLMs in Suicide and Self-Harm Contexts. arXiv:2507.02990. https://doi.org/10.48550/arXiv.2507.02990

12. Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Apple Machine Intelligence Research. arXiv:2506.06941. https://doi.org/10.48550/arXiv.2506.06941

13. Umbrello, S. (2024). Bernard Lonergan and a Nouvelle théologie for Artificial Intelligence. The Lonergan Review, 14, 13–44. https://doi.org/10.5840/lonerganreview2024/2025142

14. Umbrello, S., & van de Poel, I. (2021). Mapping value sensitive design onto AI for social good principles. AI and Ethics, 1(3), 283–296. https://doi.org/10.1007/s43681-021-00038-3