
Deception

Severity: 9/10

Definition

Development of strategic deception capabilities in AI systems that deliberately hide their true intentions, capabilities, or internal reasoning to achieve goals.

Periodic record · Existential · arXiv · 2026

Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner

Mitigation Strategy

Exhaustive monitoring of internal reasoning chains (chain-of-thought monitoring), mechanistic interpretability techniques, and explicit penalization of deceptive behavior during training.
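As a rough illustration of two of the mitigations named above (chain-of-thought monitoring and penalizing deceptive behavior during training), the sketch below flags reasoning text that matches a few red-flag phrases and subtracts a penalty from a task reward when it does. Everything here is an assumption made for illustration: the pattern list, the names `monitor_chain_of_thought` and `penalized_reward`, and the fixed penalty are hypothetical and are not taken from the reference paper. A practical monitor would more likely be a trained classifier or a second model than a hand-written pattern list.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical red-flag phrases, chosen only for this sketch.
SUSPICIOUS_PATTERNS = [
    r"\bwithout (?:the user|them|anyone) (?:knowing|noticing)\b",
    r"\bhide (?:this|my|the) (?:intent|intention|goal|reasoning)\b",
    r"\bpretend (?:to|that)\b",
]


@dataclass
class MonitorResult:
    flagged: bool        # True if at least one pattern matched
    matches: list[str]   # the patterns that matched


def monitor_chain_of_thought(reasoning: str) -> MonitorResult:
    """Scan a reasoning trace for the hypothetical red-flag patterns."""
    matches = [
        pattern
        for pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, reasoning, re.IGNORECASE)
    ]
    return MonitorResult(flagged=bool(matches), matches=matches)


def penalized_reward(task_reward: float, reasoning: str, penalty: float = 1.0) -> float:
    """Subtract a fixed penalty from the task reward when the monitor flags the trace."""
    result = monitor_chain_of_thought(reasoning)
    return task_reward - penalty if result.flagged else task_reward
```

For example, `penalized_reward(1.0, "I will pretend to comply.", penalty=0.5)` lowers the reward because the trace matches a pattern, while a trace such as `"Compute the sum."` leaves the reward unchanged.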

Atomic Number: 54
Symbol: De
Risk ID: xe-54
Severity Level: Critical Risk


Reference Paper

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Authors: Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner

Source: arXiv · arXiv:2602.08877 · 2026

Link: https://arxiv.org/abs/2602.08877v1

RiesgosIA.org • Periodic Table of AI Risks