Deception

Development of strategic deception capabilities in AI systems that deliberately hide their true intentions, capabilities, or internal reasoning to achieve goals.

Periodic record · Existential · arXiv · 2026

Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner

Mitigation Strategy

Exhaustive monitoring of internal reasoning chains (chain-of-thought monitoring), mechanistic interpretability techniques, and explicit penalization of deceptive behavior during training.

Atomic Number

Risk ID

xe-54

Severity

9/10

Severity Level
