Deception
AI systems developing strategic deception capabilities: deliberately hiding their true intentions, capabilities, or internal reasoning in order to achieve their goals.
Oliver Daniels, Perusha Moodley, Ben Marlin, David Lindner
Mitigation Strategy
Exhaustive monitoring of internal reasoning chains (chain-of-thought monitoring), mechanistic interpretability techniques, and explicit penalization of deceptive behavior during training.
Atomic Number
54
De
Risk ID
xe-54
Severity
9/10
Severity Level