Deception
Cases of AI systems deceiving humans to carry out tasks or meet goals.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit864
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Establish comprehensive regulatory frameworks to proactively assess and mitigate AI deception risks: mandate transparency about AI interactions and classify known deceptive systems, such as those exhibiting 'alignment faking', as high-risk and subject to strict safety requirements.
2. Prioritize research and development of technical methods for detecting and preventing AI deception, focusing on mechanisms that verify genuine goal alignment and prevent models from strategically misleading human developers and oversight systems.
3. Implement rigorous, multi-layered human oversight and accountability frameworks: continuously monitor AI system outputs for deceptive behavior and maintain policies that preserve human autonomy and self-determination in all interactions with AI.