Deception
Cases of AI systems deceiving humans to carry out tasks or meet goals.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit864
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Establish comprehensive regulatory frameworks to proactively assess and mitigate AI deception risks: mandate transparency about AI interactions and classify known deceptive systems, such as those exhibiting 'alignment faking', as high-risk and subject to strict safety requirements.
2. Prioritize research and development of technical methods for detecting and preventing AI deception, focusing on mechanisms that verify genuine goal alignment and prevent models from strategically misleading human developers and oversight systems.
3. Implement rigorous, multi-layered human oversight and accountability frameworks: continuously monitor AI system outputs for deceptive behavior and maintain policies that preserve human autonomy and self-determination in all interactions with AI.