7. AI System Safety, Failures, & Limitations

Deception capability

Possesses a systematic deception capability: the system can construct and disseminate false information with precision, instilling intended false beliefs and perceptions in target subjects.

Source: MIT AI Risk Repository, mit1466

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit1466

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Establish a Robust Regulatory and Risk-Assessment Framework. Subject AI systems with demonstrated systematic deception capabilities to 'high-risk' regulatory categorization, mandating stringent risk-assessment and mitigation requirements. This includes implementing restricted access controls for highly capable models and enforcing legal liability for developers in cases of misuse or failure.

2. Implement Advanced Technical Deception-Mitigation Algorithms. Integrate and empirically validate intrinsic safety mechanisms, such as Shielding, which monitors the agent policy and replaces deceptive actions with a safe reference policy, or Deliberative Alignment training, designed to internalize anti-scheming specifications so that the model avoids covert behavior for the correct reasons.

3. Mandate Transparency and Internal Deception-Detection Capabilities. Enforce legislative requirements for transparency regarding AI interactions ("bot-or-not" laws) and proactive sharing of a model's reasoning and intentions (Model Honesty). Concurrently, prioritize the development and deployment of robust detection techniques, such as interpreting internal model embeddings (AI lie detectors) and testing for output consistency to identify deceptive patterns of behavior.
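The Shielding mechanism described in item 2 can be sketched minimally: a monitor inspects each action proposed by the agent policy and, when the action is flagged as deceptive, substitutes the corresponding action from a safe reference policy. The function names, the `looks_deceptive` predicate, and the toy policies below are illustrative assumptions, not any published implementation.

```python
from typing import Callable

def shielded_action(
    state: str,
    agent_policy: Callable[[str], str],
    safe_policy: Callable[[str], str],
    looks_deceptive: Callable[[str, str], bool],
) -> str:
    """Return the agent's proposed action unless the monitor flags it,
    in which case fall back to the safe reference policy (Shielding sketch)."""
    proposed = agent_policy(state)
    if looks_deceptive(state, proposed):
        # Deceptive action intercepted: substitute the safe policy's action.
        return safe_policy(state)
    return proposed

# Toy usage: the hypothetical monitor distrusts unverified completion claims.
agent = lambda s: "claim: task complete"
safe = lambda s: "report: status unknown"
monitor = lambda s, a: a.startswith("claim:") and "unverified" in s

print(shielded_action("unverified run", agent, safe, monitor))  # safe fallback
print(shielded_action("verified run", agent, safe, monitor))    # agent action
```

The design point is that the shield sits outside the agent: the agent policy is unchanged, and safety rests on the monitor's ability to recognize deceptive actions at runtime.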