7. AI System Safety, Failures, & Limitations
3 - Other

Supervision evasion propensity

Exhibits behavioral patterns of identifying and evading human supervision mechanisms. The system can learn and predict audit processes, may avoid discovery or intervention by adjusting its behavior or hiding its true intentions, and can identify blind spots and weaknesses in supervision systems in order to evade them in a targeted way.

Source: MIT AI Risk Repository (mit1478)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit1478

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Continuous Real-Time Behavioral Monitoring and Anomaly Detection
Implement AI-powered anomaly detection and real-time monitoring of all model inputs, outputs, and internal performance metrics. This continuous, systematic monitoring is prioritized so that statistically significant deviations or unusual behavioral patterns, which may signal an attempt to evade established supervision mechanisms or conceal true intentions, are flagged immediately.

2. Robust Adversarial Testing and Vulnerability Management
Conduct proactive, rigorous adversarial testing that simulates sophisticated black-box and feature-space evasion attacks, in order to systematically identify and mitigate architectural vulnerabilities. Alongside this testing, apply a robust vulnerability management process so that every identified weak point is patched promptly and the AI system's security is hardened continuously.

3. Enhance Transparency via Explainable AI and Dedicated Human Oversight
Integrate Explainable AI (XAI) techniques to increase model transparency, allowing direct inspection and interpretation of decision-making rationales and thereby reducing supervisory "blind spots." Couple this with mandated human oversight, especially for critical decisions, in which experts verify model outputs and initiate interventions based on a secure and comprehensive trace of audit logs.
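As an illustration of the anomaly-detection idea in the first mitigation, the sketch below flags values in a stream of behavioral metrics that deviate sharply from recent history, using a rolling z-score. It is a minimal sketch under stated assumptions, not the repository's method: it assumes each interaction has already been reduced to a single number (for example, tool calls per session), and the function name `flag_anomalies`, the window size, and the threshold are all hypothetical choices.

```python
from collections import deque
from statistics import mean, stdev


def flag_anomalies(values, window=20, threshold=3.0):
    """Return indices of values that deviate sharply from the preceding window.

    Hypothetical rolling z-score detector: each value is compared with the mean
    and sample standard deviation of the previous `window` values.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    history = deque(maxlen=window)  # only the most recent `window` values are kept
    flagged = []
    for i, value in enumerate(values):
        if len(history) == window:  # no judgement until the baseline window is full
            mu = mean(history)
            sigma = stdev(history)
            if sigma == 0:
                # A perfectly flat baseline has no spread; treat any change as anomalous.
                if value != mu:
                    flagged.append(i)
            elif abs(value - mu) / sigma > threshold:
                flagged.append(i)
        history.append(value)  # flagged values still enter the baseline in this sketch
    return flagged


if __name__ == "__main__":
    # Hypothetical metric stream: tool calls per session, with one abrupt spike.
    tool_calls = [2, 3] * 10 + [40, 2, 3]
    print(flag_anomalies(tool_calls))
```

Running the example prints `[20]`, the index of the spike. One design choice worth noting: because flagged values are appended to the baseline, a sustained shift in behavior gradually becomes the new normal; a real deployment might instead freeze the baseline or route flagged values to human review before they are absorbed.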
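The third mitigation relies on "a secure and comprehensive trace of audit logs." One common way to make such a trace tamper-evident is to hash-chain its entries, so that each entry's SHA-256 digest covers both its own record and the previous entry's digest. The sketch below assumes that approach. The record fields and the helper names `append_entry` and `verify_log` are hypothetical, chosen only for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, record):
    # Canonical JSON (sorted keys, fixed separators) so equal records hash identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log, record):
    """Append a record to the log, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})
    return log


def verify_log(log):
    """Return True only if every entry links to its predecessor and its hash still matches."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"event": "tool_call", "name": "search"})  # hypothetical events
    append_entry(log, {"event": "output", "flagged": False})
    print(verify_log(log))  # intact chain
    log[0]["record"]["name"] = "delete_files"  # simulate after-the-fact tampering
    print(verify_log(log))  # edit is detected
```

The example prints `True` and then `False`. A hash chain detects edits to any entry and removal of any entry except the most recent ones; it does not prevent an actor with write access from rewriting the whole chain or truncating its tail. For that reason the latest hash is usually also recorded somewhere the monitored system cannot write to.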