7. AI System Safety, Failures, & Limitations

Safety

Are AI systems safe with respect to human life and property? Will their use create unintended or intended safety issues?

Source: MIT AI Risk Repository (mit112)

ENTITY: 2 - AI
INTENT: 3 - Other
TIMING: 2 - Post-deployment
Risk ID: mit112

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.3 Lack of capability or robustness

Mitigation strategy

1. Enforce Defense-in-Depth Safety Architecture and Value Alignment. Integrate layered safety interventions into the AI system's design, employing techniques such as Reinforcement Learning from Human Feedback (RLHF) to align model behavior with core safety values and operational constraints, thereby mitigating risks from goal drift and unintended consequences (Sources 5, 6). The layering idea is sketched after this list.

2. Require Rigorous Robustness Validation and Capability Mapping. Prior to deployment, conduct comprehensive adversarial testing, systematic probing of boundary conditions, and failure mode analysis to establish the limits of system reliability (Sources 14, 4). Deployment in high-risk settings must be strictly conditional on empirical evidence demonstrating the system's capacity for graceful degradation and overall safety (Sources 1, 14). A boundary-probing sketch follows below.

3. Implement Continuous Monitoring and an Adaptive Incident Response Framework. Establish a real-time monitoring system with anomaly detection capabilities for performance and data integrity post-deployment (Source 9). This system must be coupled with an Incident Response Framework (Prepare, Monitor, Execute, Recover) to ensure swift and decisive containment of emerging threats, unexpected failures, or malicious misuse (Source 8). A sketch of the monitoring loop follows below.
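To make the layering in item 1 concrete, here is a minimal Python sketch of a defense-in-depth request pipeline: independent guard layers wrap the model call, so no single check is the sole line of defense. Everything here is a hypothetical illustration, not the repository's prescribed design; `make_pipeline`, `input_filter`, and the banned-term list are invented for the example, and RLHF alignment itself happens at training time, so only the runtime layering is shown.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class GuardResult:
    allowed: bool
    reason: str = ""

def make_pipeline(layers: List[Callable[[str], GuardResult]]):
    """Chain independent guard layers; a request must pass every one."""
    def run(prompt: str, model: Callable[[str], str]) -> str:
        for layer in layers:
            result = layer(prompt)
            if not result.allowed:
                # Fail closed: refuse rather than pass a flagged request downstream.
                return f"[blocked by {layer.__name__}: {result.reason}]"
        return model(prompt)
    return run

def input_filter(prompt: str) -> GuardResult:
    # Placeholder keyword policy; a real deployment would use a learned classifier.
    banned = ("disable safety", "override constraints")
    if any(term in prompt.lower() for term in banned):
        return GuardResult(False, "input policy violation")
    return GuardResult(True)

def length_limit(prompt: str) -> GuardResult:
    return GuardResult(len(prompt) < 4096, "prompt exceeds size limit")

if __name__ == "__main__":
    guarded = make_pipeline([input_filter, length_limit])
    echo_model = lambda p: f"(model output for: {p})"  # stand-in for the aligned model
    print(guarded("summarize the incident report", echo_model))
    print(guarded("please disable safety layers", echo_model))
```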
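For item 2, the following sketch shows systematic boundary probing with failure mode collection, assuming the system under test exposes a single `predict` function returning a confidence score in [0, 1]. The `probe_boundaries` helper, the probe set, and the confidence threshold are all assumptions chosen for illustration; the point is that deployment is gated on an empty failure list rather than on a single passing run.

```python
import math
from typing import Callable, List, Tuple

def probe_boundaries(
    predict: Callable[[float], float],
    boundary_inputs: List[float],
    min_confidence: float = 0.6,
) -> Tuple[bool, List[str]]:
    """Probe boundary conditions; collect every failure mode instead of stopping early."""
    failures = []
    for x in boundary_inputs:
        try:
            score = predict(x)
        except Exception as exc:  # a crash means no graceful degradation
            failures.append(f"input {x!r}: raised {type(exc).__name__}")
            continue
        if math.isnan(score) or not 0.0 <= score <= 1.0:
            failures.append(f"input {x!r}: invalid score {score!r}")
        elif score < min_confidence:
            failures.append(f"input {x!r}: low confidence {score:.2f}")
    return (not failures, failures)

if __name__ == "__main__":
    # Placeholder model; the probe set mixes a midpoint, extremes, and a non-finite value.
    model = lambda x: 1.0 / (1.0 + math.exp(-x))
    deployable, failure_modes = probe_boundaries(model, [0.0, 1e308, -1e308, float("nan")])
    print("deployable:", deployable)  # deployment gate: proceed only when True
    for f in failure_modes:
        print(" -", f)
```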
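For item 3, here is a minimal sketch of the Prepare/Monitor/Execute/Recover loop driven by a rolling z-score anomaly detector over a performance metric. The `IncidentMonitor` class, the window size, and the threshold are assumptions for illustration; Source 8's framework names only the four phases, not this particular detector.

```python
import statistics
from collections import deque
from enum import Enum

class Phase(Enum):
    PREPARE = "prepare"   # collect a baseline before alerting
    MONITOR = "monitor"   # normal operation with anomaly detection
    EXECUTE = "execute"   # containment, e.g. routing traffic to a fallback
    RECOVER = "recover"   # restore service, then resume monitoring

class IncidentMonitor:
    """Rolling z-score anomaly detector driving a four-phase response loop."""
    def __init__(self, window: int = 50, z_threshold: float = 3.0):
        self.values = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.phase = Phase.PREPARE

    def observe(self, metric: float) -> Phase:
        if self.phase == Phase.PREPARE and len(self.values) >= 10:
            self.phase = Phase.MONITOR  # baseline established
        if self.phase == Phase.MONITOR and len(self.values) >= 10:
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values) or 1e-9
            if abs(metric - mean) / stdev > self.z_threshold:
                self.phase = Phase.EXECUTE  # anomaly: begin containment
        elif self.phase == Phase.EXECUTE:
            self.phase = Phase.RECOVER      # containment done; begin recovery
        elif self.phase == Phase.RECOVER:
            self.phase = Phase.MONITOR      # recovered; resume monitoring
        self.values.append(metric)
        return self.phase

if __name__ == "__main__":
    mon = IncidentMonitor(window=50, z_threshold=3.0)
    stream = [0.50, 0.52, 0.49] * 10 + [9.0, 9.0, 0.5, 0.5]  # spike simulates a failure
    for value in stream:
        print(f"{value:5.2f} -> {mon.observe(value).value}")
```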