Harm caused by incompetent systems
While HP#1 concerns mean or best-case performance, HP#2 concerns worst-case performance: how can we ensure that AI systems perform safely, and how can we prove this? ML systems have been deployed in high-stakes, safety-critical domains such as driving [182], medicine [113], and warfare [298]. Many more systems have been developed but remain undeployed, or have been rolled back, for regulatory and safety reasons [471]. Clearly, unsafe systems can result in loss of life, economic damage, and social unrest [407, 10]. Most concerningly, AI systems may be susceptible to so-called "normal accidents" [63], creating cascading errors that are difficult to prevent merely by maintaining a nominal "human in the loop" [122]. Most advanced ML models perform far below the reliability level customary in engineering fields [359], and because we do not fully understand how cutting-edge systems achieve their results, we cannot yet detect and prevent dangerous modes of operation [285].
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit879
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
- Implement rigorous, continuous robustness testing, including adversarial training and stress testing under extreme and novel operating conditions, to proactively enhance system resilience against latent failure modes and distribution shifts
- Design systems with reduced complexity and coupling, incorporating resilient error-handling mechanisms such as circuit breakers and containment protocols to localize and prevent the propagation of initial component failures (cascading errors)
- Deploy comprehensive, real-time observability tools (monitoring and anomaly detection) and explainable AI (XAI) to detect dangerous modes of operation, and formally empower human operators to intervene (escalation/rollback) when necessary
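The circuit-breaker mechanism named in the second mitigation can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not a reference implementation: the threshold and timeout values are arbitrary, and the wrapped prediction function stands in for any hypothetical ML component whose failures should be contained rather than propagated downstream.

```python
import time


class CircuitBreaker:
    """Minimal circuit-breaker sketch: after repeated failures the
    breaker trips open, and callers receive a safe fallback instead
    of letting errors cascade through coupled components."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold  # failures before tripping
        self.reset_timeout = reset_timeout          # seconds before retrying
        self.failures = 0
        self.opened_at = None  # None => closed (normal operation)

    def call(self, fn, *args, fallback=None):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback  # open: contain the failure locally
            # half-open: timeout elapsed, allow one trial call
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip open
            return fallback
        self.failures = 0  # success resets the failure count
        return result
```

In use, a hypothetical `model_predict` would be invoked as `breaker.call(model_predict, inputs, fallback=SAFE_DEFAULT)`, so that a misbehaving model yields a conservative default (and, in a real deployment, an escalation to a human operator) rather than a cascading error.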