By Mistake - Post-Deployment
After the system has been deployed, it may still contain undetected bugs, design mistakes, misaligned goals, and poorly developed capabilities, any of which may produce highly undesirable outcomes. For example, the system may misinterpret commands due to coarticulation, segmentation, homophones, or double meanings in human language ("recognize speech using common sense" versus "wreck a nice beach you sing calm incense") (Lieberman, Faaborg et al. 2005).
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit611
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Continuous Real-Time Monitoring and Anomaly Detection: Deploy automated systems for real-time traffic analysis and performance-metric tracking to detect immediate deviations from expected behavior (anomalies, model drift, and security threats), enabling rapid alerting and containment of post-deployment failures.
2. Integration of Human-in-the-Loop (HITL) Systems and Graceful Fallbacks: Incorporate human oversight for critical or low-confidence decisions to review and validate outputs, thereby mitigating misinterpretations and errors before they cause harm. Additionally, implement graceful fallback mechanisms, such as escalating to a human agent or rate-limiting access, to manage unresolvable queries or potential misuse.
3. Structured Incident Response and Corrective Feedback Loops: Institute a formal post-deployment incident-management framework to manage system failures, including the prompt execution of containment strategies. This framework must also mandate the capture of lessons learned and the establishment of feedback loops to inform model retraining and updates to safety protocols, thereby addressing the root cause of the initial bugs or capability deficiencies.
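The monitoring and fallback mechanisms in strategies 1 and 2 can be sketched minimally as follows. This is an illustrative assumption, not an implementation from the source: the confidence threshold, window size, and all names (`route_prediction`, `DriftMonitor`) are hypothetical and would be tuned per deployment.

```python
from collections import deque
from statistics import mean, stdev

CONF_THRESHOLD = 0.75  # assumed cutoff for escalating to a human reviewer

def route_prediction(label: str, confidence: float) -> str:
    """HITL fallback: low-confidence outputs are escalated rather than acted on."""
    if confidence < CONF_THRESHOLD:
        return "escalate_to_human"
    return label

class DriftMonitor:
    """Flags a post-deployment anomaly when a tracked metric's rolling
    z-score against recent history exceeds a fixed bound."""

    def __init__(self, window: int = 50, z_bound: float = 3.0):
        self.baseline = deque(maxlen=window)  # rolling history of the metric
        self.z_bound = z_bound

    def observe(self, value: float) -> bool:
        # Only test once enough history has accumulated.
        if len(self.baseline) >= 10:
            mu, sigma = mean(self.baseline), stdev(self.baseline)
            if sigma > 0 and abs(value - mu) / sigma > self.z_bound:
                return True  # anomaly: trigger alerting and containment
        self.baseline.append(value)
        return False
```

In this sketch, anomalous observations are excluded from the baseline so a sudden shift keeps alerting until it is investigated; a production system would pair the alert with the incident-response process described in strategy 3.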