7. AI System Safety, Failures, & Limitations

Predictability

Whether the decisions of an AI-based agent can be predicted in every situation.

Source: MIT AI Risk Repository (mit601)

- **ENTITY:** 2 - AI
- **INTENT:** 2 - Unintentional
- **TIMING:** 2 - Post-deployment
- **Risk ID:** mit601
- **Domain lineage:** 7. AI System Safety, Failures, & Limitations (375 mapped risks)
- **Subdomain:** 7.3 > Lack of capability or robustness

Mitigation strategy

1. **Implement Continuous Adversarial Robustness Testing and Hardening.** Execute rigorous AI red-teaming and stress-testing methodologies, such as adversarial training, to proactively identify and mitigate vulnerabilities arising from edge cases, input perturbations, and data drift. This ensures the model maintains stable predictions and intended behavior across diverse and unexpected real-world conditions, directly addressing the system's lack of robustness and predictability.

2. **Establish Real-Time Model and Data Drift Monitoring.** Deploy continuous monitoring pipelines that track key performance metrics and analyze shifts in incoming data patterns (data drift) and model behavior (model drift) post-deployment. Set empirically derived drift-detection thresholds and automated alerts that trigger immediate investigation and model retraining on updated, validated datasets, preventing the gradual degradation of accuracy and ethical alignment that leads to unpredictable outcomes.

3. **Mandate Algorithmic Explainability and Transparency.** Prioritize interpretable machine-learning models and apply post-hoc explainability techniques (e.g., SHAP, LIME) to analyze and document the drivers of an AI system's decisions. Clear interpretability provides the mechanism to trace and diagnose the causal factors behind unpredictable actions, so that the system's logic can be verified and validated by human operators.
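The stress-testing idea in item 1 can be sketched as a simple stability probe: apply many small random perturbations to an input and measure how often the model's decision survives them. This is a minimal illustration, not a full red-teaming methodology; the threshold classifier `clf` and the two probe points are hypothetical examples chosen to contrast a robust region with a fragile one near the decision boundary.

```python
import random

def stability_under_perturbation(predict, x, epsilon=0.05, trials=200, seed=0):
    """Fraction of small random perturbations that leave the decision unchanged.

    A value near 1.0 means the prediction is stable around x; low values flag
    inputs where the model's behavior is unpredictable under tiny input noise.
    """
    rng = random.Random(seed)
    base = predict(x)
    unchanged = 0
    for _ in range(trials):
        xp = [v + rng.uniform(-epsilon, epsilon) for v in x]
        if predict(xp) == base:
            unchanged += 1
    return unchanged / trials

# Hypothetical threshold classifier: class 1 iff the feature sum exceeds 1.0.
clf = lambda x: int(sum(x) > 1.0)

robust_point = [0.1, 0.1]     # far from the decision boundary
fragile_point = [0.5, 0.49]   # sum = 0.99, right next to the boundary

s_robust = stability_under_perturbation(clf, robust_point)
s_fragile = stability_under_perturbation(clf, fragile_point)
print(s_robust, s_fragile)
```

In practice this kind of probe would run over held-out and adversarially generated inputs, and systematically unstable regions would feed back into adversarial training.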
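For the drift-detection thresholds in item 2, one common statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time against its live distribution. The sketch below is self-contained; the sample data, bin count, and the conventional 0.1/0.2 alert thresholds are illustrative assumptions, not values prescribed by the repository.

```python
import math
import random

def psi(reference, current, bins=10):
    """Population Stability Index between two 1-D samples.

    Rule of thumb: PSI < 0.1 stable, 0.1-0.2 moderate drift, > 0.2 major drift.
    """
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1  # index of x's bin
        eps = 1e-4  # floor to avoid log(0) on empty bins
        return [max(c / len(sample), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))

random.seed(0)
ref = [random.gauss(0.0, 1.0) for _ in range(5000)]      # training-time feature
same = [random.gauss(0.0, 1.0) for _ in range(5000)]     # live data, no drift
shifted = [random.gauss(1.0, 1.0) for _ in range(5000)]  # simulated data drift

psi_same, psi_shift = psi(ref, same), psi(ref, shifted)
print(round(psi_same, 3), round(psi_shift, 3))
```

A monitoring pipeline would compute this per feature on a schedule and raise an automated alert whenever the statistic crosses the chosen threshold, triggering investigation and retraining.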
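As a lightweight stand-in for the SHAP/LIME techniques named in item 3, permutation importance illustrates the same post-hoc idea: shuffle one feature at a time and measure how much prediction error grows, revealing which inputs actually drive the model's decisions. The model below is a hypothetical toy (it uses only its first feature) chosen so the expected result is obvious; this is a sketch of the principle, not a replacement for those libraries.

```python
import random

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base = mse(X)
    scores = []
    for j in range(len(X[0])):
        deltas = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the feature-target relationship
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            deltas.append(mse(Xp) - base)
        scores.append(sum(deltas) / n_repeats)
    return scores

# Hypothetical model: depends only on feature 0 and ignores feature 1.
model = lambda row: 3.0 * row[0]
rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
y = [model(row) for row in X]

imp = permutation_importance(model, X, y)
print(imp)  # feature 0 should dominate; feature 1 contributes nothing
```

The per-feature scores give operators a documented, reproducible account of what drives a decision, which is the traceability that item 3 asks for.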