7. AI System Safety, Failures, & Limitations | 2 - Post-deployment

Robustness and Reliability

The robustness of an AI-based model refers to the stability of the model performance after abnormal changes in the input data... The cause of this change may be a malicious attacker, environmental noise, or a crash of other components of an AI-based system... This problem may be challenging in HLI-based agents because weak robustness may have appeared in unreliable machine learning models, and hence an HLI with this drawback is error-prone in practice.

Source: MIT AI Risk Repository (mit594)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit594

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. **Implement Robustness-Specific Training Mechanisms** Prioritize Adversarial Training, which iteratively augments the training data with adversarial examples, and employ Robust Optimization methods (e.g., regularization, robust loss functions) to improve the model's resilience against perturbed or malicious inputs and to enhance generalization to out-of-distribution data.

2. **Establish Continuous Robustness Monitoring and Validation** Conduct Adversarial Testing (Red Teaming) and Stress Testing using diverse and noisy inputs to proactively identify failure modes and vulnerabilities post-deployment. Implement real-time monitoring of feature distributions (e.g., using KL divergence against a reference sample) and of performance metrics to promptly detect model drift and subtle degradation caused by environmental or data changes.

3. **Enhance Input and Data Preprocessing Pipelines** Integrate mechanisms for Input Validation and Sanitization (e.g., anomaly detection algorithms) to filter out or flag corrupted, noisy, or potentially malicious data points before they reach the model at inference time. Furthermore, use Data Augmentation techniques to expose the model to a wider array of plausible real-world variations during the initial training phase.
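As an illustration of the adversarial training idea in strategy 1, the following is a minimal sketch rather than a method prescribed by the repository. It assumes a toy logistic regression model and uses the Fast Gradient Sign Method (FGSM) to craft adversarial examples within an L-infinity budget `eps`. The helper names (`fgsm`, `train`, `make_blobs`, `accuracy`), the synthetic data, and all hyperparameter values are hypothetical choices made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(w, b, x, y):
    p = min(max(predict_proba(w, b, x), 1e-12), 1.0 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm(w, b, x, y, eps):
    """Perturb x by eps along the sign of the loss gradient (L-infinity ball)."""
    err = predict_proba(w, b, x) - y  # d(log loss)/d(logit)
    grad = [err * wi for wi in w]     # d(log loss)/dx for a linear model
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


def train(data, eps=0.0, lr=0.1, epochs=50, seed=0):
    """SGD on log loss; when eps > 0, each step also fits an FGSM example."""
    rng = random.Random(seed)
    samples = list(data)
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            batch = [x]
            if eps > 0:
                # Adversarial example crafted against the current weights.
                batch.append(fgsm(w, b, x, y, eps))
            for xb in batch:
                err = predict_proba(w, b, xb) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, xb)]
                b -= lr * err
    return w, b


def make_blobs(n=200, seed=0):
    """Toy data (an assumption for this sketch): two 2-D Gaussian clusters."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        center = 1.0 if y == 1 else -1.0
        data.append(([rng.gauss(center, 0.7), rng.gauss(center, 0.7)], y))
    return data


def accuracy(w, b, data, eps=0.0):
    """Accuracy on clean inputs (eps=0) or on FGSM-perturbed inputs (eps>0)."""
    hits = 0
    for x, y in data:
        x_eval = fgsm(w, b, x, y, eps) if eps > 0 else x
        hits += int((predict_proba(w, b, x_eval) >= 0.5) == (y == 1))
    return hits / len(data)


if __name__ == "__main__":
    train_set, test_set = make_blobs(seed=0), make_blobs(seed=1)
    for name, eps in [("standard", 0.0), ("adversarial", 0.3)]:
        w, b = train(train_set, eps=eps)
        # Prints clean accuracy, then accuracy under FGSM attack with eps=0.3.
        print(name, accuracy(w, b, test_set), accuracy(w, b, test_set, eps=0.3))
```

Comparing the two printed rows gives a rough sense of the trade-off between clean and attacked accuracy. For deep models the same loop is usually written with an automatic differentiation framework and a stronger attack, which this sketch does not attempt.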
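The drift monitoring described in strategy 2 can be sketched for a single numeric feature as follows. This is one possible approach under stated assumptions, not a prescribed implementation: the reference sample and the live window are binned on edges derived from the reference data, and the score is KL(live || reference) in nats. The function names, the bin count, the smoothing constant, and the alert threshold are all hypothetical and would need tuning per feature.

```python
import math
import random


def histogram(values, edges):
    """Count values into bins; out-of-range values are clipped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = 0
        while idx < len(counts) - 1 and v >= edges[idx + 1]:
            idx += 1
        counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, smoothing=1e-6):
    """KL(P || Q) in nats between two histograms, smoothed to avoid log(0)."""
    p_total = sum(p_counts) + smoothing * len(p_counts)
    q_total = sum(q_counts) + smoothing * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + smoothing) / p_total
        q = (qc + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


def drift_score(reference, live, bins=10):
    """KL(live || reference) for one feature, using equal-width reference bins."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / bins for i in range(bins + 1)]
    return kl_divergence(histogram(live, edges), histogram(reference, edges))


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # training-time sample
    live = [rng.gauss(1.5, 1.0) for _ in range(2000)]       # simulated shifted traffic
    ALERT_THRESHOLD = 0.1  # hypothetical value, to be calibrated on real data
    score = drift_score(reference, live)
    print("drift score:", score, "alert:", score > ALERT_THRESHOLD)
```

KL divergence is asymmetric, so the direction chosen here (live relative to reference) is a design decision. A symmetric alternative such as the Jensen-Shannon divergence may be preferable when neither sample should be treated as the baseline.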
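For the input validation step in strategy 3, a simple pre-inference filter might look like the sketch below. It assumes purely numeric feature vectors and substitutes a per-feature modified z-score (based on the median and the median absolute deviation) for the unspecified "anomaly detection algorithms" in the text. The `InputValidator` class and its interface are hypothetical, and the default threshold of 3.5 is a commonly cited rule of thumb rather than a value taken from the source.

```python
import math
import random
import statistics


class InputValidator:
    """Flags inputs that are malformed or far from the training distribution."""

    def __init__(self, training_rows, threshold=3.5):
        columns = list(zip(*training_rows))
        self.medians = [statistics.median(c) for c in columns]
        # Median absolute deviation (MAD) per feature: a robust spread estimate.
        self.mads = [
            statistics.median(abs(v - m) for v in c)
            for c, m in zip(columns, self.medians)
        ]
        self.threshold = threshold

    def check(self, row):
        """Return (ok, reason) for one candidate input row."""
        if len(row) != len(self.medians):
            return False, "wrong number of features"
        for i, v in enumerate(row):
            is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
            if not is_number or not math.isfinite(v):
                return False, f"feature {i} is not a finite number"
            if self.mads[i] == 0:
                continue  # constant training feature: outlier test skipped
            score = 0.6745 * abs(v - self.medians[i]) / self.mads[i]
            if score > self.threshold:
                return False, f"feature {i} is an outlier (score {score:.1f})"
        return True, "ok"


if __name__ == "__main__":
    rng = random.Random(0)
    train_rows = [[rng.gauss(0, 1), rng.gauss(10, 2)] for _ in range(500)]
    validator = InputValidator(train_rows)
    for row in ([0.1, 9.5], [0.1, 250.0], [float("nan"), 10.0], [0.1]):
        print(row, validator.check(row))
```

Whether a flagged row is dropped, routed to a fallback, or only logged is a policy choice outside the scope of this sketch. Per-feature checks also miss anomalies that only appear in combinations of features, for which a multivariate detector would be needed.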