7. AI System Safety, Failures, & Limitations (Post-deployment)

Unexplainable output

Explanations for a model's output decisions may be difficult, imprecise, or impossible to obtain.

Source: MIT AI Risk Repository (mit1313)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1313

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.4 > Lack of transparency or interpretability

Mitigation strategy

1. Prioritize inherently interpretable model architectures. Implement a model selection and governance process that favors intrinsically interpretable models (e.g., linear regression, decision trees) for high-stakes applications where the trade-off with performance is justifiable. For models where complexity is essential (e.g., deep neural networks), mandate a framework that prioritizes inherent interpretability and development transparency over relying solely on post-hoc explanation methods.

2. Systematically employ post-hoc explainable AI (XAI) techniques. Integrate state-of-the-art model-agnostic post-hoc explanation methods (e.g., SHAP, LIME) to generate both local (instance-level) and global (feature-importance) explanations for all critical output decisions from black-box models. Establish quantitative metrics for measuring explanation fidelity and consistency to prevent the "explainability trap" of providing misleading or inaccurate insights.

3. Implement continuous model monitoring and human-in-the-loop oversight. Establish a robust MLOps framework that continuously monitors model behavior, performance decay, and shifts in feature importance or prediction distributions, any of which may signal a breakdown in an explanation's validity. For all critical or high-risk outputs, integrate a human-in-the-loop review process that requires expert validation of both the model's decision and the corresponding explanation before the output is operationalized.
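The model-agnostic global explanations named in strategy 2 can be illustrated with permutation importance: shuffle one feature at a time and measure how much a model's accuracy drops. This is a minimal sketch, not SHAP or LIME themselves; the function names and the toy model are illustrative assumptions.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Accuracy drop when each feature column is shuffled (higher = more important)."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])            # break the link between feature j and the target
            scores.append(np.mean(predict(Xp) == y))
        importances[j] = baseline - np.mean(scores)
    return importances

# Toy black-box model: decides on feature 0 only, ignores feature 1.
predict = lambda X: (X[:, 0] > 0.5).astype(int)
X = np.random.default_rng(1).random((200, 2))
y = predict(X)
imp = permutation_importance(predict, X, y)
# Feature 0 should dominate; feature 1 should be near zero.
```

Dedicated libraries such as SHAP additionally give signed, instance-level attributions; permutation importance only ranks features globally, which is why the strategy pairs global and local methods.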
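The prediction-distribution monitoring in strategy 3 is often implemented with the Population Stability Index (PSI), which compares live score distributions against a reference sample. The sketch below assumes synthetic score data and the conventional 0.2 "investigate" threshold; both are illustrative, not prescribed by the repository entry.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between reference and live scores; higher = more drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6                                # avoid log(0) in empty bins
    e_frac, a_frac = e_frac + eps, a_frac + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(0)
ref = rng.normal(0.4, 0.1, 5000)              # scores captured at validation time
live_stable = rng.normal(0.4, 0.1, 5000)      # same distribution: PSI stays small
live_shifted = rng.normal(0.6, 0.1, 5000)     # mean shift: PSI exceeds the ~0.2 alert level
```

A PSI alert does not say *why* the model drifted, only that the explanations and performance estimates computed at validation time may no longer hold, which is the trigger for the human-in-the-loop review the strategy describes.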