Unexplainable output
Explanations for a model's output decisions may be difficult or imprecise to obtain, or impossible to obtain at all.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1313
Domain lineage
7. AI System Safety, Failures, & Limitations
7.4 > Lack of transparency or interpretability
Mitigation strategy
1. Prioritize Inherently Interpretable Model Architectures
Implement a comprehensive model selection and governance process that favors intrinsically interpretable models (e.g., linear regression, decision trees) for high-stakes applications where the performance trade-off is justifiable. Where model complexity is essential (e.g., deep neural networks), mandate a framework that prioritizes inherent interpretability and development transparency rather than relying solely on post-hoc explanation methods.

2. Systematically Employ Post-Hoc Explainable AI (XAI) Techniques
Integrate state-of-the-art model-agnostic post-hoc explanation methods (e.g., SHAP, LIME) to generate both local (instance-level) and global (feature-importance) explanations for all critical output decisions from black-box models. Establish quantitative metrics for explanation fidelity and consistency to avoid the "explainability trap" of providing misleading or inaccurate insights.

3. Implement Continuous Model Monitoring and Human-in-the-Loop Oversight
Establish a robust MLOps framework with continuous monitoring of model behavior, performance decay, and shifts in feature importance or prediction distributions, any of which may signal that an explanation is no longer valid. For all critical or high-risk outputs, require a human-in-the-loop review in which an expert validates both the model's decision and its accompanying explanation before the output is operationalized.
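The model-agnostic explanations in strategy 2 are typically produced with libraries such as SHAP or LIME. The dependency-free sketch below illustrates the underlying idea with permutation feature importance: shuffle one feature at a time and measure how much a black-box predictor's accuracy drops. The toy classifier and data are hypothetical, purely for illustration.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Estimate global feature importance for a black-box `predict`
    function by measuring the accuracy drop when each feature column
    is shuffled (breaking its link to the target)."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            X_perm = [row[:j] + [col[i]] + row[j + 1:]
                      for i, row in enumerate(X)]
            drops.append(baseline - accuracy(X_perm))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical black-box: predicts 1 when the first feature exceeds 0.5;
# the second feature is noise the model ignores, so its score is zero.
X = [[i / 20, (i * 7) % 13] for i in range(20)]
y = [1 if row[0] > 0.5 else 0 for row in X]
imps = permutation_importance(lambda r: 1 if r[0] > 0.5 else 0, X, y)
```

A large score for the first feature and a zero score for the second matches the known structure of the toy model, which is exactly the kind of sanity check strategy 2's fidelity metrics formalize.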
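For the distribution-shift monitoring in strategy 3, one common metric is the Population Stability Index (PSI), which compares a reference sample of model scores against a live sample. Below is a minimal pure-Python sketch; the bin count, the 1e-4 floor, and the toy samples are illustrative assumptions.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample
    (`expected`, e.g. training-time scores) and a live sample
    (`actual`). Common rule of thumb: < 0.1 stable, 0.1-0.25
    moderate shift, > 0.25 significant shift worth investigating."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def fractions(sample):
        counts = [0] * bins
        for v in sample:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Small floor avoids log(0) for empty bins (standard PSI hack).
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Illustrative toy samples: identical distributions score ~0,
# while mass shifted into the upper half triggers the alert band.
train_scores = [i / 100 for i in range(100)]
live_same = [i / 100 for i in range(100)]
live_shifted = [0.5 + i / 200 for i in range(100)]
```

A scheduled job computing PSI over recent prediction scores, with alerts above the 0.25 band, is one concrete way to operationalize the "shifts in prediction distributions" monitoring the strategy calls for.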
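The human-in-the-loop gate in strategy 3 can be sketched as a simple routing rule: outputs are released automatically only when they are low-stakes, high-confidence, and backed by a high-fidelity explanation. The function name and threshold values below are hypothetical, not part of the mitigation text.

```python
def route_output(confidence, explanation_fidelity, high_stakes,
                 conf_threshold=0.9, fidelity_threshold=0.8):
    """Decide whether a model output may be released automatically
    or must be held for expert review. Thresholds are illustrative
    and would be set per application and risk appetite."""
    needs_review = (
        high_stakes                                   # critical outputs always reviewed
        or confidence < conf_threshold                # model is unsure
        or explanation_fidelity < fidelity_threshold  # explanation may be misleading
    )
    return "human_review" if needs_review else "auto_release"
```

Note that a low explanation-fidelity score routes to review even when the prediction itself is confident: per strategy 2, an unreliable explanation is itself a reason to withhold automation.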