Explainability & Reasoning
The system's ability to explain its outputs to users and to reason correctly
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit497
Domain lineage
7. AI System Safety, Failures, & Limitations
7.4 > Lack of transparency or interpretability
Mitigation strategy
1. Implement and integrate advanced Explainable AI (XAI) methodologies, such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) for post-hoc attribution and attention-based analysis for intrinsic interpretability, to provide granular, quantifiable insights into feature contributions and the model's decision-making process.
2. Adopt architectural strategies, such as Retrieval-Augmented Generation (RAG) or intrinsic methods like Chain-of-Thought (CoT) reasoning, to ensure factual grounding and to generate self-explanations that provide a verifiable, logical progression to the model's output.
3. Establish continuous governance frameworks that mandate rigorous bias testing, conduct regular audits of explanation faithfulness, and ensure compliance with regulatory standards (e.g., EU AI Act, GDPR) to maintain accountability and stakeholder trust in high-stakes applications.
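To make the post-hoc attribution idea in item 1 concrete, the sketch below computes exact Shapley values for a toy model by enumerating feature coalitions, with absent features replaced by baseline values. This is the quantity that SHAP approximates at scale; the model `f`, inputs, and baseline here are illustrative assumptions, not part of any particular deployment.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley attributions for model f at input x.

    Feature i's attribution is its average marginal contribution over
    all coalitions of the other features; features outside a coalition
    are replaced by their baseline values.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x, the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for s in combinations(others, k):
                # Standard Shapley coalition weight |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(set(s) | {i}) - value(set(s)))
    return phi

# Toy "model": linear terms plus an interaction between features 0 and 1.
f = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[1] + z[2]
attr = shapley_values(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# Efficiency property: attributions sum to f(x) - f(baseline) = 7.0,
# and the interaction term's credit is split equally between features 0 and 1.
```

For real models, libraries such as `shap` use sampling or model-specific shortcuts instead of this exponential enumeration, but the attributions they report estimate the same quantity, which is what makes faithfulness audits (item 3) well defined.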
ADDITIONAL EVIDENCE
Due to the black-box nature of most machine learning models, users typically cannot understand the reasoning behind model decisions, which raises concerns in critical scenarios, particularly in the commercial use of LLMs in high-stakes industries such as medical diagnosis [351, 352, 353, 354], job hiring [355], and loan applications [356].