7. AI System Safety, Failures, & Limitations

Explainability & Reasoning

The ability to explain the outputs to users and reason correctly

Source: MIT AI Risk Repository (mit497)

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit497

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.4 > Lack of transparency or interpretability

Mitigation strategy

1. Implement and integrate advanced Explainable AI (XAI) methodologies, such as SHAP (SHapley Additive exPlanations) or LIME (Local Interpretable Model-agnostic Explanations) for post-hoc attribution and attention-based analysis for intrinsic interpretability, to provide granular, quantifiable insights into model feature contributions and decision-making processes.

2. Adopt architectural strategies, such as Retrieval-Augmented Generation (RAG) or intrinsic methods like Chain-of-Thought (CoT) reasoning, to ensure factual grounding and to generate self-explanations that provide a verifiable, logical progression for the model's output.

3. Establish continuous governance frameworks that mandate rigorous bias testing, conduct regular audits of explanation faithfulness, and ensure compliance with regulatory standards (e.g., EU AI Act, GDPR) to maintain accountability and stakeholder trust in high-stakes applications.
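To make the attribution idea in point 1 concrete, the sketch below computes exact Shapley values for a toy model with three features, by brute force over feature subsets. This is the principle underlying SHAP, not the SHAP library itself; the function names, the linear example model, and the zero baseline are illustrative assumptions, not part of the repository's guidance.

```python
# Illustrative sketch: exact Shapley-value feature attribution for a
# small model, the concept that SHAP approximates efficiently at scale.
# All names and the example model here are hypothetical.
from itertools import combinations
from math import factorial

def shapley_attributions(f, x, baseline):
    """Exact Shapley values: each feature's average marginal
    contribution to f(x) across all feature orderings. Features
    absent from a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        # Evaluate f with only the coalition's features "switched on".
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Shapley weight for a coalition of size k.
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += w * (value(set(subset) | {i}) - value(set(subset)))
        phi.append(total)
    return phi

# Toy linear model: for linear f, Shapley values reduce to
# w_i * (x_i - baseline_i), which makes the output easy to check.
f = lambda z: 2 * z[0] + 3 * z[1] - z[2]
phi = shapley_attributions(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
# phi == [2.0, 3.0, -1.0]; the attributions sum to f(x) - f(baseline).
```

The key property shown is completeness: the attributions always sum to the difference between the model's output on the instance and on the baseline, which is what makes such explanations quantifiable and auditable (point 3).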

ADDITIONAL EVIDENCE

Due to the black-box nature of most machine learning models, users typically cannot understand the reasoning behind model decisions. This raises concerns in critical scenarios, particularly the commercial use of LLMs in high-stakes industries such as medical diagnosis [351, 352, 353, 354], job hiring [355], and loan applications [356].