Explainability
Any action or procedure performed by a model, or applied to it, with the intention of clarifying or detailing its internal functioning.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit644
Domain lineage
7. AI System Safety, Failures, & Limitations
7.4 > Lack of transparency or interpretability
Mitigation strategy
1. Implement Model Transparency via inherently interpretable model architectures or the application of state-of-the-art post-hoc explainability techniques (e.g., LIME, SHAP) to generate robust, auditable rationales for critical-path decisions.
2. Establish a formal Interpretability and Explainability Governance Framework that mandates the creation of detailed Model Cards and Data Cards, documenting the model's design, decision boundaries, limitations, and the specific methodology used for explanation generation.
3. Institute a continuous Human-AI Collaboration and Vetting Protocol, ensuring that model outputs and their associated explanations are validated by domain experts and monitored for Explainability Pitfalls (EPs), thereby preventing unwarranted trust or misuse.
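To make step 1 concrete, the sketch below shows the core idea behind perturbation-based post-hoc attribution, the family that LIME and SHAP belong to: perturb each input feature and measure how the model's output changes. The model, feature names, and baseline value here are hypothetical illustrations; a real deployment would use an audited library such as `shap` or `lime` rather than this minimal version.

```python
import math

def predict(features):
    """Toy 'black-box' scoring model (hypothetical weights, for illustration):
    a weighted sum passed through a logistic function."""
    weights = {"income": 0.8, "debt": -1.2, "age": 0.1}
    z = sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def occlusion_attribution(features, baseline=0.0):
    """Attribute the prediction to each feature by replacing it with a
    baseline value and recording the resulting change in model output.
    A positive attribution means the feature pushed the score up."""
    base_score = predict(features)
    attributions = {}
    for name in features:
        perturbed = dict(features, **{name: baseline})
        attributions[name] = base_score - predict(perturbed)
    return attributions

# Hypothetical applicant; features are pre-scaled numeric values.
applicant = {"income": 1.5, "debt": 0.5, "age": 0.3}
explanation = occlusion_attribution(applicant)

# Rank features by the magnitude of their influence on this decision.
ranked = sorted(explanation.items(), key=lambda kv: -abs(kv[1]))
```

Attributions like these can be logged alongside each critical-path decision to produce the auditable rationales the mitigation calls for; SHAP refines the same perturbation idea with a game-theoretic weighting over feature subsets.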