7. AI System Safety, Failures, & Limitations

Explainability

A recurrent concern about AI algorithms is their lack of explainability: information about how an algorithm arrives at its results is deficient (Deeks, 2019). For generative AI models in particular, there is little transparency into the reasoning by which the model arrives at its results (Dwivedi et al., 2023). This lack of transparency raises several issues. First, users may find it difficult to interpret and understand the output (Dwivedi et al., 2023). It is also difficult for users to discover potential mistakes in the output (Rudin, 2019). Further, when interpretation and evaluation of the output are inaccessible, users may have trouble trusting the system and its responses or recommendations (Burrell, 2016). Finally, from a legal and regulatory perspective, it is hard for a regulatory body to judge whether a generative AI system is potentially unfair or biased (Rieder & Simon, 2017).

Source: MIT AI Risk Repository (mit543)

ENTITY

3 - Other

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit543

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.4 > Lack of transparency or interpretability

Mitigation strategy

1. Employ model-agnostic post-hoc explainability techniques: Implement methods such as LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) to provide local, instance-level justifications for generative AI outputs. Concurrently, mandate intrinsic transparency through documentation (e.g., Model Cards) detailing the model's architecture, training data sources, known limitations, and intended use to provide global context for its behavior.

2. Establish continuous governance and auditing for fairness: Institute a continuous monitoring framework to systematically audit model outputs for latent biases and unintended consequences, using fairness metrics (e.g., disparate impact, equalized odds). This mitigates regulatory risk by ensuring the system does not produce unexplainably unfair outcomes and provides auditable evidence for regulatory compliance.

3. Integrate human-in-the-loop (HITL) validation: Implement a supervisory structure in which human experts review, validate, and can override decisions or content generated by the system, particularly in high-risk or critical application domains (e.g., medical, legal). This operational measure enhances safety, builds stakeholder trust, and serves as an essential check against uninterpretable errors.
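The two fairness metrics named in strategy 2 can be computed directly from model predictions. The sketch below is illustrative only: it assumes binary predictions, binary ground-truth labels, and a binary protected attribute, and the function names and toy data are our own, not taken from any particular auditing library.

```python
def disparate_impact(preds, group):
    """Ratio of positive-prediction rates: group 0 over group 1
    (group 1 is assumed privileged). A common rule of thumb flags
    values below 0.8 (the "four-fifths rule")."""
    def positive_rate(g):
        members = [p for p, a in zip(preds, group) if a == g]
        return sum(members) / len(members)
    return positive_rate(0) / positive_rate(1)

def equalized_odds_gap(preds, labels, group):
    """Largest gap between groups in true-positive rate or
    false-positive rate; 0 means equalized odds holds exactly."""
    def rates(g):
        tp = fp = pos = neg = 0
        for p, y, a in zip(preds, labels, group):
            if a != g:
                continue
            if y == 1:
                pos += 1
                tp += p
            else:
                neg += 1
                fp += p
        return tp / pos, fp / neg
    tpr0, fpr0 = rates(0)
    tpr1, fpr1 = rates(1)
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))

# Toy audit: group 0 receives positive predictions far less often.
preds  = [1, 0, 0, 0, 1, 1, 1, 0]
labels = [1, 0, 1, 0, 1, 1, 0, 0]
group  = [0, 0, 0, 0, 1, 1, 1, 1]
print(disparate_impact(preds, group))           # 0.25 / 0.75, well below 0.8
print(equalized_odds_gap(preds, labels, group))
```

In a continuous-monitoring setup, metrics like these would be recomputed on each batch of logged outputs and alarmed when they cross an agreed threshold, producing the auditable evidence the strategy calls for.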