7. AI System Safety, Failures, & Limitations

Lack of transparency, explainability, and trust

Understanding how AI reaches conclusions or why AI systems perform specific actions motivates an entire branch of interpretability research [111], but physical embodiment raises the stakes for understanding these systems. For example, transparency of planned actions and explainability of decision-making are crucial when an AV suddenly changes lanes. A lack of transparency and explainability could erode trust, which could become a critical and socially destabilizing issue with the widespread deployment of EAI [112–114].

Source: MIT AI Risk Repository (risk ID mit1433)

ENTITY

3 - Other

INTENT

2 - Unintentional

TIMING

3 - Other

Risk ID

mit1433

Domain lineage

7. AI System Safety, Failures, & Limitations


7.4 > Lack of transparency or interpretability

Mitigation strategy

1. Implement Contextual Explainable AI (XAI) Mechanisms. Mandate the integration of post-hoc explanation techniques (e.g., SHAP, LIME, or counterfactuals) to provide human-interpretable justifications for specific decisions and actions, particularly in high-stakes or unexpected scenarios. Explanations must be tailored to the end-user's cognitive model and domain expertise, ensuring clarity of the AI's intent and planned actions, as is critical for safety-critical Embodied AI (EAI) such as Autonomous Vehicles.

2. Establish Transparent and Auditable Governance Frameworks. Develop and publicly disclose a comprehensive AI governance framework that clearly outlines accountability, liability, and oversight structures throughout the AI lifecycle. This includes mandatory detailed model documentation, version control to track model evolution, and accessible audit trails, which are foundational to fostering trustworthiness and demonstrating regulatory readiness.

3. Enforce Foundational Data Transparency and Integrity Practices. Implement stringent data governance protocols requiring clear disclosure of data sources, lineage, quality metrics, and privacy-preserving practices used in system training and operation. Visibility into data input is essential to proactively detect and mitigate potential biases and ensure the system is built on a reliable, ethical foundation, thereby reinforcing stakeholder trust in the system's fairness and reliability.
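To make the first mitigation concrete, the sketch below illustrates the core idea behind perturbation-based post-hoc explanation (the intuition underlying techniques such as LIME): perturb each input feature and measure how much the model's output shifts, yielding a human-readable attribution. The model, feature names, and values here are hypothetical stand-ins for an AV planner's lane-change decision, not part of the source entry.

```python
def lane_change_score(features):
    """Toy stand-in (hypothetical) for an AV planner's lane-change confidence."""
    gap_ahead, rel_speed, lane_free = features
    return 0.6 * gap_ahead + 0.3 * rel_speed + 0.1 * lane_free


def explain(model, x, feature_names, delta=0.1):
    """Attribute a decision to input features via finite perturbation.

    Each feature is nudged by `delta` while the others are held fixed;
    the normalized change in the model's output is that feature's
    local sensitivity (a crude, LIME-like post-hoc explanation).
    """
    base = model(x)
    attributions = {}
    for i, name in enumerate(feature_names):
        perturbed = list(x)
        perturbed[i] += delta
        attributions[name] = (model(perturbed) - base) / delta
    return attributions


names = ["gap_ahead", "rel_speed", "lane_free"]
attributions = explain(lane_change_score, [0.8, 0.5, 1.0], names)
# Rank features by sensitivity to produce a justification such as
# "gap_ahead contributed most to the lane-change decision".
ranked = sorted(attributions, key=attributions.get, reverse=True)
```

In a deployed system, a library such as SHAP or LIME would replace this finite-difference sketch, and the ranked attributions would be translated into explanations matched to the end-user's expertise, as the mitigation text requires.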