
Safety

This is the risk of direct or indirect physical or psychological injury resulting from interaction with the ML system.

Source: MIT AI Risk Repository, risk ID mit198

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit198

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

- **Implement Systemic Hazard Analysis and Formal Verification.** Utilize methods such as System-Theoretic Process Analysis (STPA) to work backward from potential physical or psychological harms (losses) to establish critical safety constraints and requirements. Apply formal verification techniques, robust optimization, and worst-case bounds to align machine learning objectives with safety requirements and to minimize epistemic uncertainty arising from data sparsity or out-of-distribution inputs.
- **Incorporate Human-in-the-Loop (HITL) Control and Independent Redundancy.** Design the system architecture to include a human-in-command or human-in-the-loop mechanism, ensuring the ability to assume or regain control promptly and to override potentially unsafe or unintended automated actions. In addition, deploy an independent safety monitor or traditional backup system to provide continuous verification of the ML system's output and behavior.
- **Establish Continuous Monitoring for Performance and Distributional Shift.** Implement real-time performance-threshold alerts and model-drift detection to identify degradations in model efficacy or shifts in the input data distribution post-deployment. This continuous assurance process is critical for mitigating second-order risks that arise from the system's interaction with a complex, evolving environment.
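The monitoring strategy above can be sketched in code. The following is a minimal, illustrative drift monitor, not part of the repository entry: it records a baseline input distribution at deployment time and alerts when the rolling mean of a scalar feature drifts more than a chosen number of standard errors from the baseline mean. The class name, window size, and z-score threshold are all assumptions made for the sketch.

```python
import statistics


class DriftMonitor:
    """Illustrative post-deployment drift monitor for one scalar feature.

    Alerts when the mean of a rolling window of observations deviates
    from the deployment-time baseline mean by more than `z_threshold`
    standard errors. Thresholds are hypothetical, not prescribed values.
    """

    def __init__(self, baseline, window_size=50, z_threshold=3.0):
        self.baseline_mean = statistics.fmean(baseline)
        self.baseline_std = statistics.pstdev(baseline)
        self.window_size = window_size
        self.z_threshold = z_threshold
        self.window = []

    def observe(self, value):
        """Record one observation; return True if drift is detected."""
        self.window.append(value)
        if len(self.window) < self.window_size:
            return False  # not enough data for a stable estimate yet
        self.window = self.window[-self.window_size:]
        window_mean = statistics.fmean(self.window)
        # Standard error of the window mean under the baseline distribution.
        se = self.baseline_std / (self.window_size ** 0.5)
        z = abs(window_mean - self.baseline_mean) / se
        return z > self.z_threshold
```

In practice such an alert would trigger the human-in-the-loop and fallback mechanisms described above, rather than act on its own; production systems typically use richer two-sample tests over full feature vectors instead of a single rolling mean.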

ADDITIONAL EVIDENCE

By nature, ML systems take away some degree of control from their users when they automate certain tasks. Intuitively, this transfer of control should be accompanied by a transfer of moral responsibility for the user’s safety [143]. Therefore, a key concern around ML systems has been ensuring the physical and psychological safety of affected communities. In applications such as content moderation, keeping the system updated may involve the large-scale manual labeling and curation of toxic or graphic content by contract workers. Prolonged exposure to such content results in psychological harm, which should be accounted for when assessing the safety risk of these types of ML systems [134, 170].

First-order risks may lead to safety risk in different ways. For example, poor accuracy may lead to the system failing to recognize a pedestrian and running them over [33], a melanoma identifier trained on insufficiently diverse data may result in unnecessary chemotherapy [169], or swarming ML systems may endanger human agents (e.g., high-speed maneuvers via inter-vehicular coordination making traffic conditions dangerous for traditional vehicles) [196]. The inability to assume or regain control in time may also increase safety risk (e.g., being unable to override an autonomous weapon before it mistakenly fires on a civilian) [68].
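The control-transfer concern above pairs naturally with the HITL and independent-redundancy mitigation: automated actions should only proceed when both the model and an independent safety monitor agree, and otherwise control should revert to a safe fallback that a human can act on. The sketch below is hypothetical; the function name, confidence floor, and action strings are assumptions for illustration, not repository content.

```python
def guarded_action(proposed_action: str,
                   model_confidence: float,
                   safety_monitor_ok: bool,
                   confidence_floor: float = 0.9,
                   fallback: str = "stop_and_escalate") -> str:
    """Gate an automated action behind two independent checks.

    The independent safety monitor's veto takes precedence over model
    confidence; low confidence alone also routes to the fallback so a
    human operator can assume or regain control in time.
    """
    if not safety_monitor_ok:
        return fallback  # independent monitor veto
    if model_confidence < confidence_floor:
        return fallback  # model itself is unsure; escalate to a human
    return proposed_action
```

The design point is that the fallback path must be safe even when executed spuriously, since both checks will sometimes fire on benign inputs.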