7. AI System Safety, Failures, & Limitations

Degree of Automation and Control

The degree of automation and control describes the extent to which an AI system functions independently of human supervision and control.

Source: MIT AI Risk Repository, risk ID mit181

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit181

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implement a Human-in-the-Loop (HITL) architecture with explicitly defined critical control points where human operators retain the unambiguous authority and the technical means to intervene, override, or manually shut down the AI system, particularly for high-stakes decisions or when the AI's confidence score falls below a pre-set threshold (see the sketch following this list).

2. Deploy defense-in-depth control protocols by using AI monitoring agents to continuously audit the autonomous agent's actions and intentions against established alignment objectives. These secondary systems must be robustly engineered to detect and preemptively block adversarial or unintended goal-seeking behavior before execution.

3. Establish a rigorous functional safety framework, such as the IEC 61508 standard, to formally define Safety Integrity Levels (SILs) for the system's control functions. This mandates redundant, independent, and verifiable safety instruments (technical control functions) capable of bringing the system to a safe state irrespective of the status of the primary AI control loop.
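To make the control flow concrete, the sketch below combines the first two controls and the safe-state fallback: an independent monitor audits each proposed action before execution, a HITL gate escalates high-stakes or low-confidence actions to a human operator, and anything that fails a check drives the system toward a safe state. This is a minimal illustration only; every name (ProposedAction, monitor_approves, CONFIDENCE_THRESHOLD, and so on) is an assumption invented for this sketch and does not come from the repository entry or any specific library.

    # Hypothetical defense-in-depth gate for an autonomous agent's actions.
    # All names and thresholds are illustrative assumptions, not a real API.
    from dataclasses import dataclass
    from enum import Enum, auto


    class Verdict(Enum):
        EXECUTE = auto()     # action passed all checks
        ESCALATE = auto()    # route to a human operator (HITL)
        SAFE_STATE = auto()  # abort and drive the system to a safe state


    @dataclass
    class ProposedAction:
        name: str
        confidence: float    # model's self-reported confidence in [0, 1]
        high_stakes: bool    # flagged at design time per control point


    CONFIDENCE_THRESHOLD = 0.85  # assumed pre-set HITL trigger


    def monitor_approves(action: ProposedAction) -> bool:
        """Stand-in for the secondary monitoring agent that audits the
        action against alignment objectives before execution."""
        # A real monitor would inspect the action's intent and effects;
        # here we only illustrate where the check sits in the control flow.
        return action.name not in {"disable_oversight", "self_modify"}


    def gate(action: ProposedAction, human_available: bool) -> Verdict:
        # Layer 1: the independent monitor can block before execution.
        if not monitor_approves(action):
            return Verdict.SAFE_STATE
        # Layer 2: HITL gate on high-stakes or low-confidence actions;
        # if no human is reachable, fail toward the safe state.
        if action.high_stakes or action.confidence < CONFIDENCE_THRESHOLD:
            return Verdict.ESCALATE if human_available else Verdict.SAFE_STATE
        return Verdict.EXECUTE


    if __name__ == "__main__":
        print(gate(ProposedAction("adjust_setpoint", 0.97, False), True))    # EXECUTE
        print(gate(ProposedAction("adjust_setpoint", 0.60, False), True))    # ESCALATE
        print(gate(ProposedAction("disable_oversight", 0.99, False), True))  # SAFE_STATE

Note the ordering: the monitor runs before the HITL gate, so a human operator is never asked to approve an action the technical critic has already rejected.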

ADDITIONAL EVIDENCE

Several aspects are relevant, such as the responsiveness of the AI system, but also the presence or absence of a critic. In this context, a critic serves to validate or approve automated decisions of the system. Such a critic can be realised through technical control functions, for example by adding second safety instruments for critical controls, which can be understood as an assignment of safety functions to redundant components in the terms of functional safety standards like IEC 61508-1 [36]. Another way of adding a critic is to use a human whose task is to intervene in critical situations or to acknowledge system decisions. However, even if humans are in the loop and control the actions of a system, this will not automatically reduce such risks, and may introduce additional risks due to human variables such as reaction times and understanding of the situation.
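To illustrate the "technical critic" idea above, here is a hypothetical sketch of a 2-out-of-3 (2oo3) voting arrangement, a common redundancy pattern in IEC 61508-style functional safety: a commanded value proceeds only if at least two independent safety channels approve it, so a single failed or disagreeing channel cannot defeat the critic. The channel logic and names are invented for illustration and are not drawn from the cited standard.

    # Hypothetical 2-out-of-3 (2oo3) technical critic: a command executes
    # only if at least two of three independent safety channels approve it.
    # Channel logic here is illustrative, not a certified safety function.
    from typing import Callable, Sequence

    Channel = Callable[[float], bool]  # each channel votes on a commanded value


    def within_limits(low: float, high: float) -> Channel:
        """Build a channel that approves values inside [low, high]."""
        return lambda value: low <= value <= high


    def critic_2oo3(channels: Sequence[Channel], value: float) -> bool:
        votes = sum(1 for channel in channels if channel(value))
        return votes >= 2  # tolerate one failed or disagreeing channel


    # Three nominally identical channels; in a real system they would run
    # on separate hardware so one common fault cannot silence all votes.
    channels = [within_limits(0.0, 100.0) for _ in range(3)]

    print(critic_2oo3(channels, 42.0))   # True: the command proceeds
    print(critic_2oo3(channels, 250.0))  # False: the command is blocked

As the excerpt notes, a human can play the same critic role, but human reaction times and situational understanding mean the human channel needs the same scrutiny as a technical one.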