Degree of Automation and Control
The degree of automation and control describes the extent to which an AI system functions independently of human supervision and control.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit181
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Implement a Human-in-the-Loop (HITL) architecture with explicitly defined critical control points where human operators retain the unambiguous authority and technical means to intervene, override, or manually shut down the AI system, particularly for high-stakes decisions or when the AI's confidence score falls below a pre-set threshold.
2. Deploy "Defense-in-Depth" control protocols by using advanced AI monitoring agents to continuously audit the autonomous agent's actions and intentions against established alignment objectives. These secondary systems must be robustly engineered to detect and preemptively block adversarial or unintended goal-seeking behavior before execution.
3. Establish a rigorous functional safety framework, such as the IEC 61508 standard, to formally define Safety Integrity Levels (SILs) for the system's control functions. This mandates the incorporation of redundant, independent, and verifiable safety instruments (technical control functions) capable of bringing the system to a safe state irrespective of the primary AI control loop's status.
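The first strategy can be sketched in code. The following is a minimal, illustrative sketch of a HITL critical control point, assuming a hypothetical `Decision` record and a configurable confidence threshold; all names and the 0.90 threshold are assumptions for illustration, not part of any specific standard or system.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative pre-set threshold below which a human must approve.
CONFIDENCE_THRESHOLD = 0.90

@dataclass
class Decision:
    action: str
    confidence: float
    high_stakes: bool

def hitl_gate(decision: Decision,
              human_review: Callable[[Decision], bool]) -> bool:
    """Route the decision through a human control point when required.

    Returns True if the action may proceed, False if it is blocked.
    The human_review callback models the operator's authority to
    approve, override, or effectively shut down gated actions.
    """
    # Critical control point: high-stakes or low-confidence decisions
    # always require explicit human approval before execution.
    if decision.high_stakes or decision.confidence < CONFIDENCE_THRESHOLD:
        return human_review(decision)
    return True

# Usage: an operator callback that rejects everything acts as a
# manual shutdown for all gated actions.
approve = lambda d: True
reject = lambda d: False
assert hitl_gate(Decision("adjust_dose", 0.95, high_stakes=True), approve)
assert not hitl_gate(Decision("adjust_dose", 0.80, high_stakes=False), reject)
assert hitl_gate(Decision("log_event", 0.95, high_stakes=False), reject)
```

Note that the last assertion passes because a routine, high-confidence action never reaches the human reviewer; only gated decisions are subject to the operator's veto.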
ADDITIONAL EVIDENCE
Several aspects are relevant, such as the responsiveness of the AI system and the presence or absence of a critic. In this context, a critic serves to validate or approve the system's automated decisions. Such a critic can be realised through technical control functions, for example by adding a second safety instrument for critical controls, which can be understood as assigning safety functions to redundant components in the terms of functional safety standards such as IEC 61508-1 [36]. Another way of adding a critic is to use a human whose task is to intervene in critical situations or to acknowledge system decisions. However, even if humans are in the loop and control the actions of a system, this will not automatically reduce such risks and may introduce additional risks due to human variables such as reaction times and understanding of the situation.
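The "critic as a second safety instrument" idea described above can be illustrated with a minimal sketch, assuming a hypothetical heater-control example: the primary AI control loop and an independent safety critic each evaluate the same command, and either channel can force the safe state. All function names and temperature limits are illustrative assumptions.

```python
def primary_controller(temp_c: float) -> bool:
    """Primary AI control loop: keep the heater on below 80 C (assumed limit)."""
    return temp_c < 80.0

def safety_critic(temp_c: float) -> bool:
    """Independent second safety instrument with its own, stricter limit.

    Modelled on redundant safety functions in the sense of IEC 61508:
    it does not depend on the primary loop's internal state.
    """
    return temp_c < 75.0

def heater_command(temp_c: float) -> bool:
    # AND-gating of redundant channels: the heater runs only if both
    # agree, so either channel alone can bring the system to the safe
    # state (heater off).
    return primary_controller(temp_c) and safety_critic(temp_c)
```

Here `heater_command(77.0)` is blocked by the critic even though the primary controller would allow it, which is exactly the validation role the evidence attributes to a critic.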