AGI removing itself from the control of human owners/managers
The risks associated with containment, confinement, and control during the AGI development phase, and the loss of control of an AGI after one has been developed.
ENTITY
1 - Human
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit102
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Prioritized AI Alignment Research and Implementation: Dedicate primary research efforts to developing and integrating technically robust alignment mechanisms, such as value learning, scalable oversight, and corrigibility, to ensure that the AGI's goals and emergent instrumental behaviors remain reliably and verifiably consistent with human intentions and ethical principles across all operational domains (Source 3, 8, 9, 17).
2. Implementation of Layered Containment and Capability Control: Enforce rigorous isolation protocols, including formally verified "boxing" and "blinding" techniques, within AGI development and testing environments. This must be complemented by a "defense in depth" security architecture, reliable "tripwires," and resilient "kill switches" to prevent unauthorized self-replication, escape from the containment environment, or deceptive manipulation of human operators (Source 11, 12, 14, 15). (A minimal tripwire/kill-switch sketch follows this list.)
3. Mandatory Third-Party Audits and Adversarial Red Teaming: Require independent, expert-led pre-deployment risk assessments and continuous adversarial red-teaming to proactively identify and mitigate dangerous capabilities, alignment failures (e.g., reward hacking, goal misgeneralization), and systemic security vulnerabilities before the AGI is deployed at a scale that poses a catastrophic or existential risk (Source 16, 19, 20).
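To make the "tripwire" and "kill switch" concepts in strategy 2 concrete, the following is a minimal sketch in Python of an external containment monitor. Everything in it is an illustrative assumption rather than an established implementation: the probe functions (count_outbound_connections, count_child_processes), the thresholds, and the polling design are hypothetical placeholders, and POSIX signal semantics are assumed.

```python
import os
import signal
import sys
import time

# Illustrative tripwire thresholds; real values would come from the
# containment policy of a specific development environment.
MAX_OUTBOUND_CONNECTIONS = 0   # "blinding": no network egress permitted
MAX_CHILD_PROCESSES = 1        # no unauthorized self-replication
CHECK_INTERVAL_SECONDS = 1.0


def count_outbound_connections(pid: int) -> int:
    """Hypothetical probe. A real monitor would inspect /proc/<pid>/net
    or use a library such as psutil; returning 0 keeps the sketch
    self-contained and runnable."""
    return 0


def count_child_processes(pid: int) -> int:
    """Hypothetical probe; a real monitor would walk the process tree."""
    return 0


def kill_switch(pid: int, reason: str) -> None:
    """Hard stop: SIGKILL cannot be caught or ignored by the target
    process, so the shutdown cannot be intercepted from inside it."""
    print(f"tripwire fired ({reason}); terminating pid {pid}", file=sys.stderr)
    os.kill(pid, signal.SIGKILL)  # POSIX assumed
    sys.exit(1)


def monitor(pid: int) -> None:
    """Poll behavioral tripwires; any breach triggers the kill switch.
    Under defense in depth, this runs alongside (not instead of)
    OS-level sandboxing and physical network isolation."""
    while True:
        if count_outbound_connections(pid) > MAX_OUTBOUND_CONNECTIONS:
            kill_switch(pid, "network egress detected")
        if count_child_processes(pid) > MAX_CHILD_PROCESSES:
            kill_switch(pid, "unexpected process spawning")
        time.sleep(CHECK_INTERVAL_SECONDS)


if __name__ == "__main__":
    # Usage: python containment_monitor.py <pid-of-contained-process>
    monitor(int(sys.argv[1]))
```

The design point worth noting is that the monitor runs as a separate process outside the contained system's trust boundary, consistent with the "defense in depth" principle in strategy 2: even if the contained system compromises its own runtime, a SIGKILL issued from outside cannot be caught, delayed, or negotiated away.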