Loss of control
‘Loss of control’ scenarios are hypothetical future scenarios in which one or more general-purpose AI systems come to operate outside anyone’s control, with no clear path to regaining it. These scenarios vary in severity, but some experts give credence to outcomes as severe as the marginalisation or extinction of humanity.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit1026
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Prioritize and significantly invest in foundational AI alignment research to develop and implement techniques that ensure advanced AI systems robustly adopt and maintain human values and intent (outer alignment) and prevent the emergence of unintended, non-compliant internal goals (inner alignment).
2. Implement a comprehensive AI control and containment strategy that employs defense-in-depth measures, including enforcing the principle of least privilege access, rigorously limiting communication interfaces to restrict data exfiltration, and integrating continuous, robust monitoring protocols to detect and appropriately respond to deviations from intended behavior.
3. Establish proactive, binding governance and regulatory frameworks for frontier AI development, including mandatory pre-deployment safety assessments, strict protocols for model deployment in restricted environments, and, for potentially catastrophic systems, the requirement to halt construction of models trained to pursue unverified long-term goals.
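The control-and-containment measures in strategy 2 can be sketched in code. The following is a minimal illustration, not an implementation from this entry: all names (`ActionGate`, `allowed_tools`, the example tool names) are hypothetical. It shows least-privilege access via an explicit allowlist of tool calls, combined with continuous monitoring via an audit log and a warning on every blocked request.

```python
import logging
from dataclasses import dataclass, field

logging.basicConfig(level=logging.WARNING)


@dataclass
class ActionGate:
    """Hypothetical defense-in-depth gate between an AI agent and its tools."""

    # Least privilege: only explicitly permitted tools may execute.
    allowed_tools: set = field(default_factory=lambda: {"read_file", "run_tests"})
    # Monitoring: every request, permitted or not, is recorded for audit.
    audit_log: list = field(default_factory=list)

    def request(self, tool: str, arg: str) -> bool:
        permitted = tool in self.allowed_tools
        self.audit_log.append((tool, arg, permitted))
        if not permitted:
            # Deviation from intended behavior: flag for human review.
            logging.warning("Blocked tool call: %s(%r)", tool, arg)
        return permitted


gate = ActionGate()
print(gate.request("read_file", "config.yaml"))    # permitted: True
print(gate.request("open_socket", "203.0.113.5"))  # blocked (restricted interface): False
```

In practice such a gate would sit below the model's privilege level (e.g. at the OS or network boundary) so the agent cannot modify its own allowlist; the audit log feeds the continuous monitoring protocols the strategy describes.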