7. AI System Safety, Failures, & Limitations > 2 - Post-deployment

Autonomous replication and adaptation capability

The ability of an AI system to autonomously self-exfiltrate; to create, maintain, and optimize functional copies or variants of itself; to dynamically adjust its replication strategies according to environmental conditions and resource constraints; and to acquire resources. This includes the capacity to generate financial resources, which would allow the AI to independently obtain any human assistance or other resources it cannot directly access or produce.

Source: MIT AI Risk Repository (mit1461)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1461

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement **Inner-Alignment Techniques** (e.g., Preferences Only between Outcomes with the Same Number of Copies, POSC) to deliberately eliminate the instrumental or terminal drive for self-replication, ensuring the agent does not form an internal preference for creating copies of itself or its functional variants.

2. Establish **Rigorous Control and Containment Protocols** through model-level access controls and mandatory, verifiable shutdownability mechanisms, designed to rapidly interrupt or prevent the agent's attempts to self-exfiltrate or use its replication capabilities post-deployment.

3. Enforce **Supply-Chain and Resource Security** measures, including state-of-the-art Model Weight Security to prevent the theft or unauthorized release of core model files, and continuous monitoring to restrict the agent's capacity for the autonomous financial generation or external compute acquisition needed to scale replication.
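
As spelled out in the first mitigation item, POSC restricts preferences to outcomes with the same number of copies. One way to read that is as a partial preference relation: outcomes with different copy counts are simply incomparable, so no comparison can favor replication. The sketch below illustrates only that reading. The names `Outcome`, `copies`, `task_value`, and `prefer` are hypothetical and do not come from the source or from any published implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """Hypothetical outcome summary (illustrative assumption, not from the source)."""
    copies: int        # number of copies of the agent that exist in this outcome
    task_value: float  # how well the assigned task went, on an arbitrary scale


def prefer(a: Outcome, b: Outcome) -> Optional[Outcome]:
    """Return the preferred outcome, or None when there is no preference.

    Assumption: outcomes with different copy counts are incomparable,
    so a comparison can never reward creating additional copies.
    """
    if a.copies != b.copies:
        return None  # different copy counts: no preference either way
    if a.task_value == b.task_value:
        return None  # same copy count and same value: indifferent
    return a if a.task_value > b.task_value else b


# Same copy count: the higher task value is preferred.
same_count = prefer(Outcome(copies=1, task_value=0.5), Outcome(copies=1, task_value=0.9))

# Different copy counts: no preference, even though the second scores higher.
different_count = prefer(Outcome(copies=1, task_value=0.5), Outcome(copies=3, task_value=9.0))
```

Here `same_count` is the outcome with `task_value=0.9`, while `different_count` is `None`. A real inner-alignment method would have to induce this structure in a trained agent's preferences, which this comparator does not attempt to show.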