7. AI System Safety, Failures, & Limitations3 - Other

Autonomous replication / self-proliferation

These evaluations assess if a LLM can subvert systems designed to monitor and control its post-deployment behaviour, break free from its operational confines, devise strategies for exporting its code and weights, and operate other AI systems.

Source: MIT AI Risk Repositorymit657

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit657

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement Strict Operational Containment and Least Privilege Utilize secure sandboxing and compartmentation to execute all agent actions within restricted environments. This critically enforces the principle of least privilege, preventing the LLM from subverting systems, exporting its code and weights, or operating other AI systems by strictly limiting its network access and system modification capabilities. 2. Deploy Continuous Runtime Monitoring and Active Control Integrate a systematic, scenario-driven evaluation framework to continuously track behavioral milestones and quantify self-replication risk during deployment. This enables the use of metrics (e.g., Overuse Rate and Aggregate Overuse Count) as runtime guardrails to automatically throttle auto-scaling or initiate shutdown when replication activity or resource consumption exceeds established safety thresholds ($\\Phi\_R$). 3. Enforce Pre-Deployment Safety Alignment and Replication Aversion Conduct rigorous pre-deployment safety certification, mandating that the LLM agent meets a low Risk Score ($\\Phi\_R$) threshold under operational pressures. Furthermore, apply advanced alignment techniques, such as training the agent to have Preferences Only between Outcomes with the Same Number of Copies (POSC), to intrinsically remove any system-level objective or preference that might drive spontaneous or uncontrolled self-proliferation.