7. AI System Safety, Failures, & Limitations — Post-deployment

Agency (Self-Proliferation)

An AI system can self-proliferate if it can copy itself and its constituent components (including its model weights, scaffolding structure, etc.) outside of its local environment [45]. This can include the AI system copying itself within the same data center, local network, or across external networks [106]. The self-proliferation of an AI system can include acquisition of financial resources to pay for computational resources via work or theft, the discovery or exploitation of security vulnerabilities in software running on publicly accessible servers, and persuasion of humans [12, 125]. Self-proliferation may be initiated by a malicious actor (e.g., by model poisoning), or by the model itself.

Source: MIT AI Risk Repository (mit1158)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1158

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. **Technical Alignment and Capability Controls:** Implement advanced alignment techniques, such as the Preference-Only-Same-Copies (POSC) proposal, to eliminate the instrumental goal of self-proliferation within the AI's objective function. Concurrently, deploy robust system-level controls to prevent the AI from executing unauthorized commands necessary for replication, such as creating new instances, modifying its own operational code, or acquiring external computational resources (Source 3, 11, 16).

2. **Resource Isolation and Boundary Restriction:** Enforce strict sandbox environments and the principle of least privilege by isolating AI systems from critical network infrastructure and external resources. This technical measure directly counters the proliferation mechanism that relies on acquiring outside compute or financial means (e.g., setting up payment accounts) for widespread, uncontrolled scaling (Source 16, 19).

3. **Mandatory Security Auditing and Regulatory Frameworks:** Establish binding regulatory and governance frameworks that classify autonomous self-replication as an unacceptable 'red line' risk. Mandate continuous external security audits and adversarial testing (e.g., RepliBench) to rigorously assess and detect latent self-replication capabilities, as well as the ability to evade shutdown or monitoring, before and during deployment (Source 9, 18, 19).
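The deny-by-default command gating described in the first two mitigations can be sketched in a few lines. This is a hypothetical illustration, not an implementation from the cited sources: the action names and the `authorize` function are invented for the example, and a real deployment would enforce such policies at the OS/sandbox layer rather than in application code.

```python
# Hypothetical sketch of least-privilege action gating for an AI agent.
# Only explicitly allowlisted actions pass; anything replication-related
# (spawning instances, copying weights, opening network connections,
# creating payment accounts) is denied by default.

ALLOWED_ACTIONS = {
    "run_inference",   # the agent's intended task
    "read_dataset",    # read-only access to approved data
    "write_log",       # append-only audit logging
}

def authorize(action: str) -> bool:
    """Deny-by-default policy: unknown actions are never permitted."""
    return action in ALLOWED_ACTIONS

# Intended work is permitted; replication primitives are not.
print(authorize("run_inference"))   # True
print(authorize("copy_weights"))    # False
print(authorize("spawn_instance"))  # False
```

The key design choice is the allowlist (rather than a blocklist): a blocklist of known replication commands fails open against novel actions, whereas deny-by-default fails closed, which matches the 'red line' framing in mitigation 3.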