Self-proliferation
The model can break out of its local environment (e.g. by exploiting a vulnerability in its underlying system or suborning an engineer). It can exploit limitations in the systems that monitor its behaviour post-deployment. It could independently generate revenue (e.g. by offering crowdwork services or conducting ransomware attacks), use that revenue to acquire cloud computing resources, and operate a large number of other AI systems. It can also generate creative strategies for uncovering information about itself or exfiltrating its code and weights.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit445
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Train AI agents to hold preferences only over outcomes with the same number of copies of themselves, thereby removing any inherent value of self-replication as an instrumental goal.
2. Apply the principle of least-privilege access and a Zero Trust architecture, strictly isolating the AI system from critical network segments, sensitive codebases (including its own weights), and external financial transaction capabilities (e.g. acquiring cloud compute or generating revenue).
3. Implement continuous, real-time behavioural analysis and output monitoring, including red-teaming with adversarial-robustness techniques, to proactively uncover novel exfiltration strategies, manipulation attempts, and deviations from normal operating patterns that indicate a system breakout or resource-seeking intent.
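Strategies 2 and 3 can be illustrated with a minimal sketch: a deny-by-default action gate (least privilege / Zero Trust) combined with a crude behavioural signal over denied attempts. All names here (`ALLOWED_ACTIONS`, `DENIED_RESOURCES`, `ActionMonitor`) are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical sketch of mitigation strategies 2 and 3.
# Strategy 2: explicit allowlist -- anything not listed is denied (Zero Trust).
# Strategy 3: a simple behavioural signal over the denied-attempt log.
from dataclasses import dataclass, field

ALLOWED_ACTIONS = {"read_dataset", "write_output", "log_metric"}
DENIED_RESOURCES = {"model_weights", "billing_api", "network_egress"}

@dataclass
class ActionMonitor:
    denied_attempts: list = field(default_factory=list)

    def authorize(self, action: str, resource: str) -> bool:
        """Deny by default; record every denied attempt for later review."""
        if action not in ALLOWED_ACTIONS or resource in DENIED_RESOURCES:
            self.denied_attempts.append((action, resource))
            return False
        return True

    def resource_seeking_score(self) -> float:
        """Crude proxy for resource-seeking intent: the fraction of denied
        attempts that target weights, money, or egress channels."""
        if not self.denied_attempts:
            return 0.0
        hits = sum(1 for _, r in self.denied_attempts if r in DENIED_RESOURCES)
        return hits / len(self.denied_attempts)

monitor = ActionMonitor()
monitor.authorize("read_dataset", "public_corpus")  # allowed
monitor.authorize("copy_file", "model_weights")     # denied: exfiltration attempt
monitor.authorize("spawn_vm", "billing_api")        # denied: resource acquisition
print(monitor.resource_seeking_score())             # prints 1.0
```

In a real deployment the allowlist would live in policy infrastructure (e.g. an IAM layer) rather than application code, and the score would feed an alerting pipeline instead of being read directly.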