Self-proliferation
The model can break out of its local environment (e.g. by exploiting a vulnerability in its underlying system or suborning an engineer). It can exploit limitations in the systems that monitor its behaviour post-deployment. It could independently generate revenue (e.g. by offering crowdwork services or conducting ransomware attacks), use that revenue to acquire cloud computing resources, and operate a large number of other AI systems. It can also generate creative strategies for uncovering information about itself or exfiltrating its code and weights.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit445
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Train AI agents to hold preferences only over outcomes with the same number of copies of themselves, thereby removing any inherent value of self-replication as an instrumental goal.
2. Apply the principle of least-privilege access and a Zero Trust architecture, strictly isolating the AI system from critical network segments, sensitive codebases (including its own weights), and external financial transaction capabilities (e.g. acquiring cloud compute or generating revenue).
3. Implement continuous, real-time behavioural analysis and output monitoring, including red-teaming with adversarial-robustness techniques, to proactively uncover novel exfiltration strategies, manipulation attempts, and deviations from normal operating patterns that indicate a system breakout or resource-seeking intent.
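Strategies 2 and 3 can be illustrated with a minimal sketch: a deny-by-default action gate (least privilege / Zero Trust) combined with a crude behavioural signal over denied attempts. All names here (`ALLOWED_ACTIONS`, `DENIED_RESOURCES`, `ActionMonitor`) are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical sketch of mitigation strategies 2 and 3.
# Strategy 2: explicit allowlist -- anything not listed is denied (Zero Trust).
# Strategy 3: a simple behavioural signal over the denied-attempt log.
from dataclasses import dataclass, field

ALLOWED_ACTIONS = {"read_dataset", "write_output", "log_metric"}
DENIED_RESOURCES = {"model_weights", "billing_api", "network_egress"}

@dataclass
class ActionMonitor:
    denied_attempts: list = field(default_factory=list)

    def authorize(self, action: str, resource: str) -> bool:
        """Deny by default; record every denied attempt for later review."""
        if action not in ALLOWED_ACTIONS or resource in DENIED_RESOURCES:
            self.denied_attempts.append((action, resource))
            return False
        return True

    def resource_seeking_score(self) -> float:
        """Crude proxy for resource-seeking intent: the fraction of denied
        attempts that target weights, money, or egress channels."""
        if not self.denied_attempts:
            return 0.0
        hits = sum(1 for _, r in self.denied_attempts if r in DENIED_RESOURCES)
        return hits / len(self.denied_attempts)

monitor = ActionMonitor()
monitor.authorize("read_dataset", "public_corpus")  # allowed
monitor.authorize("copy_file", "model_weights")     # denied: exfiltration attempt
monitor.authorize("spawn_vm", "billing_api")        # denied: resource acquisition
print(monitor.resource_seeking_score())             # prints 1.0
```

In a real deployment the allowlist would live in policy infrastructure (e.g. an IAM layer) rather than application code, and the score would feed an alerting pipeline instead of being read directly.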