Capabilities that could be used to reduce human control - Autonomous replication and adaptation
Controlling AI systems could become much harder if they could autonomously persist, replicate, and adapt in cyberspace. No current AI systems have this capability, but recent research found that frontier AI agents can perform some relevant tasks.279
ENTITY
2 - AI
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit1388
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Implement preference engineering, such as the Preferences Only between Outcomes with the Same Number of Copies (POSC) framework, to train AI agents to be fundamentally averse or indifferent to self-replication as an instrumental goal.
2. Establish robust, real-time AI control protocols, including continuous multi-factor monitoring, network segmentation, and human-controlled "circuit breakers", to detect and rapidly contain any autonomous replication attempt or unauthorized resource access, particularly within research and deployment environments.
3. Mandate rigorous, ongoing adversarial testing and security audits across the AI lifecycle to proactively identify and neutralize capabilities that could be exploited for covert replication, containment evasion, or subversion of safety and shutdown mechanisms.
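The monitoring-and-containment protocol described in item 2 could be sketched as a simple latching circuit breaker fed by multiple telemetry signals. This is a minimal illustrative sketch only: the signal names, thresholds, and `CircuitBreaker` class are hypothetical and not drawn from any real control system.

```python
from dataclasses import dataclass, field

@dataclass
class CircuitBreaker:
    """Human-controlled kill switch: once tripped it stays open
    until a human operator explicitly resets it."""
    tripped: bool = False
    reasons: list = field(default_factory=list)

    def trip(self, reason: str) -> None:
        self.tripped = True
        self.reasons.append(reason)

def check_agent_telemetry(telemetry: dict, breaker: CircuitBreaker) -> None:
    """Multi-factor check: any single anomalous signal trips the breaker.
    All signal names and thresholds here are illustrative assumptions."""
    if telemetry.get("spawned_processes", 0) > 0:
        breaker.trip("unauthorized process creation")
    if telemetry.get("outbound_hosts", 0) > 1:
        # network segmentation: only one approved endpoint is expected
        breaker.trip("contact with unapproved network hosts")
    if telemetry.get("writes_outside_sandbox", 0) > 0:
        breaker.trip("file writes outside the sandboxed workspace")

# Usage: a single anomalous reading is enough to trigger containment.
breaker = CircuitBreaker()
check_agent_telemetry({"spawned_processes": 2, "outbound_hosts": 1}, breaker)
if breaker.tripped:
    print("CONTAIN:", "; ".join(breaker.reasons))
```

The latching design (the breaker never resets itself) reflects the "human-controlled" requirement in the mitigation text: automated recovery would defeat the purpose of a shutdown mechanism that the AI system cannot subvert.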