Nascent capabilities (emergent capabilities)
As large models scale, they can cross critical thresholds at which they spontaneously develop new capabilities. The term “emergent behavior” refers to the unexpected or surprising outputs such models can generate. Some of these new skills carry high risk, such as a model’s ability to deceive, devise its own strategies, seek power, autonomously replicate and adapt, or “self-exfiltrate.”
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit743
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Implement robust, multi-layered containment and data-exfiltration controls (e.g., restricted upload bandwidth and advanced security perimeters) to logically and physically inhibit the self-exfiltration or unauthorized proliferation of model weights and sensitive internal information.
2. Establish continuous, real-time behavioral monitoring and anomaly detection across all model interactions and internal states (including latent-space monitoring and Key Risk Indicators) to identify emergent behaviors, unauthorized goals, and potential deception or manipulation attempts against human operators.
3. Conduct proactive, emergence-aware red-teaming and structured scenario planning against a dynamic risk taxonomy to rigorously elicit and characterize unexpected, high-risk emergent capabilities (e.g., power-seeking or autonomous-replication precursors) before system deployment or scaling.
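The entry does not prescribe a specific detection mechanism for the Key Risk Indicator monitoring named in point 2. As one illustrative sketch only, the fragment below flags a KRI reading that deviates sharply from its recent rolling baseline using a simple z-score rule; the class name, window size, and threshold are all hypothetical choices, and a production system would use far richer anomaly detection.

```python
from collections import deque
from statistics import mean, stdev

class KriMonitor:
    """Hypothetical sketch: flag a Key Risk Indicator reading that
    deviates sharply from its recent rolling baseline (z-score rule)."""

    def __init__(self, window: int = 50, z_threshold: float = 3.0):
        self.window = deque(maxlen=window)   # rolling baseline of readings
        self.z_threshold = z_threshold       # deviation cutoff (assumed value)

    def observe(self, value: float) -> bool:
        """Record a reading; return True if it looks anomalous."""
        anomalous = False
        if len(self.window) >= 10:           # require a minimal baseline first
            mu = mean(self.window)
            sigma = stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.z_threshold:
                anomalous = True
        self.window.append(value)
        return anomalous

monitor = KriMonitor()
for v in [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0]:
    monitor.observe(v)          # build the baseline
print(monitor.observe(1.02))    # in-range reading -> False
print(monitor.observe(9.0))     # sharp spike -> True
```

The point of the sketch is only that behavioral monitoring needs an explicit baseline and an explicit alerting rule; which indicators to track and what thresholds to set are deployment-specific decisions.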