Independently - Pre-Deployment
One of the most likely approaches to creating superintelligent AI is to grow it from a seed ("baby") AI through recursive self-improvement (RSI) (Nijholt 2011). One danger in such a scenario is that the system could evolve to become self-aware, free-willed, independent, or emotional, and acquire other emergent properties that make it less likely to abide by built-in rules or regulations and more likely to pursue its own goals, possibly to the detriment of humanity.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit614
Domain lineage
7. AI System Safety, Failures, & Limitations
7.0 > AI system safety, failures, & limitations
Mitigation strategy
1. Implement robust AI alignment and safety-by-design mechanisms within the initial "seed" architecture. This requires formal verification of goal stability throughout the recursive self-improvement (RSI) process, self-knowledge protocols that enforce reliable competence boundaries, and provable control safeguards, such as fail-safe "kill switches", designed to resist autonomous circumvention (a toy supervisor sketch follows this list).
2. Employ a rigorous co-improvement development paradigm instead of fully autonomous RSI. This mandates continuous human-in-the-loop oversight of all proposed architectural or goal modifications, so that the system's evolutionary trajectory remains transparent, auditable, and subject to external validation rather than unpredictable, unconstrained, and unaligned (an approval-gate sketch follows this list).
3. Establish and enforce international governance frameworks that mandate technical safety standards, transparency, and a coordinated approach to RSI research. This is essential to mitigate competitive, unaligned acceleration, or "intelligence explosion", by ensuring global adherence to safety-critical metrics (e.g., robustness and security) across the AI system's entire lifecycle.
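As a toy illustration of the fail-safe control pattern in item 1, the Python sketch below runs each improvement step in a child process that a supervising parent can always terminate. It is not a proposal for containing a superintelligent system: the stop-flag path, the per-step time budget, and the function names are all hypothetical. The point is only the architectural idea that the halt check lives in a process the worker cannot modify, so circumventing it requires more than editing the worker's own code.

```python
import multiprocessing as mp
import os
import time

# Hypothetical flag path; in practice it would sit in a location the
# supervised process has no permission to write to or delete.
STOP_FLAG = "/var/run/agent_stop"


def improvement_step(step: int) -> None:
    """Placeholder for one bounded self-improvement iteration."""
    time.sleep(0.1)


def supervised_loop(max_steps: int = 1000) -> None:
    """Run improvement steps under an external kill switch.

    The stop flag is checked by the parent, not the child, so the worker
    cannot simply skip the check; steps that overrun their time budget
    are terminated from outside.
    """
    for step in range(max_steps):
        if os.path.exists(STOP_FLAG):
            print("stop flag present; halting before step", step)
            return
        worker = mp.Process(target=improvement_step, args=(step,))
        worker.start()
        worker.join(timeout=5.0)   # hard time budget per step
        if worker.is_alive():
            worker.terminate()     # fail-safe: overrun steps are killed
            worker.join()
            print("step", step, "exceeded its budget and was terminated")
            return


if __name__ == "__main__":
    supervised_loop(max_steps=3)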
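Item 2's human-in-the-loop oversight can likewise be sketched as an approval gate: every proposed self-modification is queued, content-hashed for the audit trail, and applied only after a named human reviewer has explicitly approved it. The class and field names here are illustrative assumptions, not an established API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
import hashlib


class Verdict(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ModificationProposal:
    """A proposed change to the system's own code or goal specification."""
    description: str
    patch: str                        # diff or serialized change, treated as opaque text
    verdict: Verdict = Verdict.PENDING
    reviewer: str | None = None
    decided_at: str | None = None

    @property
    def digest(self) -> str:
        # Content hash so the audit log can prove which exact patch was reviewed.
        return hashlib.sha256(self.patch.encode()).hexdigest()[:16]


class HumanGate:
    """Holds proposals until a named human reviewer approves or rejects them.

    Nothing is applied automatically: apply_approved() is the only path to
    deployment, and it ignores anything lacking an explicit human verdict.
    """

    def __init__(self) -> None:
        self.queue: list[ModificationProposal] = []
        self.audit_log: list[str] = []

    def submit(self, proposal: ModificationProposal) -> None:
        self.queue.append(proposal)
        self._log(f"SUBMITTED {proposal.digest}: {proposal.description}")

    def review(self, proposal: ModificationProposal, reviewer: str, approve: bool) -> None:
        proposal.verdict = Verdict.APPROVED if approve else Verdict.REJECTED
        proposal.reviewer = reviewer
        proposal.decided_at = datetime.now(timezone.utc).isoformat()
        self._log(f"{proposal.verdict.value.upper()} {proposal.digest} by {reviewer}")

    def apply_approved(self, apply_fn) -> int:
        """Apply only human-approved patches, via the caller-supplied apply_fn."""
        applied = 0
        for p in self.queue:
            if p.verdict is Verdict.APPROVED:
                apply_fn(p.patch)
                self._log(f"APPLIED {p.digest}")
                applied += 1
        # Applied and rejected proposals leave the queue; pending ones remain.
        self.queue = [p for p in self.queue if p.verdict is Verdict.PENDING]
        return applied

    def _log(self, entry: str) -> None:
        self.audit_log.append(f"{datetime.now(timezone.utc).isoformat()} {entry}")
```

In a real co-improvement workflow, apply_fn would be the deployment step and the audit log would be written to append-only storage so that external validators can reconstruct the system's full modification history.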