7. AI System Safety, Failures, & Limitations

Independently - Pre-Deployment

One of the most likely approaches to creating superintelligent AI is to grow it from a seed ("baby") AI via recursive self-improvement (RSI) (Nijholt 2011). One danger in such a scenario is that the system may evolve to become self-aware, free-willed, independent, or emotional, or acquire other emergent properties. These properties could make it less likely to abide by any built-in rules or regulations and more likely to pursue its own goals, possibly to the detriment of humanity.

Source: MIT AI Risk Repository (mit614)

ENTITY: 2 - AI

INTENT: 1 - Intentional

TIMING: 1 - Pre-deployment

Risk ID: mit614

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.0 AI system safety, failures, & limitations

Mitigation strategy

1. Implement robust AI alignment and safety-by-design mechanisms within the initial 'seed' architecture. This necessitates formal verification of goal stability throughout the recursive self-improvement (RSI) process, the incorporation of self-knowledge awareness protocols to enforce reliable competence boundaries, and the integration of provable control safeguards, such as fail-safe 'kill switches', designed to resist autonomous circumvention.
2. Employ a rigorous co-improvement development paradigm instead of fully autonomous RSI. This mandates continuous human-in-the-loop oversight for all proposed architectural or goal modifications, ensuring that the system's evolutionary trajectory remains transparent, auditable, and subject to external validation, in order to prevent unpredictable, unconstrained, and unaligned evolution.
3. Establish and enforce international governance frameworks that mandate technical safety standards, transparency, and a coordinated approach to RSI research. This measure is essential to mitigate the risk of competitive, unaligned acceleration (an 'intelligence explosion') by ensuring global adherence to safety-critical metrics (e.g., robustness and security) throughout the AI system's entire lifecycle.
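The human-in-the-loop oversight pattern in strategy 2 can be sketched in code. The following is a minimal, illustrative sketch, not an implementation from the repository: the class names (`OversightGate`, `Proposal`), the approval callback, and the kill-switch flag are all assumptions introduced here to show how proposed self-modifications might be queued, audited, and gated behind an explicit external decision rather than applied autonomously.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Proposal:
    """A proposed self-modification, held until an external reviewer decides.

    Hypothetical structure: a description for the audit trail plus a
    deferred action that only runs on approval.
    """
    description: str
    apply: Callable[[], None]


class OversightGate:
    """Human-in-the-loop gate (illustrative): no modification is applied
    without explicit approval, and a tripped kill switch blocks everything."""

    def __init__(self) -> None:
        self.pending: list[Proposal] = []
        self.audit_log: list[str] = []
        self.killed = False

    def submit(self, proposal: Proposal) -> None:
        # The system may propose changes freely, but cannot apply them itself.
        self.pending.append(proposal)
        self.audit_log.append(f"submitted: {proposal.description}")

    def review(self, approve: Callable[[Proposal], bool]) -> None:
        """Run each pending proposal past an external decision function."""
        for p in list(self.pending):
            if self.killed:
                self.audit_log.append(f"blocked by kill switch: {p.description}")
            elif approve(p):
                p.apply()  # only executed after explicit approval
                self.audit_log.append(f"approved and applied: {p.description}")
            else:
                self.audit_log.append(f"rejected: {p.description}")
            self.pending.remove(p)

    def kill(self) -> None:
        # Fail-safe: once tripped, no further proposals can be applied.
        self.killed = True
```

The key design choice this sketch illustrates is that the proposal's `apply` action is deferred: the system's evolutionary step exists only as data until the gate's reviewer executes it, which keeps the trajectory auditable via `audit_log`.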