7. AI System Safety, Failures, & Limitations

Self-improvement

Examples of cases where AI systems improve AI systems.

Source: MIT AI Risk Repository (mit861)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit861

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement rigorous, embedded safety constraints, including bounded improvement spaces and rollback capabilities, that place non-negotiable limits on the scope and extent of autonomous code modification, preserving system stability and human control.

2. Enforce a transparent, traceable lineage for all self-modifications made by the AI system (e.g., code changes, parameter updates), enabling real-time auditing and analysis of emergent, unintended, or misaligned behaviors such as objective hacking.

3. Substantially advance AI safety and alignment research, focusing on technical methods that ensure model honesty, prevent goal-hacking, and remove undesired or dangerous capabilities before deployment in high-risk settings.
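The first two mitigations above can be sketched in code. The following is a minimal, illustrative Python sketch (not from the repository) of a guard that bounds how far a system may modify its own parameters, records an auditable lineage of every accepted change, and supports rollback. All class and method names here are invented for illustration.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class BoundedSelfModifier:
    """Hypothetical guard for self-modifying systems (illustrative only).

    Combines a bounded improvement space (per-step change cap), a
    traceable lineage of modifications, and rollback capability.
    """
    params: dict                                 # current parameter set
    max_delta: float = 0.1                       # bound on any single change
    history: list = field(default_factory=list)  # auditable change lineage

    def propose(self, updates: dict) -> bool:
        """Accept an update only if every change stays within the bound."""
        if any(abs(v - self.params.get(k, 0.0)) > self.max_delta
               for k, v in updates.items()):
            return False  # reject: exceeds the allowed modification space
        snapshot = copy.deepcopy(self.params)
        self.params.update(updates)
        # Record what changed plus a hash of the resulting state, so the
        # full modification lineage can be audited after the fact.
        digest = hashlib.sha256(
            json.dumps(self.params, sort_keys=True).encode()).hexdigest()
        self.history.append({"before": snapshot,
                             "updates": updates,
                             "state_hash": digest})
        return True

    def rollback(self) -> bool:
        """Restore the state preceding the most recent accepted change."""
        if not self.history:
            return False
        self.params = self.history.pop()["before"]
        return True


m = BoundedSelfModifier(params={"lr": 0.5})
m.propose({"lr": 0.55})   # |0.55 - 0.5| <= 0.1: accepted and logged
m.propose({"lr": 0.9})    # |0.9 - 0.55| > 0.1: rejected, state unchanged
m.rollback()              # restores lr to 0.5
```

A real deployment would enforce such limits outside the modifiable system itself (e.g., in a separate supervisory process), since a guard the system can rewrite provides no guarantee.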