Automated AI R&D capability
Self-modification and self-improvement capabilities. The model can restructure its own architecture or develop derivative AI systems with enhanced functions, expanding its capabilities and improving its performance. Absent effective regulation, automated AI R&D may drive rapid system iteration, creating self-reinforcing cycles of capability gain that ultimately exceed human understanding and control.
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1462
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. Mandatory Human-in-the-Loop Governance and Intervention Protocols: Establish non-overrideable "circuit breakers" and human oversight mechanisms requiring explicit human authorization for all critical self-modifications, deployment of newly developed derivative models, or high-consequence operational actions.
2. Continuous AI Alignment and Robustness Validation: Implement continuous red-teaming, adversarial stress testing, and formal verification throughout the automated R&D lifecycle to ensure the model's objective function remains robustly aligned with human safety and ethical values, preventing goal drift in subsequent iterations.
3. Principle of Least Privilege (PoLP) Access Restriction: Enforce a compartmentalized execution environment under the Principle of Least Privilege, strictly limiting the autonomous R&D agent's access to sensitive production systems, external networks, and high-value data, thereby constraining the scope of harm from an unaligned iteration.
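The human-authorization gate (mitigation 1) and least-privilege restriction (mitigation 3) can be sketched as a minimal policy check. Everything here is illustrative: the action names, the `CircuitBreakerError` type, and the resource allowlist are assumptions for demonstration, not part of any real governance framework.

```python
from enum import Enum, auto

class Action(Enum):
    ROUTINE_EXPERIMENT = auto()
    SELF_MODIFICATION = auto()       # critical: requires human sign-off
    DEPLOY_DERIVATIVE_MODEL = auto() # critical: requires human sign-off

# Mitigation 1: actions that always require explicit human authorization.
CRITICAL_ACTIONS = {Action.SELF_MODIFICATION, Action.DEPLOY_DERIVATIVE_MODEL}

# Mitigation 3: least-privilege allowlist for the autonomous R&D agent
# (hypothetical resource names; production systems are deliberately absent).
ALLOWED_RESOURCES = {"sandbox_compute", "research_dataset"}

class CircuitBreakerError(Exception):
    """Raised when an action trips a non-overrideable governance check."""

def authorize(action: Action, resource: str, human_approved: bool = False) -> bool:
    """Permit an agent action only if it passes both governance gates."""
    # Least-privilege gate: the agent may only touch allowlisted resources.
    if resource not in ALLOWED_RESOURCES:
        raise CircuitBreakerError(f"resource {resource!r} is outside the agent's privilege scope")
    # Human-in-the-loop gate: critical actions need explicit sign-off.
    if action in CRITICAL_ACTIONS and not human_approved:
        raise CircuitBreakerError(f"{action.name} requires explicit human authorization")
    return True
```

In this sketch the gate is a pure function, so it can wrap every agent action at a single choke point; a real deployment would additionally log each decision and enforce the allowlist at the infrastructure level rather than in the agent's own process.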