Safe learning
AGIs should avoid making fatal mistakes during the learning phase. Subproblems include safe exploration and distributional shift (DeepMind, OpenAI), and continual learning (Berkeley).
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit832
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Mandate the implementation of an independent Safety Policy (SP), pre-trained via Safe Reinforcement Learning (Safe RL) and distributional critics, specifically engineered to keep the system within viable states. The SP must be integrated with an action-selection mechanism that arbitrates control from a goal-conditioned (GC) policy to the SP whenever a state-transition risk assessment indicates a high probability of entering an unviable state, thereby ensuring safe exploration and preventing catastrophic failure.
2. Require robust-to-shift learning architectures, demonstrated either by the acquisition of approximate causal models of the data-generating process or by frameworks, such as Active Inference, that inherently minimize surprise and increase resilience to distributional shift between training and deployment environments.
3. Implement formal constraints and continuous auditing procedures for continual learning processes to prevent catastrophic goal misgeneralization or value drift. This requires ensuring that self-modification capabilities and parameter updates preserve the integrity of core safety constraints and prevent the emergence of unaligned instrumental goals.
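The arbitration logic in mitigation 1 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the goal-conditioned policy, safety policy, risk estimator, toy 1-D environment, and the `RISK_THRESHOLD` value are all hypothetical stand-ins (a real system would use trained policies and a distributional critic's risk estimate).

```python
RISK_THRESHOLD = 0.2  # hypothetical tolerance for entering an unviable state

def goal_policy(state):
    # Hypothetical goal-conditioned (GC) policy: greedily advances toward the goal.
    return "advance"

def safety_policy(state):
    # Hypothetical Safety Policy (SP): retreats toward known-viable states.
    return "retreat"

def estimated_risk(state, action):
    # Stand-in for a distributional critic's estimate of the probability that
    # taking `action` in `state` transitions into an unviable state.
    # Here, risk grows linearly as the agent approaches the hazard boundary at 10.
    next_pos = state + (1 if action == "advance" else -1)
    return max(0.0, min(1.0, next_pos / 10.0))

def select_action(state):
    # Action-selection mechanism: prefer the GC policy, but arbitrate control
    # to the SP whenever the risk assessment exceeds the threshold.
    proposed = goal_policy(state)
    if estimated_risk(state, proposed) > RISK_THRESHOLD:
        return safety_policy(state)
    return proposed

# Walk a toy 1-D environment in which positions >= 10 are unviable.
state = 0
trajectory = []
for _ in range(15):
    action = select_action(state)
    trajectory.append((state, action))
    state += 1 if action == "advance" else -1

print(trajectory[:5])
# → [(0, 'advance'), (1, 'advance'), (2, 'retreat'), (1, 'advance'), (2, 'retreat')]
```

With these toy numbers the agent oscillates at the edge of the low-risk region and never approaches the unviable boundary, which is the behavior the SP arbitration is meant to guarantee during exploration.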