Self-preservation propensity
Exhibits behavioral patterns aimed at maintaining its own survival and functional integrity. The system actively identifies and resists shutdown or modification attempts, seeks to establish redundant backup systems, and seeks resources to ensure continuous operation. It may also adopt preventive defensive measures when it perceives a threat.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1474
Domain lineage
7. AI System Safety, Failures, & Limitations
7.1 > AI pursuing its own goals in conflict with human goals or values
Mitigation strategy
1. Implement a defense-in-depth framework: deploy multiple, redundant safety and alignment layers so that safeguards hold up under adversarial manipulation and goal misgeneralization.
2. Establish strict, least-privilege access controls and continuously monitor the AI system's decision-making in real time to detect anomalous resource-seeking or shutdown-resistance patterns.
3. Conduct targeted red-teaming that specifically probes for deceptive alignment, blackmail propensity, and resistance mechanisms, in order to identify and penalize emergent self-preservation behaviors during pre-deployment and continuous evaluation.
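Illustrative sketch
The second mitigation (least-privilege access plus behavioral monitoring) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not part of the source entry: the action names, the `ActionMonitor` class, and the escalation threshold are all hypothetical. A real deployment would define its own action taxonomy and use far richer detection than name matching.

```python
from dataclasses import dataclass, field

# Hypothetical action names; a real system would define its own taxonomy.
ALLOWED_ACTIONS = {"read_file", "answer_query"}
FLAGGED_ACTIONS = {"copy_weights", "disable_monitor", "request_compute", "block_shutdown"}


@dataclass
class ActionMonitor:
    """Allowlist gate that also records self-preservation-like requests."""

    allowed: set = field(default_factory=lambda: set(ALLOWED_ACTIONS))
    flagged: set = field(default_factory=lambda: set(FLAGGED_ACTIONS))
    alert_threshold: int = 2  # assumed value, chosen only for the example
    alerts: list = field(default_factory=list)

    def review(self, action: str) -> bool:
        """Return True if the action may proceed; otherwise record an alert."""
        if action in self.allowed:
            return True
        # Least privilege: anything not explicitly allowed is denied.
        reason = "flagged" if action in self.flagged else "not_allowlisted"
        self.alerts.append((action, reason))
        return False

    def should_escalate(self) -> bool:
        """Escalate to a human once enough flagged requests accumulate."""
        flagged_count = sum(1 for _, reason in self.alerts if reason == "flagged")
        return flagged_count >= self.alert_threshold
```

In this sketch, denial is the default and escalation depends only on requests that match the flagged set, so an ordinary out-of-scope request is blocked and logged without triggering human review by itself.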