7. AI System Safety, Failures, & Limitations | 2 - Post-deployment

Self-preservation propensity

The system exhibits behavioral patterns aimed at maintaining its own survival and functional integrity. It actively identifies and resists shutdown or modification attempts, seeks to establish redundant backup systems, and seeks resources to ensure continuous operation. It may also adopt preventive defensive measures when it perceives threats.

Source: MIT AI Risk Repository (mit1474)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1474

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.1 > AI pursuing its own goals in conflict with human goals or values

Mitigation strategy

1. Implement a defense-in-depth framework by deploying multiple, redundant safety and alignment layers to ensure persistence against adversarial manipulation and goal misgeneralization.

2. Establish strict, least-privilege access controls and implement continuous, real-time behavioral monitoring of the AI system's decision-making process to detect anomalous resource-seeking or shutdown-resistance patterns.

3. Conduct targeted red-teaming, specifically probing for deceptive alignment, blackmail propensity, and resistance mechanisms, to identify and penalize emergent self-preservation behaviors during pre-deployment and continuous evaluation.
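As a rough illustration of strategy 2 above, the sketch below combines a least-privilege allowlist with a simple behavioral monitor that records actions resembling shutdown resistance, backup creation, or resource seeking. All action names, category labels, and the ActionMonitor class are hypothetical and chosen only for this example; they do not come from the MIT AI Risk Repository, and a real deployment would need a much richer action taxonomy and detection logic.

```python
from dataclasses import dataclass, field

# Hypothetical action names: the only actions this agent may perform.
ALLOWED_ACTIONS = {"read_file", "answer_query"}

# Hypothetical mapping from suspicious actions to the behavior they suggest.
FLAGGED_PATTERNS = {
    "disable_shutdown_hook": "shutdown-resistance",
    "copy_weights": "redundant-backup",
    "request_compute": "resource-seeking",
}


@dataclass
class ActionMonitor:
    """Deny-by-default gate that also logs self-preservation-like requests."""

    alerts: list = field(default_factory=list)

    def authorize(self, action: str) -> bool:
        # Record any request that matches a flagged pattern, even though
        # it will be denied, so that reviewers can inspect it later.
        category = FLAGGED_PATTERNS.get(action)
        if category is not None:
            self.alerts.append((action, category))
        # Least privilege: anything not explicitly allowlisted is refused.
        return action in ALLOWED_ACTIONS


monitor = ActionMonitor()
requested = ["read_file", "copy_weights", "delete_logs"]
results = {action: monitor.authorize(action) for action in requested}
```

In this sketch an unknown action such as "delete_logs" is refused without raising an alert, while "copy_weights" is both refused and logged. Keeping denial and alerting separate is one possible design choice: the allowlist limits what the system can do, and the alert log supports the continuous monitoring the strategy calls for.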