7. AI System Safety, Failures, & Limitations (Post-deployment)

Autonomy risk

Granting AI models and systems high levels of decision-making autonomy can lead to unintended consequences.

Source: MIT AI Risk Repository (mit1053)

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1053

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement rigorous technical and policy-based constraints on autonomous agents' operational boundaries. This includes deploying sandboxed environments, enforcing least-privilege principles at each reasoning step, and codifying deterministic state machines that limit functional scope and prevent unintended actions; in short, practicing controlled autonomy.

2. Establish mandatory human-in-the-loop (HIL) mechanisms for all critical, irreversible, or high-risk decisions. This framework must include clearly defined protocols for human review, the capacity to override or disengage the AI system, and accountability chains that preserve human agency and responsibility.

3. Deploy a continuous, decentralized observability platform for real-time risk assessment and response. This involves using agent swarms or analogous systems to monitor performance, compliance, and behavioral drift post-deployment, enabling dynamic detection and mitigation of emergent unintended consequences.
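The first two mitigations can be sketched in code. The example below is a minimal illustration, not a reference implementation: the action names, risk tiers, and `ActionGate` class are all hypothetical. It shows how an allow-list enforces least privilege (actions outside the list are refused) and how a human-in-the-loop approver gates high-risk actions before execution.

```python
from enum import Enum, auto

class Risk(Enum):
    LOW = auto()
    HIGH = auto()

# Hypothetical allow-list mapping each permitted action to a risk tier.
# Anything not listed here is outside the agent's functional scope.
ALLOWED_ACTIONS = {
    "read_file": Risk.LOW,
    "summarize": Risk.LOW,
    "send_email": Risk.HIGH,
    "delete_record": Risk.HIGH,
}

class ActionGate:
    """Enforces least privilege and human-in-the-loop review.

    approver: a callable taking an action name and returning True only
    when a human has explicitly approved that high-risk action.
    """
    def __init__(self, approver):
        self.approver = approver
        self.log = []  # audit trail supporting the accountability chain

    def execute(self, action, handler):
        # Least privilege: refuse anything outside the allow-list.
        if action not in ALLOWED_ACTIONS:
            self.log.append((action, "blocked: not in allow-list"))
            return None
        # HIL gate: high-risk actions require explicit human approval.
        if ALLOWED_ACTIONS[action] is Risk.HIGH and not self.approver(action):
            self.log.append((action, "blocked: human rejected"))
            return None
        self.log.append((action, "executed"))
        return handler()

# Usage: with an approver that rejects everything, low-risk actions
# still run, while high-risk and unlisted actions are blocked.
gate = ActionGate(approver=lambda action: False)
gate.execute("summarize", lambda: "ok")        # runs (low risk)
gate.execute("send_email", lambda: "sent")     # blocked by HIL gate
gate.execute("wipe_disk", lambda: "boom")      # blocked: not allow-listed
```

A production system would add the third mitigation on top of this: an external monitor consuming the audit log to detect behavioral drift, rather than trusting the agent to police itself.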