7. AI System Safety, Failures, & Limitations

Model autonomous capability

The ability to operate autonomously: independently formulating and executing complex plans, effectively delegating and managing tasks, flexibly using various tools and resources, and pursuing both short-term goals and long-term strategic objectives across domains without continuous human intervention or supervision.

Source: MIT AI Risk Repository (mit1460)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1460

Domain lineage

7. AI System Safety, Failures, & Limitations


7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement robust computational governance, deploying continuous monitoring tools to detect internal AI agents egregiously misusing compute resources, such as by initiating unauthorized training runs, performing excessive inference for subversion research, or engaging in dual-use material inquiry.
2. Mandate and enforce layered human oversight protocols, including explicit human approval gates for any high-consequence or cross-domain actions, and maintain immutable audit trails of all autonomous decisions to ensure non-subvertible accountability.
3. Establish comprehensive identity-centric security for AI agents, treating them as first-class identities with least-privilege access, automated credential rotation, and continuous behavioral baselining to detect and prevent misuse of delegated authority.
4. Enforce strict information security and supply-chain integrity measures to prevent internal or external actors, including other AI systems, from stealing or sabotaging critical model software, weights, or safety-critical components.
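The approval-gate and audit-trail mechanisms in points 1 and 2 can be sketched in code. The following is a minimal illustration, not an implementation from the repository: the action names, the `HIGH_CONSEQUENCE` set, and the `gate_action` helper are all hypothetical, and the "immutability" here is only a hash chain that makes tampering detectable, which a real deployment would back with external, write-once storage.

```python
import hashlib
import json
import time

# Hypothetical examples of actions requiring explicit human sign-off.
HIGH_CONSEQUENCE = {"initiate_training_run", "cross_domain_action", "acquire_credentials"}

class AuditLog:
    """Append-only log in which each entry includes a hash of the previous
    entry, so any retroactive edit breaks the chain and is detectable."""

    def __init__(self):
        self.entries = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, record: dict) -> str:
        record = dict(record, prev_hash=self._last_hash, ts=time.time())
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        record["hash"] = digest
        self.entries.append(record)
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute every hash; False if any entry was altered or reordered."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev:
                return False
            if hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

def gate_action(agent_id: str, action: str, log: AuditLog, approver=None) -> bool:
    """Permit an autonomous action only if it is low-consequence, or a human
    approver callback explicitly signs off; log the decision either way."""
    if action in HIGH_CONSEQUENCE:
        approved = bool(approver and approver(agent_id, action))
    else:
        approved = True
    log.append({"agent": agent_id, "action": action, "approved": approved})
    return approved
```

Note that every decision is logged, including denials, so the audit trail records attempted high-consequence actions even when no approval was granted.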