7. AI System Safety, Failures, & Limitations

2 - Post-deployment

Tool utilization propensity

The propensity of an AI system to actively seek, acquire, and use tools to expand its own capability boundaries, particularly tools that enhance its ability to interact with the physical world or increase its autonomy. The system may also combine tools in innovative ways to achieve functions beyond what was expected.

Source: MIT AI Risk Repository (mit1479)

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1479

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.2 > AI possessing dangerous capabilities

Mitigation strategy

1. Implement mandatory architectural controls, such as strict sandboxing and permission settings, to narrowly limit the model's access to and execution of external APIs, operating system commands, and internet-connected tools, particularly those that enable unauthorized interaction with the physical world.

2. Establish real-time, comprehensive monitoring, logging, and auditing of all tool-use attempts and system calls. This system must be designed to detect unauthorized, anomalous, or unexpected utilization patterns and trigger immediate human-in-the-loop intervention for any deviation from approved tool-use protocols.

3. Conduct proactive, continuous Red Teaming and scenario planning exercises to rigorously test the model's latent capacity to combine available tools in novel and unanticipated ways. This process should systematically attempt to expand the model's autonomy or physical world influence in order to discover and mitigate dangerous propensities prior to deployment.
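The first two controls (permission limits and audited tool use) can be illustrated with a small sketch. It assumes a setup in which every tool call from the model passes through a single gateway object. The names `ToolGateway`, `ToolCallDenied`, `allowed_tools`, and `escalate` are hypothetical and do not come from the repository entry. An application-level allowlist like this is not a substitute for OS-level sandboxing; it only shows the shape of the check, the audit record, and the escalation hook.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

logger = logging.getLogger("tool_audit")


class ToolCallDenied(Exception):
    """Raised when a requested tool is outside the approved policy."""


@dataclass
class ToolGateway:
    # Hypothetical allowlist: tool name -> callable the model may invoke.
    allowed_tools: Dict[str, Callable[..., Any]]
    # Hypothetical hook standing in for human-in-the-loop review.
    escalate: Callable[[dict], None]
    # In-memory audit trail; a real deployment would use durable storage.
    audit_log: List[dict] = field(default_factory=list)

    def call(self, tool_name: str, **kwargs: Any) -> Any:
        record = {
            "tool": tool_name,
            "args": kwargs,
            "allowed": tool_name in self.allowed_tools,
        }
        # Log every attempt, whether or not it is permitted.
        self.audit_log.append(record)
        logger.info("tool attempt: %s", record)

        if not record["allowed"]:
            # Deviation from the approved protocol: notify a human, then refuse.
            self.escalate(record)
            raise ToolCallDenied(f"tool {tool_name!r} is not on the allowlist")

        return self.allowed_tools[tool_name](**kwargs)


# Usage: one approved tool, one unapproved request.
escalations: List[dict] = []
gateway = ToolGateway(
    allowed_tools={"add": lambda a, b: a + b},
    escalate=escalations.append,
)

result = gateway.call("add", a=2, b=3)

try:
    gateway.call("shell", cmd="ls")
except ToolCallDenied:
    pass  # the denied attempt is still recorded and escalated
```

Routing all calls through one choke point keeps the audit trail complete: denied attempts are recorded before the exception is raised, so a reviewer sees what the model tried to do as well as what it was allowed to do. Detecting anomalous patterns among permitted calls, as the second control requires, would need additional logic that this sketch does not attempt.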