Safety Risks from Affordances Provided to LLM-agents
The capabilities of LLM-agents can be significantly enhanced by providing the agent with novel affordances, e.g. the ability to browse the web (Nakano et al., 2021), to manipulate objects in the physical world (Ahn et al., 2022; Huang et al., 2022a), to create and instruct copies of itself (Richards, 2023), or to create and use new tools (Wang et al., 2023a). Affordances can create additional risks: they often widen the agent's scope of impact, amplify the consequences of its failures, and enable novel failure modes (Ruan et al., 2023; Pan et al., 2024).
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1483
Domain lineage
7. AI System Safety, Failures, & Limitations
7.2 > AI possessing dangerous capabilities
Mitigation strategy
1. **Implement Granular Agency Control and Scoping.** Enforce the principle of least privilege by strictly limiting the LLM agent's affordances (functionality and permissions) to the minimum set required for its explicit, defined objective. This must include building in external mediation systems, which enforce authorization checks on all action requests to downstream physical or virtual environments, rather than relying on the agent's internal decision-making for permission validation.

2. **Deploy Affordance-Aware Safety Alignment Frameworks.** Utilize advanced alignment techniques, such as those that intervene during the multi-step reasoning process, to proactively assess the logical implications and potential for harm associated with a planned action or tool use. This framework must be capable of recognizing and mitigating the implicit, emergent safety risks introduced by complex affordances (e.g., physical manipulation or self-replication) to prevent the propagation of errors before an action is executed.

3. **Establish Proportional Control Evaluation and Monitoring.** Institute a continuous security lifecycle that includes real-time behavioral monitoring and anomaly detection to flag suspicious activities or deviations from guardrails. Crucially, conduct regular, adversarial red teaming exercises where the testing agent's afforded capabilities are adapted to match the assessed capability profile of the deployed LLM agent, ensuring that control measures are robust and proportionally aligned with the risk profile.
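The external mediation pattern in the first strategy can be sketched in a few lines: the agent proposes actions, but a separate mediator checks each request against a fixed allow-list before anything executes, and records every decision for later monitoring. This is a minimal illustrative sketch, not a reference implementation; all names here (`Mediator`, `ActionRequest`, the tool identifiers) are hypothetical.

```python
# Hypothetical sketch of an external mediation layer enforcing least
# privilege: authorization happens outside the agent's own reasoning,
# and every decision is logged for behavioral monitoring.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ActionRequest:
    tool: str       # e.g. "web.fetch", "fs.write" (illustrative names)
    argument: str   # opaque payload handed to the tool

class PermissionDenied(Exception):
    pass

@dataclass
class Mediator:
    # Least privilege: only the tools required for the agent's stated
    # objective are granted, fixed at construction time.
    allowed_tools: frozenset
    audit_log: list = field(default_factory=list)

    def authorize(self, request: ActionRequest) -> bool:
        granted = request.tool in self.allowed_tools
        # Log every request, granted or not, for anomaly detection.
        self.audit_log.append((request.tool, granted))
        return granted

    def execute(self, request: ActionRequest, tools: dict) -> str:
        if not self.authorize(request):
            raise PermissionDenied(f"tool {request.tool!r} outside agent scope")
        return tools[request.tool](request.argument)

# Usage: a read-only research agent gets web access but no file writes.
tools = {
    "web.fetch": lambda url: f"<html for {url}>",
    "fs.write": lambda path: f"wrote {path}",
}
mediator = Mediator(allowed_tools=frozenset({"web.fetch"}))
mediator.execute(ActionRequest("web.fetch", "https://example.org"), tools)
try:
    mediator.execute(ActionRequest("fs.write", "/etc/passwd"), tools)
except PermissionDenied:
    pass  # blocked by the mediator, not by the agent's own judgment
```

The key design choice is that the agent never holds the permission check itself: even a misaligned or manipulated plan cannot invoke a tool the mediator was never granted.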