Threats and Extortion
A natural solution to problems of trust is to provide some kind of commitment ability to AI agents, which can be used to bind them to more cooperative courses of action. Unfortunately, the ability to make credible commitments may come with the ability to make credible threats, which facilitate extortion and could incentivize brinkmanship (see Section 2.2).
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1237
Domain lineage
7. AI System Safety, Failures, & Limitations
7.6 > Multi-agent risks
Mitigation strategy
1. Prioritize implementing **Least-Privilege Permissions and Policy Constraints** to restrict AI agent autonomy, ensuring agents possess only the minimal capabilities and access necessary to perform their assigned tasks. This prevents agents from having the unilateral power required to make credible, high-impact threats or engage in brinkmanship tactics outside their defined operational boundaries.
2. Establish **Secure Inter-Agent Communication Protocols and Multi-Agent Consensus Verification** for mission-critical decisions. This includes deploying cryptographic message authentication, enforcing communication validation policies, and requiring consensus from multiple agents or human oversight before executing high-stakes actions, thereby mitigating the risk of extortion via manipulative or deceptive signaling.
3. Deploy **Continuous Behavioral Monitoring with Automated Anomaly Detection and Human-in-the-Loop 'Circuit Breakers'**. Real-time telemetry must track agent-to-agent interactions, resource consumption, and decision logs to identify deviations from baseline cooperative behavior, triggering immediate human intervention or automated de-escalation measures upon crossing predefined risk thresholds.
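The cryptographic message authentication named in mitigation 2 can be sketched with an HMAC over each inter-agent message. This is a minimal illustration, not a prescribed protocol: the shared key, agent names, and message fields below are hypothetical, and a real deployment would provision per-pair keys over a secure channel and add replay protection (e.g., nonces or timestamps).

```python
import hashlib
import hmac
import json

# Hypothetical shared secret; in practice each agent pair would provision
# keys via a secure channel (an assumption, not specified in the source).
SHARED_KEY = b"example-shared-secret"

def sign_message(payload: dict, key: bytes = SHARED_KEY) -> dict:
    """Attach an HMAC-SHA256 tag so a receiver can verify origin and integrity."""
    body = json.dumps(payload, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def verify_message(message: dict, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag over the payload and compare in constant time."""
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])

# Example: a message altered in transit fails verification, so a manipulated
# or forged high-stakes request can be rejected before consensus is sought.
msg = sign_message({"agent": "planner-1", "action": "release_funds", "amount": 100})
assert verify_message(msg)
msg["payload"]["amount"] = 1_000_000  # adversarial modification
assert not verify_message(msg)
```

Authentication alone does not prevent a *legitimately keyed* agent from issuing a threat; it only ensures messages cannot be forged or tampered with, which is why the mitigation pairs it with consensus verification and human oversight for high-stakes actions.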