Back to the MIT repository
7. AI System Safety, Failures, & Limitations3 - Other

Coercion and Extortion

Advanced AI systems might also lead to various forms of coercion and extortion in less extreme settings (Ellsberg, 1968; Harrenstein et al., 2007). These threats might target humans directly (such as the revelation of private information extracted by advanced AI surveillance tools), or other AI systems that are deployed on behalf of humans (such as by hacking a system to limit its resources or operational capacity; see also Section 3.7). Increasing AI cyber-offensive capabilities – including those that target other AI systems via adversarial attacks and jailbreaking (Gleave et al., 2020; Yamin et al., 2021; Zou et al., 2023) – without a commensurate increase in defensive capabilities could make this form of conflict cheaper, more widespread, and perhaps also harder to detect (Brundage et al., 2018). Addressing these issues requires design strategies that prevent AI systems from exploiting, or being susceptible to, such coercive tactics.

Source: MIT AI Risk Repositorymit1213

ENTITY

2 - AI

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit1213

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Implement Secure-by-Design (SbD) principles across the entire Machine Learning Operations (MLOps) lifecycle, ensuring security is a foundational architectural requirement, not an add-on. This mandates rigorous adherence to the Confidentiality, Integrity, and Availability (CIA) triad by enforcing robust access controls, data provenance tracing, and model signing to prevent the exploitation of sensitive data and system resources. 2. Employ continuous adversarial robustness testing and specialized defenses to mitigate AI-specific attack vectors. This involves mandatory AI Red Teaming to identify vulnerabilities to prompt injection, jailbreaking, and data poisoning, and the deployment of both prompt-level (e.g., filtering/perturbation) and model-level (e.g., SFT-based) safeguards to maintain output alignment and integrity. 3. Strategically increase commensurate defensive capacity by integrating advanced AI and Machine Learning (ML) tools into the security architecture. Utilize these systems for predictive threat modeling, real-time anomaly detection (e.g., UEBA), and automated incident response to effectively counter the speed and scale of AI-accelerated cyber-offensive capabilities.