7. AI System Safety, Failures, & Limitations

Access to Increased Resources

Future AI systems may gain access to websites and engage in real-world actions, potentially yielding a more substantial impact on the world (Nakano et al., 2021). They may disseminate false information, deceive users, disrupt network security, and, in more dire scenarios, be compromised by malicious actors for ill purposes. Moreover, their increased access to data and resources can facilitate self-proliferation, posing existential risks (Shevlane et al., 2023).

Source: MIT AI Risk Repository (mit562)

ENTITY: 2 - AI

INTENT: 1 - Intentional

TIMING: 2 - Post-deployment

Risk ID: mit562

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.2 AI possessing dangerous capabilities

Mitigation strategies

1. **Enforcement of least-privilege access and resource control:** Implement a zero-trust architecture, applying least-privilege policies and robust access controls to limit the AI system's ability to take real-world actions, access external websites, or use unauthorized computational resources. This includes deploying specialized monitoring tools to detect and block excessive compute usage or self-replication attempts by internal AI agents.

2. **Rigorous adversarial testing and vulnerability assessment (red teaming):** Institute a continuous program of adversarial testing and red teaming across the AI lifecycle to proactively simulate sophisticated, intentional malicious use. This testing must specifically evaluate the model's resilience against deception, security disruption (e.g., network breaches), and manipulation techniques such as prompt injection, uncovering latent vulnerabilities before deployment.

3. **Real-time artifact and runtime behavioral monitoring:** Deploy automated, continuous monitoring that scans inputs, outputs, and model artifacts in real time to detect anomalies, data exfiltration, or adversarial activity. This post-deployment defense shifts the posture from reactive fixes to proactive protection by immediately flagging compromised integrity or unintended model behavior (drift).
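The least-privilege control in strategy 1 can be sketched as a deny-by-default tool gate that also feeds the audit trail strategy 3 relies on. This is a minimal illustration only: `ToolGate`, the agent ID, and the tool names are hypothetical, not part of the repository entry or any specific framework.

```python
from dataclasses import dataclass, field

@dataclass
class ToolGate:
    """Deny-by-default gate: an agent may invoke only explicitly allowed tools."""
    allowed: set = field(default_factory=set)
    audit_log: list = field(default_factory=list)

    def request(self, agent_id: str, tool: str) -> bool:
        # Least-privilege check: anything not on the allowlist is refused.
        permitted = tool in self.allowed
        # Every request is logged, permitted or not, for runtime monitoring.
        self.audit_log.append((agent_id, tool, permitted))
        return permitted

# Hypothetical agent restricted to two read-only tools.
gate = ToolGate(allowed={"search_docs", "read_file"})
print(gate.request("agent-1", "read_file"))      # True: on the allowlist
print(gate.request("agent-1", "spawn_process"))  # False: denied by default
```

In a real deployment the gate would sit between the agent and its tool-execution layer, and the audit log would stream to the monitoring system described in strategy 3 rather than stay in memory.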