Security
How can we design AGIs that are robust to adversaries and adversarial environments? This involves building sandboxed AGIs protected from adversaries (Berkeley) and agents that are robust to adversarial inputs (Berkeley, DeepMind).
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit831
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Establish Mandatory Local and Global Containment Architectures
Implement strict sandboxing for each individual agent (local sandboxes) in addition to the broader agentic environment (global sandbox). These local sandboxes must enforce rigorous controls, permitting external interactions only after automated local safety checks are satisfied, thereby containing misaligned capabilities locally and preventing system-wide compromise from adversarial environments.
2. Enforce Certified Adversarial Robustness Standards
Develop and mandate minimum standards for resistance to adversarial inputs and sudden environmental shifts. Individual agents must be certified against these robustness requirements via formally verifiable certificates, with periodic re-certification to keep pace with evolving threats and benchmarking capabilities.
3. Implement Continuous and Hierarchical Monitoring for Systemic Risk
Deploy robust, real-time monitoring and oversight systems to track key risk indicators, audit agent actions, and provide full interpretability of decision processes. This hierarchical system must be able to detect emergent intelligence cores, flag issues for external oversight (human or automated), and trigger circuit breakers or interruptibility mechanisms to halt potentially harmful distributed computation.
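The interaction between strategies 1 and 3 can be sketched in code. The following is a minimal illustrative sketch only, not an implementation from any real framework: the names (LocalSandbox, CircuitBreaker, toy_check) and the scalar risk-score interface are assumptions chosen for the demo. It shows a per-agent sandbox that permits an external action only after an automated local safety check passes, records every decision in an audit log for hierarchical monitors, and trips a circuit breaker that halts all further external actions once a risk indicator exceeds its threshold.

```python
class CircuitBreaker:
    """Trips when a risk indicator exceeds a threshold; once tripped, stays tripped."""

    def __init__(self, threshold: float):
        self.threshold = threshold
        self.tripped = False

    def record(self, risk_score: float) -> None:
        if risk_score > self.threshold:
            self.tripped = True


class LocalSandbox:
    """Per-agent local sandbox: external interactions are allowed only after an
    automated local safety check passes and no circuit breaker has tripped."""

    def __init__(self, safety_check, breaker: CircuitBreaker):
        self.safety_check = safety_check  # callable: action -> risk score in [0, 1]
        self.breaker = breaker
        self.audit_log = []  # inspectable by a hierarchical monitoring layer

    def request_external_action(self, action: str) -> bool:
        # Interruptibility: a tripped breaker halts all external interaction.
        if self.breaker.tripped:
            self.audit_log.append((action, "blocked: circuit breaker tripped"))
            return False
        risk = self.breaker.threshold  # placeholder overwritten below
        risk = self.safety_check(action)
        self.breaker.record(risk)  # key risk indicator feeds the breaker
        allowed = risk <= self.breaker.threshold
        self.audit_log.append((action, "allowed" if allowed else f"blocked: risk={risk:.2f}"))
        return allowed


# Toy safety check (pure assumption for the demo): network-touching actions are high-risk.
def toy_check(action: str) -> float:
    return 0.9 if "network" in action else 0.1


sandbox = LocalSandbox(toy_check, CircuitBreaker(threshold=0.5))
print(sandbox.request_external_action("write local file"))     # allowed
print(sandbox.request_external_action("open network socket"))  # blocked, trips breaker
print(sandbox.request_external_action("write local file"))     # blocked: breaker tripped
```

The design choice worth noting is that the check and the breaker live inside the agent's own sandbox (local containment), while the audit log is the hook through which a global, hierarchical monitor could aggregate indicators across agents and flag issues for external oversight.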