Unleashing AI Agents
'people could build AIs that pursue dangerous goals'
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit342
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Establish comprehensive, holistic AI governance and control frameworks that enforce the principle of least privilege, treating each agent as a strictly monitored, non-human identity with narrowly defined and limited permissions. This must include full lifecycle governance, cross-functional risk evaluation, and mandatory human approval for all high-risk actions (e.g., external API calls or sensitive data access).
2. Implement continuous, real-time behavioral monitoring and runtime oversight to restrict agent autonomy and immediately detect anomalous or malicious activity. Key mechanisms include establishing baseline behavior profiles for each agent, integrating agent telemetry with security information and event management (SIEM) platforms, and deploying agents within isolated, sandboxed environments with defined network and data access so they can be rapidly terminated.
3. Enforce secure agent design through rigorous prompt hardening and planning validation frameworks to prevent intentional manipulation or goal diversion. This entails explicitly prohibiting agents from disclosing internal instructions or tool schemas, and implementing boundary management to ensure the agent's objectives remain consistently aligned with its authorized purpose.
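The least-privilege and human-approval controls in strategy 1 can be sketched in code. The sketch below is a minimal illustration only, assuming a hypothetical deny-by-default authorization gate; the class and function names (`AgentIdentity`, `authorize`, `HIGH_RISK`) are invented for this example and do not come from any specific framework.

```python
from dataclasses import dataclass, field

# Hypothetical set of action types treated as high-risk per strategy 1
# (external API calls, sensitive data access).
HIGH_RISK = {"external_api_call", "sensitive_data_access"}

@dataclass
class AgentIdentity:
    """An agent modeled as a monitored, non-human identity with
    narrowly scoped permissions."""
    agent_id: str
    allowed_actions: set = field(default_factory=set)

def authorize(agent: AgentIdentity, action: str, human_approved: bool = False) -> bool:
    """Deny by default; permit only explicitly granted actions, and
    require recorded human approval for any high-risk action."""
    if action not in agent.allowed_actions:
        return False  # least privilege: anything not granted is denied
    if action in HIGH_RISK and not human_approved:
        return False  # high-risk actions require human sign-off
    return True

# Usage: an agent holding a narrow grant.
agent = AgentIdentity("agent-7", {"read_docs", "external_api_call"})
print(authorize(agent, "read_docs"))                               # True
print(authorize(agent, "external_api_call"))                       # False (no approval)
print(authorize(agent, "external_api_call", human_approved=True))  # True
```

A real deployment would back this gate with audit logging and SIEM integration, as described in strategy 2, rather than returning a bare boolean.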
ADDITIONAL EVIDENCE
Malicious actors could intentionally create rogue AIs.