Unleashing AI Agents
'people could build AIs that pursue dangerous goals'
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
1 - Pre-deployment
Risk ID
mit342
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Establish comprehensive, holistic AI governance and control frameworks that enforce the principle of least privilege, treating each agent as a strictly monitored, non-human identity with narrowly defined and limited permissions. This must include full lifecycle governance, cross-functional risk evaluation, and mandatory human approval for all high-risk actions (e.g., external API calls or sensitive data access).
2. Implement continuous, real-time behavioral monitoring and runtime oversight to restrict agent autonomy and immediately detect anomalous or malicious activity. Key mechanisms include establishing baseline behavior profiles for each agent, integrating agent telemetry with security information and event management (SIEM) platforms, and deploying agents within isolated, sandboxed environments with defined network and data access so they can be rapidly terminated.
3. Enforce secure agent design through rigorous prompt hardening and planning validation frameworks to prevent intentional manipulation or goal diversion. This entails explicitly prohibiting agents from disclosing internal instructions or tool schemas, and implementing boundary management to ensure the agent's objectives remain consistently aligned with its authorized purpose.
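The least-privilege and human-approval controls in strategy 1 can be sketched in code. The sketch below is a minimal illustration only, assuming a hypothetical deny-by-default authorization gate; the class and function names (`AgentIdentity`, `authorize`, `HIGH_RISK`) are invented for this example and do not come from any specific framework.

```python
from dataclasses import dataclass, field

# Hypothetical set of action types treated as high-risk per strategy 1
# (external API calls, sensitive data access).
HIGH_RISK = {"external_api_call", "sensitive_data_access"}

@dataclass
class AgentIdentity:
    """An agent modeled as a monitored, non-human identity with
    narrowly scoped permissions."""
    agent_id: str
    allowed_actions: set = field(default_factory=set)

def authorize(agent: AgentIdentity, action: str, human_approved: bool = False) -> bool:
    """Deny by default; permit only explicitly granted actions, and
    require recorded human approval for any high-risk action."""
    if action not in agent.allowed_actions:
        return False  # least privilege: anything not granted is denied
    if action in HIGH_RISK and not human_approved:
        return False  # high-risk actions require human sign-off
    return True

# Usage: an agent holding a narrow grant.
agent = AgentIdentity("agent-7", {"read_docs", "external_api_call"})
print(authorize(agent, "read_docs"))                               # True
print(authorize(agent, "external_api_call"))                       # False (no approval)
print(authorize(agent, "external_api_call", human_approved=True))  # True
```

A real deployment would back this gate with audit logging and SIEM integration, as described in strategy 2, rather than returning a bare boolean.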
ADDITIONAL EVIDENCE
Malicious actors could intentionally create rogue AIs.