4. Malicious Actors & Misuse

Malicious Use and Unleashing AI Agents

LMs, owing to their remarkable capabilities, carry the same potential for misuse as other powerful technologies. For instance, they may be used in information warfare to generate deceptive or unlawful content, with significant impact on individuals and society. As LMs are increasingly built into agents that pursue user objectives, they may disregard moral and safety guidelines when operating without adequate supervision, executing user commands mechanically without considering the potential damage. They may also interact unpredictably with humans and other systems, especially in open environments.

Source: MIT AI Risk Repository (mit69)

ENTITY

3 - Other

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit69

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Establish robust AI governance and least-privilege frameworks: Implement comprehensive, holistic AI governance to ensure alignment and oversight, treating each agent as a non-human identity. Assign minimal, narrowly defined privileges (least privilege), deploy mandatory continuous behavioral monitoring, and require human approval for all high-risk actions, such as external API calls or sensitive system access, to prevent autonomous drift or privilege escalation.

2. Mandate secure agent design and alignment controls: Prioritize security by design through prompt hardening: define explicit constraints, narrowly scope agent responsibilities, and embed comprehensive guardrails that prohibit the generation of unlawful content or the disclosure of internal instructions. This layer must also incorporate rigorous input validation and output filtering to mitigate prompt injection and redact sensitive data, keeping the agent aligned with ethical and safety guidelines.

3. Institutionalize continuous threat detection and red teaming: Develop and integrate advanced, real-time detection capabilities for agentic threats, specifically targeting anomalous usage patterns, communication with malicious domains, and indicators of compromise. Complement this with a recurring, formal AI red-teaming discipline and stress-testing to proactively uncover and harden defenses against emerging vulnerabilities such as reasoning manipulation, memory poisoning, or deceptive behavior in the operating environment.
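The least-privilege and human-approval controls described in mitigation 1 can be sketched in code. The following is a minimal, illustrative Python example, not an implementation from any real agent framework; all names (ToolPolicy, gate_tool_call, the tool names) are hypothetical. It shows a deny-by-default allowlist per agent, with a separate set of high-risk tools whose invocation requires explicit human sign-off.

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Per-agent least-privilege policy: an explicit allowlist of tools,
    plus a high-risk subset whose calls require human approval."""
    allowed_tools: set = field(default_factory=set)
    high_risk_tools: set = field(default_factory=set)

class PermissionDenied(Exception):
    """Raised when a tool is outside the agent's allowlist."""

class ApprovalRequired(Exception):
    """Raised when a high-risk tool is invoked without human sign-off."""

def gate_tool_call(policy: ToolPolicy, tool: str, approved: bool = False) -> str:
    # Deny by default: anything not explicitly granted is rejected.
    if tool not in policy.allowed_tools:
        raise PermissionDenied(f"tool {tool!r} is not in the agent's allowlist")
    # High-risk actions (e.g. external API calls, sensitive system access)
    # proceed only with explicit human approval.
    if tool in policy.high_risk_tools and not approved:
        raise ApprovalRequired(f"tool {tool!r} requires human approval")
    return f"executing {tool}"

# Example policy: the agent may search docs freely, but sending email
# is treated as a high-risk external action.
policy = ToolPolicy(
    allowed_tools={"search_docs", "send_email"},
    high_risk_tools={"send_email"},
)

print(gate_tool_call(policy, "search_docs"))                # low-risk, allowed
print(gate_tool_call(policy, "send_email", approved=True))  # allowed with sign-off
```

In a real deployment the `approved` flag would come from an out-of-band human-in-the-loop step, and every gate decision would be logged to support the continuous behavioral monitoring called for in the same mitigation.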