Cyber Attacks
Hackers can use powerful LLM systems to generate malicious code cheaply and efficiently, automating cyber attacks at scale.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit17
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement multi-layered input sanitization and output validation to detect and neutralize malicious instructions, embedded code, and unauthorized data-access attempts in conversational exchanges and generated content.
2. Enforce the Principle of Least Privilege (PoLP): sandbox LLM agent capabilities, restrict model functions and external API access to the minimum required for the intended task, and require human validation for all high-stakes or critical operations.
3. Maintain a continuous adversarial testing regimen, including red-teaming and prompt fuzzing, to identify and patch emerging vulnerabilities in model behavior, security alignment, and integrated application logic before production deployment.
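Mitigation 1 above can be sketched as a simple pattern-based screen applied to both user input and model output. This is a minimal illustration, not a production filter: the pattern names and rules here are hypothetical examples, and a real deployment would layer this with semantic classifiers and a maintained ruleset.

```python
import re

# Hypothetical example patterns; a real filter would use a maintained,
# regularly updated ruleset plus model-based classification.
SUSPICIOUS_PATTERNS = [
    re.compile(r"(?i)ignore (all )?previous instructions"),  # prompt-injection phrasing
    re.compile(r"(?i)\bos\.system\s*\("),                    # shell execution in generated code
    re.compile(r"(?i)\bsubprocess\.(run|Popen)\s*\("),       # subprocess spawning
    re.compile(r"(?i)curl\s+\S+\s*\|\s*(ba)?sh"),            # pipe-to-shell download
]

def screen_text(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_patterns) for one exchange.

    Applied to both the inbound prompt and the generated output,
    so malicious content is caught in either direction.
    """
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (len(hits) == 0, hits)
```

In practice such a screen runs as middleware around the model call, rejecting or flagging any exchange where `screen_text` returns `False`.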
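The least-privilege gating in mitigation 2 can be sketched as an explicit tool allowlist with a human-approval gate for high-stakes calls. The tool names and registry structure below are illustrative assumptions, not part of any specific framework.

```python
# Hypothetical tool registry illustrating least-privilege gating for an LLM agent.
ALLOWED_TOOLS = {"search_docs", "summarize"}        # minimal set the task requires
HIGH_STAKES_TOOLS = {"send_email", "delete_record"}  # permitted only with human sign-off

def authorize(tool_name: str, human_approved: bool = False) -> bool:
    """Grant a tool call only if it is allowlisted, or is a high-stakes
    tool with explicit human approval; deny everything else by default."""
    if tool_name in ALLOWED_TOOLS:
        return True
    if tool_name in HIGH_STAKES_TOOLS and human_approved:
        return True
    return False
```

The default-deny posture is the key design choice: any tool the agent requests that is not explicitly registered is refused, so new capabilities must be consciously granted rather than implicitly available.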