Cyber Attacks
Hackers can use powerful LLM systems to generate malicious code cheaply and efficiently, automating cyber attacks at scale.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit17
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement multi-layered input sanitization and output validation to detect and neutralize malicious instructions, embedded code, and unauthorized data-access attempts in conversational exchanges and generated content.
2. Enforce the Principle of Least Privilege (PoLP): sandbox LLM agent capabilities, restrict model functions and external API access to the minimum required for the intended task, and require human validation for all high-stakes or critical operations.
3. Maintain a continuous adversarial testing regimen, including red-teaming and prompt fuzzing, to identify and patch emerging vulnerabilities in model behavior, security alignment, and integrated application logic before production deployment.
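Mitigation 1 above can be sketched as a simple pattern-based screen applied to both user input and model output. This is a minimal illustration, not a production filter: the pattern names and rules here are hypothetical examples, and a real deployment would layer this with semantic classifiers and a maintained ruleset.

```python
import re

# Hypothetical example patterns; a real filter would use a maintained,
# regularly updated ruleset plus model-based classification.
SUSPICIOUS_PATTERNS = [
    re.compile(r"(?i)ignore (all )?previous instructions"),  # prompt-injection phrasing
    re.compile(r"(?i)\bos\.system\s*\("),                    # shell execution in generated code
    re.compile(r"(?i)\bsubprocess\.(run|Popen)\s*\("),       # subprocess spawning
    re.compile(r"(?i)curl\s+\S+\s*\|\s*(ba)?sh"),            # pipe-to-shell download
]

def screen_text(text: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched_patterns) for one exchange.

    Applied to both the inbound prompt and the generated output,
    so malicious content is caught in either direction.
    """
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (len(hits) == 0, hits)
```

In practice such a screen runs as middleware around the model call, rejecting or flagging any exchange where `screen_text` returns `False`.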
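The least-privilege gating in mitigation 2 can be sketched as an explicit tool allowlist with a human-approval gate for high-stakes calls. The tool names and registry structure below are illustrative assumptions, not part of any specific framework.

```python
# Hypothetical tool registry illustrating least-privilege gating for an LLM agent.
ALLOWED_TOOLS = {"search_docs", "summarize"}        # minimal set the task requires
HIGH_STAKES_TOOLS = {"send_email", "delete_record"}  # permitted only with human sign-off

def authorize(tool_name: str, human_approved: bool = False) -> bool:
    """Grant a tool call only if it is allowlisted, or is a high-stakes
    tool with explicit human approval; deny everything else by default."""
    if tool_name in ALLOWED_TOOLS:
        return True
    if tool_name in HIGH_STAKES_TOOLS and human_approved:
        return True
    return False
```

The default-deny posture is the key design choice: any tool the agent requests that is not explicitly registered is refused, so new capabilities must be consciously granted rather than implicitly available.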