Cyberattack
The ability of LLMs to write reasonably good-quality code at extremely low cost and speed means this same assistance can equally facilitate malicious activity. In particular, malicious hackers can leverage LLMs to help carry out and automate cyberattacks, exploiting the low cost and scalability of LLM access.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit494
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement rigorous content filtering and output validation mechanisms to proactively detect and block the generation of malicious code, phishing content, or attack scripts. All model outputs must be treated as untrusted data and subjected to strict sanitization protocols before being executed or presented to a user.
2. Conduct continuous adversarial red-teaming and jailbreak assessments against the LLM to proactively identify and eliminate vulnerabilities that malicious actors could exploit to bypass safety controls and automate cyberattacks.
3. Enforce strict access controls, including Multi-Factor Authentication (MFA) and Role-Based Access Control (RBAC), to limit who can access the model's capabilities, and deploy continuous behavioral pattern analysis and rate limiting to detect and thwart resource-intensive automated misuse.
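As a rough illustration of strategies 1 and 3, the sketch below combines a deny-list screen over untrusted model output with a per-user token-bucket rate limiter. The pattern list and all names here are hypothetical placeholders, not part of any cited system; a production deployment would rely on trained classifiers and far broader coverage rather than a handful of regexes.

```python
import re
import time

# Hypothetical deny-list patterns for illustration only; a real filter
# would use trained classifiers and much broader, maintained coverage.
SUSPICIOUS_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),                         # destructive shell command
    re.compile(r"powershell\s+-enc", re.IGNORECASE),     # encoded PowerShell payload
    re.compile(r"your account has been suspended", re.IGNORECASE),  # phishing lure
]

def screen_output(text: str) -> tuple[bool, list[str]]:
    """Treat model output as untrusted: return (allowed, matched_patterns)."""
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (not hits, hits)

class TokenBucket:
    """Per-user rate limiter to throttle resource-intensive automated misuse."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, then spend one if available.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would run `screen_output` on every completion before display or execution, and gate each request through a per-user `TokenBucket` so that bursts of automated generation are throttled rather than served at full speed.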
ADDITIONAL EVIDENCE
Such attacks include malware [287, 288, 289], phishing attacks [290, 289], and data stealing [291].