Cyberattack
The ability of LLMs to write reasonably good-quality code at extremely low cost and speed means this same assistance can equally facilitate malicious activity. In particular, malicious hackers can leverage LLMs to help carry out and automate cyberattacks, exploiting the low cost and scalability of LLM access.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit494
Domain lineage
4. Malicious Actors & Misuse
4.2 > Cyberattacks, weapon development or use, and mass harm
Mitigation strategy
1. Implement rigorous content filtering and output validation mechanisms to proactively detect and block the generation of malicious code, phishing content, or attack scripts. All model outputs must be treated as untrusted data and subjected to strict sanitization protocols before being executed or presented to a user.
2. Conduct continuous adversarial red-teaming and jailbreak assessments against the LLM to proactively identify and eliminate vulnerabilities that malicious actors could exploit to bypass safety controls and automate cyberattacks.
3. Enforce strict access controls, including Multi-Factor Authentication (MFA) and Role-Based Access Control (RBAC), to limit who can access the model's capabilities, and deploy continuous behavioral pattern analysis and rate limiting to detect and thwart resource-intensive automated misuse.
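As a rough illustration of strategies 1 and 3, the sketch below combines a deny-list screen over untrusted model output with a per-user token-bucket rate limiter. The pattern list and all names here are hypothetical placeholders, not part of any cited system; a production deployment would rely on trained classifiers and far broader coverage rather than a handful of regexes.

```python
import re
import time

# Hypothetical deny-list patterns for illustration only; a real filter
# would use trained classifiers and much broader, maintained coverage.
SUSPICIOUS_PATTERNS = [
    re.compile(r"rm\s+-rf\s+/"),                         # destructive shell command
    re.compile(r"powershell\s+-enc", re.IGNORECASE),     # encoded PowerShell payload
    re.compile(r"your account has been suspended", re.IGNORECASE),  # phishing lure
]

def screen_output(text: str) -> tuple[bool, list[str]]:
    """Treat model output as untrusted: return (allowed, matched_patterns)."""
    hits = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(text)]
    return (not hits, hits)

class TokenBucket:
    """Per-user rate limiter to throttle resource-intensive automated misuse."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill tokens based on elapsed time, then spend one if available.
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A caller would run `screen_output` on every completion before display or execution, and gate each request through a per-user `TokenBucket` so that bursts of automated generation are throttled rather than served at full speed.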
ADDITIONAL EVIDENCE
Such attacks include malware [287, 288, 289], phishing attacks [290, 289], and data stealing [291].