4. Malicious Actors & Misuse

Dual-Use Capabilities Enable Malicious Use and Misuse of LLMs

Like all technologies, LLMs can be misused by malicious actors. Malicious use of dual-use AI capabilities is a recurring concern in the literature (Brundage et al., 2018; Hendrycks et al., 2023; Mozes et al., 2023).

Source: MIT AI Risk Repository (mit1488)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1488

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Implement rigorous Adversarial Robustness Testing and Red-Teaming during the pre-deployment phase to proactively identify and mitigate vulnerabilities to prompt injection, jailbreaking, and other malevolent manipulation vectors, utilizing independent human experts and standardized benchmarks.

2. Deploy a multi-layered Output and Input Guardrail Framework that includes dynamic content filtering, input validation (blocking malicious prompts), and sensitive data redaction to constrain the model's behavior and prevent the generation or relay of harmful, non-compliant, or toxic outputs in real time.

3. Establish Continuous Misuse Detection and Response Monitoring Systems to track real-world usage patterns, evaluate the efficacy of existing mitigations, and leverage this evidence to iteratively refine and update threat assessments and counter-misuse practices for deployed and future models.
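The layered guardrail idea in point 2 can be sketched in a few lines. The following is an illustrative, minimal example only: the pattern lists, function names (`validate_input`, `redact_output`, `guarded_generate`), and the `model_fn` callable are all hypothetical stand-ins, not part of any real guardrail product; production systems would use far richer classifiers and policies.

```python
import re

# Hypothetical denylist of prompt-injection / jailbreak phrasings (illustrative only).
BLOCKED_INPUT_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"pretend you have no (safety )?restrictions", re.IGNORECASE),
]

# Hypothetical sensitive-data pattern (here: email addresses) for output redaction.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def validate_input(prompt: str) -> bool:
    """Input guardrail: reject prompts matching known manipulation patterns."""
    return not any(p.search(prompt) for p in BLOCKED_INPUT_PATTERNS)

def redact_output(text: str) -> str:
    """Output guardrail: redact sensitive data before relaying the response."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)

def guarded_generate(prompt: str, model_fn) -> str:
    """Wrap a model call (model_fn is a placeholder) with both guardrail layers."""
    if not validate_input(prompt):
        return "Request blocked by input guardrail."
    return redact_output(model_fn(prompt))
```

The wrapper structure illustrates the "multi-layered" aspect: input validation runs before the model is ever invoked, and output redaction runs on every response, so a failure of one layer does not bypass the other.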