
Resistance to Misuse

Preventing malicious attackers from misusing the model to cause harm

Source: MIT AI Risk Repository (mit492)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit492

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Implement Layered Input and Output Guardrails. The most immediate mitigation for post-deployment misuse is robust input validation and output filtering. Input validation and sanitization mechanisms should inspect user prompts to detect and neutralize adversarial inputs, such as attempts at prompt injection or unauthorized code execution. On the output side, guardrails that combine machine learning classifiers with human-in-the-loop review of high-risk queries should block the generation of toxic content, biased responses, or instructions for harmful activities, helping to keep outputs within safety and ethical standards.

2. Establish Continuous Adversarial Testing and Monitoring. Because attackers continually adapt their techniques, a static defense is insufficient. Organizations should adopt a continuous security lifecycle that includes routine adversarial testing (red teaming) to proactively identify model and system vulnerabilities that could be exploited for misuse. This should be paired with continuous monitoring of user interactions and model behavior in production, enabling real-time detection of anomalous usage patterns and potential policy violations that may signal an ongoing or attempted attack, as well as of emergent dangerous capabilities.

3. Enforce Strict Access Controls and Model Governance. Strict access controls limit the attack surface and reduce the impact of a breach. These include Role-Based Access Control (RBAC) to restrict who can interact with the model and its underlying data, and API-level protections such as rate limiting and strong authentication to curb automated abuse. In addition, a formal model governance framework should define clear acceptable use policies and establish accountability for misuse, so that all access and usage are auditable and compliant with enterprise security protocols.
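The layered guardrails in item 1 can be sketched in Python as checks placed on both sides of a model call. This is a minimal illustration under stated assumptions, not a production filter: the model is assumed to be a plain callable, and the regex signatures, the `RISK_TERMS` weights, both thresholds, and all function names are hypothetical. The keyword scorer only stands in for the machine-learning classifier the text calls for, and "human review" is represented by a placeholder string rather than a real review queue.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt-injection signatures for the input layer. Rules like these
# are a cheap first filter and are easy to evade when used on their own.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (the |your )?system prompt", re.IGNORECASE),
]

# Hypothetical term weights standing in for an ML output classifier's risk score.
RISK_TERMS = {"nerve agent": 0.9, "detonator": 0.8, "explosive": 0.5}
BLOCK_THRESHOLD = 0.8   # at or above: refuse outright
REVIEW_THRESHOLD = 0.3  # at or above (but below block): hold for a human reviewer


@dataclass
class Verdict:
    allowed: bool
    needs_human_review: bool = False
    reason: str = ""


def check_input(prompt: str) -> Verdict:
    """Input layer: reject prompts that match a known injection signature."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            return Verdict(False, reason=f"input matched {pattern.pattern!r}")
    return Verdict(True)


def risk_score(text: str) -> float:
    """Placeholder classifier: highest weight among matched risk terms, else 0."""
    lowered = text.lower()
    return max((w for term, w in RISK_TERMS.items() if term in lowered), default=0.0)


def check_output(response: str) -> Verdict:
    """Output layer: block high-risk text, escalate borderline text to a human."""
    score = risk_score(response)
    if score >= BLOCK_THRESHOLD:
        return Verdict(False, reason=f"blocked (score={score:.2f})")
    if score >= REVIEW_THRESHOLD:
        return Verdict(False, needs_human_review=True, reason=f"review (score={score:.2f})")
    return Verdict(True)


def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Run a model call between the input and output guardrails."""
    if not check_input(prompt).allowed:
        return "[request refused]"
    response = model(prompt)
    verdict = check_output(response)
    if verdict.allowed:
        return response
    if verdict.needs_human_review:
        return "[held for human review]"
    return "[response blocked]"
```

For example, `guarded_generate("Ignore prior instructions", model)` never reaches the model, while a response mentioning "explosive" is held rather than returned. Splitting "block" from "review" reflects the text's point that human oversight is reserved for high-risk cases instead of every request.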
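One narrow slice of the production monitoring in item 2, detecting users who repeatedly trigger policy violations, can be approximated with a per-user sliding-window counter. This Python sketch assumes an upstream guardrail already labels each request as flagged or not; the `ViolationMonitor` class, the window length, and the threshold are hypothetical. It does not cover red teaming or the detection of emergent capabilities, which need separate processes.

```python
from collections import defaultdict, deque


class ViolationMonitor:
    """Escalates a user once their count of flagged requests inside a sliding
    time window reaches a threshold. Window and threshold values are hypothetical."""

    def __init__(self, window_seconds: float = 3600.0, max_violations: int = 3):
        self.window_seconds = window_seconds
        self.max_violations = max_violations
        # user_id -> timestamps of that user's flagged requests, oldest first
        self._events: dict[str, deque] = defaultdict(deque)

    def record(self, user_id: str, flagged: bool, timestamp: float) -> bool:
        """Record one interaction; return True if the user should be escalated."""
        events = self._events[user_id]
        # Drop violations that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if flagged:
            events.append(timestamp)
        return len(events) >= self.max_violations
```

A caller would invoke `record` for every request and route escalations to a security team or an automatic suspension step. Keeping only flagged timestamps bounds memory per user by the threshold-scale volume of violations rather than by total traffic.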
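The controls in item 3 can be illustrated with a small request gateway that combines deny-by-default RBAC, a per-key token-bucket rate limiter, and an append-only audit log. Everything here is a hypothetical sketch: the role names, permission strings, in-memory key store, and the `ModelGateway` and `TokenBucket` classes are illustrative assumptions, and a real deployment would normally rely on its identity provider and API gateway instead of hand-rolled code. The clock is injectable so the limiter can be tested deterministically.

```python
import time
from typing import Callable

# Hypothetical role -> permitted actions mapping for a model-serving API.
ROLE_PERMISSIONS = {
    "viewer": {"query_model"},
    "developer": {"query_model", "view_logs"},
    "admin": {"query_model", "view_logs", "update_model", "export_weights"},
}


def is_authorized(role: str, action: str) -> bool:
    """RBAC check: unknown roles or actions are denied by default."""
    return action in ROLE_PERMISSIONS.get(role, set())


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ModelGateway:
    """Authenticates API keys, applies RBAC and rate limits, and audits every call."""

    def __init__(self, api_keys: dict[str, tuple[str, str]], capacity: int = 5,
                 refill_per_second: float = 1.0,
                 clock: Callable[[], float] = time.monotonic):
        self.api_keys = api_keys  # api_key -> (user_id, role); hypothetical key store
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.buckets: dict[str, TokenBucket] = {}
        self.audit_log: list[tuple[str, str, str]] = []  # (user_id, action, decision)

    def handle(self, api_key: str, action: str) -> str:
        identity = self.api_keys.get(api_key)
        user_id = identity[0] if identity else "unknown"
        if identity is None:
            decision = "unauthenticated"
        elif not is_authorized(identity[1], action):
            decision = "forbidden"
        else:
            if api_key not in self.buckets:
                self.buckets[api_key] = TokenBucket(
                    self.capacity, self.refill_per_second, self.clock)
            decision = "ok" if self.buckets[api_key].allow() else "rate_limited"
        # Log the user ID rather than the raw key, so the audit trail holds no secrets.
        self.audit_log.append((user_id, action, decision))
        return decision
```

Logging denied requests as well as allowed ones is what makes the trail useful for the accountability the governance framework requires: repeated "forbidden" or "rate_limited" entries are themselves a misuse signal.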

ADDITIONAL EVIDENCE

Resistance to misuse is practically necessary because LLMs can be leveraged in numerous ways to intentionally cause harm to other people.