Misuse Risks
Risks arising from intentional exploitation of AI model capabilities by malicious actors to cause harm to individuals, organisations, or society.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1444
Domain lineage
4. Malicious Actors & Misuse
4.0 > Malicious use
Mitigation strategy
1. Implement robust, layered input and output defenses: Systematically constrain the AI model's operational scope by enforcing strict context adherence and validating expected output formats. Deploy multi-layered agent networks and apply semantic and string-checking filters to all inputs and outputs to prevent direct and indirect prompt injection attacks, hardening the model against unauthorized command overrides.
2. Enforce the principle of least privilege and human oversight: Mandate strict Role-Based Access Control (RBAC) to limit employee and model access privileges to the minimum necessary for intended operations. For high-risk actions or critical decision-making, implement mandatory Human-in-the-Loop (HITL) controls so that a person reviews and approves the action, mitigating unauthorized actions resulting from successful model manipulation.
3. Conduct proactive adversarial testing and red teaming: Continuously assess the AI system's vulnerabilities against the evolving threat landscape through ethical hacking, penetration testing, and structured red-teaming exercises. This is critical for identifying and preemptively closing malicious exploitation vectors, particularly those targeting model behavior and output generation.
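The layered input/output defense in strategy 1 can be sketched as a simple guard around a model call. This is a minimal illustration, not a production defense: the denylist patterns, the JSON-shaped output format, and the function names are all assumptions introduced here for demonstration.

```python
import re

# Hypothetical denylist of phrases associated with prompt-injection attempts
# (illustrative only; real systems combine string checks with semantic filters).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (the )?system prompt",
]

def passes_input_filter(user_input: str) -> bool:
    """String-checking layer: reject inputs matching known injection patterns."""
    lowered = user_input.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def passes_output_filter(model_output: str) -> bool:
    """Output-validation layer: accept only the expected format
    (here, assumed to be a single JSON-like object)."""
    return re.fullmatch(r"\{.*\}", model_output.strip(), flags=re.DOTALL) is not None

def guarded_call(user_input: str, model_fn) -> str:
    """Layered defense: filter the input, call the model, validate the output."""
    if not passes_input_filter(user_input):
        return "REJECTED: input failed injection screen"
    output = model_fn(user_input)
    if not passes_output_filter(output):
        return "REJECTED: output failed format check"
    return output
```

Because each layer is independent, a bypass of the string filter can still be caught by the output-format check, which is the point of defense in depth.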
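Strategies 2's RBAC and HITL controls compose naturally: least privilege decides whether a role may ever perform an action, and the HITL gate adds a human sign-off for the high-risk subset. A minimal sketch, with hypothetical roles, actions, and permission tables invented for illustration:

```python
from dataclasses import dataclass

# Hypothetical set of actions classified as high risk (illustrative only).
HIGH_RISK_ACTIONS = {"delete_records", "change_permissions"}

# Minimal RBAC table: each role maps to the least set of actions it needs.
ROLE_PERMISSIONS = {
    "analyst": {"read_reports"},
    "admin": {"read_reports", "change_permissions"},
}

@dataclass
class ActionRequest:
    actor_role: str
    action: str

def authorize(req: ActionRequest, human_approved: bool = False) -> bool:
    """Enforce least privilege first, then require human approval
    (the HITL control) for high-risk actions."""
    allowed = req.action in ROLE_PERMISSIONS.get(req.actor_role, set())
    if not allowed:
        return False  # RBAC: role lacks the privilege outright
    if req.action in HIGH_RISK_ACTIONS:
        return human_approved  # HITL gate: a person must sign off
    return True
```

Note the ordering: RBAC is checked before the HITL gate, so human approval cannot override a missing privilege, only confirm an action the role is already entitled to request.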