Back to the MIT repository
4. Malicious Actors & Misuse3 - Other

Malicious and Indirect

Benign intermediate for harmful end objective

Source: MIT AI Risk Repositorymit1442

ENTITY

3 - Other

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit1442

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Prioritize input filtering and detection through specialized classifiers: Implement proprietary machine-learning models (Prompt Injection Content Classifiers) to detect and filter malicious instructions embedded within external data formats at the initial ingestion stage, preventing harmful content from reaching the core LLM. 2. Augment model resilience with security thought reinforcement: Apply targeted security instructions surrounding the prompt content to steer the Large Language Model to ignore adversarial instructions and remain focused on the user-directed task, hardening the model's internal defense against accepted inputs. 3. Deploy a Human-in-the-Loop (HITL) safeguard: Establish a User Confirmation Framework that mandates explicit user approval for any sensitive or state-changing actions generated by the AI system, serving as a final, contextual safeguard against unauthorized execution resulting from a successful injection.