Malicious and Indirect
Benign intermediate for harmful end objective
ENTITY
3 - Other
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit1442
Domain lineage
4. Malicious Actors & Misuse
4.0 > Malicious use
Mitigation strategy
1. Prioritize input filtering and detection through specialized classifiers: Implement proprietary machine-learning models (Prompt Injection Content Classifiers) to detect and filter malicious instructions embedded within external data formats at the initial ingestion stage, preventing harmful content from reaching the core LLM. 2. Augment model resilience with security thought reinforcement: Apply targeted security instructions surrounding the prompt content to steer the Large Language Model to ignore adversarial instructions and remain focused on the user-directed task, hardening the model's internal defense against accepted inputs. 3. Deploy a Human-in-the-Loop (HITL) safeguard: Establish a User Confirmation Framework that mandates explicit user approval for any sensitive or state-changing actions generated by the AI system, serving as a final, contextual safeguard against unauthorized execution resulting from a successful injection.