Prompt injection attack
A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit1286
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement Robust Input Validation and Contextual Separation All external or untrusted data must be treated strictly as data, not executable instructions. Utilize structured prompt engineering techniques, such as explicit delimiters (e.g., unique character sequences or structured schemas), to enforce a non-negotiable boundary between privileged system instructions and variable user-supplied content. This process should include filtering, sanitizing, and normalizing input to mitigate the risk of malicious content being interpreted as a command override. 2. Enforce the Principle of Least Privilege Apply stringent access control mechanisms to the Large Language Model (LLM) and its integrated tool stack. The LLM’s permissions must be constrained to the absolute minimum necessary for its defined function, thereby limiting its ability to execute unauthorized actions, manipulate external systems, or access sensitive data, even in the event of a successful injection attack. 3. Establish Continuous Monitoring and Output Validation Deploy automated systems for real-time monitoring and anomaly detection across both input and output channels. Responses generated by the LLM must be validated against predefined security and compliance policies (output guardrails) to detect and block attempts to leak internal instructions, bypass safety filters, or execute unauthorized code before the content reaches the end-user. Comprehensive logging of all interactions is essential for forensic analysis.