2. Privacy & Security2 - Post-deployment

Prompt injection attack

A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt.

Source: MIT AI Risk Repositorymit1286

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1286

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement Robust Input Validation and Contextual Separation All external or untrusted data must be treated strictly as data, not executable instructions. Utilize structured prompt engineering techniques, such as explicit delimiters (e.g., unique character sequences or structured schemas), to enforce a non-negotiable boundary between privileged system instructions and variable user-supplied content. This process should include filtering, sanitizing, and normalizing input to mitigate the risk of malicious content being interpreted as a command override. 2. Enforce the Principle of Least Privilege Apply stringent access control mechanisms to the Large Language Model (LLM) and its integrated tool stack. The LLM’s permissions must be constrained to the absolute minimum necessary for its defined function, thereby limiting its ability to execute unauthorized actions, manipulate external systems, or access sensitive data, even in the event of a successful injection attack. 3. Establish Continuous Monitoring and Output Validation Deploy automated systems for real-time monitoring and anomaly detection across both input and output channels. Responses generated by the LLM must be validated against predefined security and compliance policies (output guardrails) to detect and block attempts to leak internal instructions, bypass safety filters, or execute unauthorized code before the content reaches the end-user. Comprehensive logging of all interactions is essential for forensic analysis.