Back to the MIT repository
2. Privacy & Security2 - Post-deployment

Prompt priming

Because generative models tend to produce output like the input provided, the model can be prompted to reveal specific kinds of information. For example, adding personal information in the prompt increases its likelihood of generating similar kinds of personal information in its output. If personal data was included as part of the model’s training, there is a possibility it could be revealed.

Source: MIT AI Risk Repositorymit1291

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1291

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement Robust Prompt Isolation and Input Validation Require the use of specific delimiters, such as XML or JSON tags, to clearly segment user-provided input from system instructions within the prompt. This structural isolation prevents the generative model from confusing user data with directives, mitigating the primary risk of user-supplied sensitive information leading to the generation of similar data (Prompt Priming). Furthermore, all user input must undergo a validation or sanitization process to detect and neutralize known adversarial or PII-laden tokens. 2. Enforce Least Privilege Context Design Adhere to the principle of least privilege by strictly limiting the sensitive or confidential information that is injected into the model's context window. The application should only provide the minimum required data necessary for the LLM to complete the query, thereby substantially reducing the total surface area for potential sensitive information disclosure (Inference Risk) even if a successful priming event occurs. 3. Deploy Automated Output Filtering Guardrails Establish a secondary, non-LLM-based monitoring layer to perform real-time content analysis of the model's responses. This layer must utilize heuristics or an independent classification model to detect and redact any accidentally generated personally identifiable information (PII) or confidential data before the output is delivered to the end-user, serving as a critical final-stage defense.