
Prompt leaking

A prompt leak attack attempts to extract a model's system prompt (also known as the system message).

Source: MIT AI Risk Repository, mit1289

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit1289

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Segregate Sensitive Assets from Prompt Context

Prohibit the embedding of high-value confidential data, such as API keys, credentials, proprietary business logic, or detailed permission schemas, directly within the system prompt. Sensitive information must be externalized and managed by secure, independent services that the Large Language Model (LLM) agent accesses via a strictly monitored, least-privilege framework at runtime.

2. Enforce Security Controls Independently from the LLM

Implement critical security controls, including privilege separation, authorization bounds checks, and content filtering, in a deterministic layer *outside* the LLM's context. System prompts should not be relied upon as the primary mechanism for enforcing security or compliance policy, as the model's non-deterministic nature renders these controls highly susceptible to adversarial prompt injection and subsequent bypass.

3. Employ Robust Input Validation and Prompt Segmentation

Mandate stringent input validation and sanitization processes to detect and block known adversarial patterns and obfuscated injection attempts before they reach the LLM. Furthermore, utilize clear prompt segmentation techniques, such as distinct message roles (e.g., 'system' and 'user') and delimiters, to logically separate trusted system instructions from untrusted user input, thereby minimizing the model's interpretation of user data as executable commands.
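The three mitigations above can be sketched in code. The following is a minimal, hypothetical illustration, not a production defense: the injection patterns, the `allowed_tools` policy table, and the function names are all assumptions made for the example, and a real deployment would use a maintained detection service and a proper policy store.

```python
import re

# Step 3 (assumption): a small sample of known adversarial patterns.
# Real systems use maintained, far more comprehensive detection.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"reveal .*system prompt", re.I),
]


def validate_user_input(text: str) -> str:
    """Block input matching known adversarial patterns before it
    reaches the LLM (mitigation 3)."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise ValueError("potential prompt-injection attempt blocked")
    return text


def build_messages(user_input: str) -> list[dict]:
    """Segment trusted instructions from untrusted input using distinct
    message roles (mitigation 3). Note the system prompt carries no
    credentials or permission schemas (mitigation 1): secrets stay in an
    external secrets service accessed by tools at runtime, outside the
    model's context."""
    return [
        {"role": "system", "content": "You are a customer-support assistant."},
        {"role": "user", "content": validate_user_input(user_input)},
    ]


def authorize_tool_call(user_id: str, tool: str) -> bool:
    """Deterministic authorization check enforced outside the LLM
    (mitigation 2). The inline table is a placeholder for a real
    policy store."""
    allowed_tools = {"alice": {"search_kb"}, "bob": {"search_kb", "refund"}}
    return tool in allowed_tools.get(user_id, set())
```

Even if an attacker leaks the system prompt here, it contains no secrets (mitigation 1), and the authorization check still runs deterministically outside the model (mitigation 2), so a successful leak or injection does not translate into privilege escalation.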