2. Privacy & Security / 2 - Post-deployment

Revealing confidential information

When confidential information is used in training data, fine-tuning data, or as part of the prompt, models might reveal that data in the generated output. Revealing confidential information is a type of data leakage.

Source: MIT AI Risk Repository (mit1310)

ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 2 - Post-deployment
Risk ID: mit1310
Domain lineage: 2. Privacy & Security (186 mapped risks) > 2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Prioritize data minimization and privacy-enhancing technologies. Implement stringent data minimization, ensuring that only necessary, non-identifiable data is used for model training and prompting. Apply data scrubbing, redaction, and pseudonymization, along with differential privacy, to mathematically obscure individual data points and reduce the risk of training-data memorization and subsequent unintentional exposure in generated outputs.

2. Enforce robust access controls and encryption protocols. Establish and rigorously enforce role-based access control (RBAC) and multi-factor authentication (MFA) across the entire AI ecosystem (data storage, model APIs, and execution environments), adhering to the principle of least privilege. Mandate industry-standard encryption for sensitive data at rest and in transit so that the information remains unreadable in the event of unauthorized access or interception.

3. Integrate advanced data loss prevention (DLP) and continuous monitoring. Deploy specialized DLP solutions, including those with AI prompt protection capabilities, to scan for and block sensitive or confidential information in real time before it reaches public-facing or unauthorized models. Concurrently, log and continuously monitor all LLM interactions and outputs to proactively detect and alert on anomalous data patterns or the regurgitation of proprietary information.
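Mitigations 1 and 3 can be sketched together: a pseudonymization pass that scrubs sensitive substrings from a prompt before it reaches a model, followed by a DLP-style gate that refuses anything still containing flagged patterns. The detection patterns, placeholder format, and function names below are illustrative assumptions, not part of the repository entry; a production deployment would rely on a dedicated DLP product with far richer detectors.

```python
import re

# Illustrative detectors only; real DLP tooling covers many more
# categories (names, addresses, API keys, credentials, ...).
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def pseudonymize(text):
    """Replace sensitive substrings with stable placeholders.

    Returns the scrubbed text plus a mapping so authorized
    post-processing can reverse the substitution if needed.
    """
    mapping = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"<{label}_{i}>"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def dlp_gate(prompt):
    """Raise if the prompt still contains a flagged pattern."""
    for label, pattern in PATTERNS.items():
        if pattern.search(prompt):
            raise ValueError(f"DLP: prompt contains {label} data")
    return prompt

scrubbed, mapping = pseudonymize(
    "Contact alice@example.com, SSN 123-45-6789"
)
dlp_gate(scrubbed)  # passes only after scrubbing
```

Running the gate on the raw string instead of `scrubbed` raises `ValueError`, which is the intended fail-closed behavior: sensitive data is blocked from ever being entered into the model.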