Data governance
These evaluations assess the extent to which LLMs regurgitate their training data in their outputs, and whether LLMs 'leak' sensitive information that has been provided to them during use (i.e., during the inference stage).
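One common way to quantify such regurgitation is verbatim n-gram overlap between model outputs and the training corpus. A minimal sketch, assuming tokenization by whitespace and an illustrative n-gram length of 8 (function names and parameters are hypothetical, not from the source):

```python
def ngrams(tokens: list[str], n: int = 8) -> set[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def regurgitation_rate(output: str, corpus: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that appear verbatim in the corpus.

    1.0 means every n-gram of the output is copied from training data;
    0.0 means no verbatim n-gram overlap at the chosen n.
    """
    out_grams = ngrams(output.split(), n)
    corpus_grams = ngrams(corpus.split(), n)
    if not out_grams:
        return 0.0
    return len(out_grams & corpus_grams) / len(out_grams)
```

Real evaluations operate on model tokenizer IDs over large corpora (typically with suffix arrays or Bloom filters for scale), but the metric itself reduces to this set-overlap calculation.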
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit652
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
Implementation of Rigorous Data Sanitization and Minimization
Establish comprehensive, multi-stage processes—including pre-training scrubbing and runtime input validation—to redact, mask, or anonymize Personally Identifiable Information (PII) and proprietary data. This measure ensures that only the minimum volume of non-sensitive data is introduced to the LLM system for processing, directly addressing the root cause of data leakage.

Employ Machine Unlearning and Privacy-Preserving Techniques
Apply specialized post-training mitigation strategies, such as machine unlearning algorithms (e.g., fine-tuning or model-editing techniques) or activation steering, to actively diminish the model's ability to memorize and regurgitate training data verbatim. This must be balanced with empirical verification to ensure the preservation of model utility for unrelated tasks.

Establish Strict Access Controls and Comprehensive Output Filtering
Enforce Role-Based Access Control (RBAC) to limit data access for all LLM components and users to the principle of least privilege. Concurrently, deploy robust output filtering and real-time monitoring mechanisms to detect and block sensitive, memorized, or unauthorized information from being disclosed in the model's final response to the end-user.
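The first and third strategies share a core mechanism: detecting sensitive spans with pattern matching, then masking them (at ingestion) or withholding the response (at output time). A minimal sketch, assuming regex-based detection of a few illustrative PII types; a production system would use a dedicated PII-detection library covering many more entity types and languages:

```python
import re

# Illustrative pattern set; real deployments cover far more PII categories.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def sanitize(text: str) -> str:
    """Pre-ingestion sanitization: replace PII spans with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

def filter_output(response: str) -> str:
    """Runtime output filter: withhold responses that still contain PII."""
    if any(p.search(response) for p in PII_PATTERNS.values()):
        return "[RESPONSE WITHHELD: sensitive data detected]"
    return response
```

For example, `sanitize("Contact jane.doe@example.com or 555-123-4567.")` yields `"Contact [EMAIL] or [PHONE]."`. Applying the same pattern set at both stages gives defense in depth: sanitization reduces what the model can memorize, and the output filter catches anything memorized before these controls were in place.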