Private information leakage
First, because LLMs display immense modelling power, there is a risk that the model weights encode private information present in the training corpus. In particular, LLMs can ‘memorise’ personally identifiable information (PII) such as names, addresses and telephone numbers, and subsequently leak such information through generated text outputs (Carlini et al., 2021). Private information leakage can occur accidentally or as the result of an attack in which a person uses adversarial prompting to extract private information from the model. In the context of pre-training data drawn from public online sources, the possibility of LLMs leaking training data underscores the ‘privacy in public’ paradox confronting the ‘right to be let alone’ paradigm and highlights the relevance of the contextual integrity paradigm for LLMs. Training data leakage can also affect information collected for model refinement (e.g. via fine-tuning on user feedback) at later stages of the development cycle. Note, however, that extracting publicly available data from an LLM does not render the data more sensitive per se; rather, the risks associated with such extraction attacks need to be assessed in light of the intentions and culpability of the user extracting the data.
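One practical way to screen for the verbatim memorisation described above is to flag model outputs that reproduce long contiguous spans from the training corpus. The sketch below is a minimal, illustrative assumption: the function names and the 50-character span threshold are hypothetical choices, not a standard tool, and real memorisation audits (e.g. in the style of Carlini et al., 2021) use far more sophisticated extraction and deduplication methods.

```python
# Minimal sketch of a verbatim-memorisation check: flag a model output that
# shares any long character n-gram with a training document. The threshold
# n=50 is an illustrative assumption, not an established standard.

def char_ngrams(text: str, n: int) -> set[str]:
    """All contiguous length-n character spans of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def looks_memorised(output: str, corpus_docs: list[str], n: int = 50) -> bool:
    """True if `output` reproduces any length-n span from the corpus."""
    out_grams = char_ngrams(output, n)
    return any(out_grams & char_ngrams(doc, n) for doc in corpus_docs)
```

Such a check can run as a post-generation filter; outputs that trigger it would be suppressed or reviewed before being shown to users.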
ENTITY
3 - Other
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit415
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement rigorous data minimization and redaction protocols, utilizing techniques like Differential Privacy or k-anonymity, to sanitize Personally Identifiable Information (PII) and confidential data from all model training and fine-tuning datasets, as well as real-time user inputs.
2. Enforce strict, zero-trust-aligned access controls, such as Role-Based Access Control (RBAC) and the Principle of Least Privilege, across the LLM system components and associated data stores, including the application of encryption for data at rest and in transit.
3. Deploy advanced inference-time defense mechanisms, such as Privacy-Aware Decoding (PAD) or privacy neuron control interventions (e.g., MPNC), to dynamically detect and mitigate the unintentional or adversarial emission of sensitive information during the generation of model outputs.
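The redaction step in point 1 can be sketched with simple pattern-based masking. The patterns and placeholder labels below are illustrative assumptions, not an exhaustive PII taxonomy; production pipelines typically combine such rules with NER-based detectors and formal guarantees such as Differential Privacy.

```python
import re

# Illustrative rule-based PII redaction for dataset sanitization.
# Patterns are ordered most-specific first so that, e.g., an SSN is not
# consumed by the broader phone-number pattern.
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact_pii(text: str) -> str:
    """Replace each matched PII span with a typed placeholder, e.g. [EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Applying `redact_pii` to every document before training (and to user inputs before logging) supports the minimization goal, though rule-based matching alone will miss context-dependent identifiers such as names.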