
Privacy and Data Protection

Examining the ways in which providers of generative AI systems leverage user data is critical to evaluating their impact. Protecting personal information, and individual and group privacy, depends largely on training data, training methods, and security measures.

Source: MIT AI Risk Repository (mit170)

ENTITY

1 - Human

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit170

Domain lineage

2. Privacy & Security

186 mapped risks

2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Implement comprehensive data minimization protocols, utilizing Privacy-Enhancing Technologies (PETs) such as differential privacy, anonymization, and pseudonymization throughout the AI lifecycle to restrict the volume and identifiability of sensitive information employed for model training and inference.

2. Establish a Zero Trust security architecture for all Generative AI systems, mandating rigorous access controls (RBAC/ABAC), encrypted data handling (in transit and at rest), and advanced Data Loss Prevention (DLP) technologies specifically designed for prompt and output filtering to prevent the inadvertent or malicious exfiltration of sensitive data.

3. Institute a continuous governance and assurance framework, which includes scheduled, independent AI audits to verify compliance with privacy regulations, adversarial testing (e.g., membership inference attacks), and real-time monitoring of model inputs and outputs to promptly detect and mitigate data leakage vectors and unauthorized "Shadow AI" usage.
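To make the differential-privacy element of point 1 concrete, here is a minimal sketch of the Laplace mechanism, the standard way to release an aggregate statistic with an epsilon-differential-privacy guarantee. The function name and parameters are illustrative, not part of any particular library; noise is drawn as the difference of two exponential samples, which follows a Laplace distribution.

```python
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise calibrated to sensitivity/epsilon.

    Smaller epsilon means stronger privacy but noisier output.
    """
    scale = sensitivity / epsilon
    # Difference of two Exp(1/scale) draws is Laplace(0, scale).
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_value + noise

# Example: privately release a count of 42 users (sensitivity 1,
# since adding or removing one person changes the count by at most 1).
noisy_count = laplace_mechanism(42.0, sensitivity=1.0, epsilon=0.5)
```

In practice a provider would apply mechanisms like this (or DP-SGD during training) through a vetted library rather than hand-rolled sampling, but the calibration idea, noise scaled to sensitivity divided by epsilon, is the same.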

ADDITIONAL EVIDENCE

The data on which the system was trained or adapted should be consensually and lawfully collected, and secured under the rules of the jurisdictions in which the data subjects and the entity collecting the data are based. Moreover, there are strong intellectual property and privacy concerns, with generative models generating copyrighted content [254] and highly sensitive documents [49] or personally identifiable information (PII), such as phone numbers, addresses, and private medical records. Providers should respect the consent and choices of individuals when collecting, processing, and sharing data with external parties, as sensitive data could inevitably be leveraged for downstream harm such as security breaches, privacy violations, and other adversarial attacks. Oftentimes, this might require retroactively retraining a generative AI system, in accordance with regulations such as the California Consumer Privacy Act (CCPA) [4].
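The output-filtering side of the DLP controls mentioned above can be sketched as a simple redaction pass over model output before it is returned to the user. The patterns below are illustrative stand-ins for the PII classes named in the evidence (phone numbers, email addresses); a production DLP system would use vetted detectors and context-aware classifiers, not three hand-written regexes.

```python
import re

# Hypothetical detectors for a few PII classes; real systems use far
# more robust, locale-aware detection than these illustrative regexes.
PII_PATTERNS = {
    "phone": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact_pii(text: str) -> str:
    """Replace any matched PII span with a labeled placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

print(redact_pii("Call 555-867-5309 or email jane@example.com"))
# Prints: Call [REDACTED-PHONE] or email [REDACTED-EMAIL]
```

The same filter can be applied symmetrically to incoming prompts, so that sensitive data a user pastes in is never logged or forwarded to the model in the clear.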