
Privacy and security

Data privacy and security are another prominent challenge for generative AI such as ChatGPT. Privacy relates to sensitive personal information that owners do not want to disclose to others (Fang et al., 2017). Data security refers to the practice of protecting information from unauthorized access, corruption, or theft. In the development stage of ChatGPT, a huge amount of personal and private data was used to train it, which threatens privacy (Siau & Wang, 2020). As ChatGPT grows in popularity and usage, it penetrates people's daily lives and provides greater convenience while capturing a plethora of personal information about its users. The accompanying risk is that private information could be exposed to the public, either intentionally or unintentionally. For example, it has been reported that the chat records of some users became viewable to others due to system errors in ChatGPT (Porter, 2023). Not only individual users but also major corporations and governmental agencies face information privacy and security issues. If ChatGPT becomes an inseparable part of daily operations such that important or even confidential information is fed into it, data security will be at risk and could be breached.

To address issues regarding privacy and security, users need to be very circumspect when interacting with ChatGPT to avoid disclosing sensitive personal information or confidential information about their organizations. AI companies, especially technology giants, should take appropriate actions to raise user awareness of ethical issues surrounding privacy and security, such as the leakage of trade secrets, and the "do's and don'ts" for avoiding the sharing of sensitive information with generative AI. Meanwhile, regulations and policies should be in place to protect information privacy and security.

Source: MIT AI Risk Repository (mit538)

ENTITY: 2 - AI

INTENT: 2 - Unintentional

TIMING: 3 - Other

Risk ID: mit538

Domain lineage: 2. Privacy & Security (186 mapped risks) > 2.1 Compromise of privacy by leaking or correctly inferring sensitive information

Mitigation strategy

1. Mandate Strict Data Minimization and Anonymization Policies
Enforce a principle of data minimization by restricting the collection and use of personal or confidential information to only what is strictly necessary for the AI application's function. Implement technical measures such as data masking, pseudonymization, and tokenization on any residual sensitive data used for training or as input prompts to preclude direct leakage or successful inference attacks.

2. Establish a Secure, Controlled AI Infrastructure
Adopt secure, enterprise-grade, or private instances of generative AI models that provide contractual data processing agreements and guarantee that user-provided data will not be retained or used for public model training. Access to these systems must be governed by a Zero Trust architecture, implementing granular role-based access control (RBAC), Single Sign-On (SSO), and Multi-Factor Authentication (MFA) to minimize unauthorized data exposure.

3. Deploy Active Data Loss Prevention (DLP) and Prompt Filtering
Integrate robust Data Loss Prevention (DLP) and audit logging solutions to monitor and track data exchanges in real time across the AI system lifecycle. Utilize input sanitization and output filtering techniques to automatically detect and prevent the submission of sensitive data within prompts, and to redact or block model responses that inadvertently contain confidential or personally identifiable information.
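As a minimal illustration of the input-sanitization step described above, the sketch below redacts a few common sensitive-data patterns from a prompt before it would be sent to a generative AI service. The pattern set and the `redact_prompt` function are hypothetical examples, not part of any specific DLP product; production systems typically combine many more detectors (named-entity recognition, checksum validation, context rules) with output filtering and audit logging.

```python
import re

# Illustrative regex patterns for common sensitive-data types.
# A real DLP deployment would use far more robust detectors.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact_prompt(prompt: str) -> str:
    """Replace matches of each sensitive-data pattern with a labeled
    placeholder before the prompt reaches the AI service."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[REDACTED {label}]", prompt)
    return prompt

print(redact_prompt("Contact jane.doe@example.com, SSN 123-45-6789."))
# → Contact [REDACTED EMAIL], SSN [REDACTED SSN].
```

The same idea applies in reverse as output filtering: model responses can be passed through the same detectors so that any inadvertently reproduced personal information is redacted before display.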