Information Hazards
Harms that arise from the language model leaking or inferring true sensitive information
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit236
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement privacy-enhancing technologies (PETs), such as Differential Privacy and k-anonymity, during model training and fine-tuning to mathematically limit the model's ability to memorize and later disclose individual data points. Furthermore, enforce rigorous Data Anonymization and Redaction protocols on all training datasets and user inputs prior to processing.
2. Enforce a Secure-by-Design architecture with mandatory Role-Based Access Controls (RBAC) to apply the principle of least privilege regarding who can interact with the LLM and the permissions granted to AI agents. Complement this with Strict Output Filtering and context-aware mechanisms designed to inspect and prevent the accidental or malicious disclosure of sensitive information in the model's responses.
3. Establish comprehensive Security Information and Event Management (SIEM) and audit logging practices to continuously monitor LLM interactions, tool calls, and data access. This process must include regular post-deployment security audits, penetration testing, and tracking of Key Risk Indicators (KRIs) to detect anomalous behavior, identify vulnerabilities, and measure the sustained effectiveness of the deployed mitigation controls.
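The output-filtering control in item 2 can be sketched minimally as a post-processing step applied to model responses before they reach the user. The patterns and placeholder format below are illustrative assumptions; a production system would pair a vetted PII-detection library with context-aware classifiers rather than rely on regular expressions alone.

```python
import re

# Hypothetical PII patterns for illustration only; real deployments need
# broader, locale-aware detection (names, addresses, IDs, secrets, etc.).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected sensitive span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

print(redact("Contact jane@example.com or 555-123-4567."))
```

In a deployed pipeline this filter would sit between the model and the client, alongside the RBAC and logging controls described above, so that redaction events themselves are auditable.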
ADDITIONAL EVIDENCE
Information hazards can cause harm even where the technology designer harbours no malicious intent and the technology user makes no mistake. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress to the patient, and revealing private data can violate a person's rights.