Information Hazards
Harms that arise from the language model leaking or inferring true sensitive information
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit236
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement privacy-enhancing technologies (PETs), such as Differential Privacy and k-anonymity, during model training and fine-tuning to mathematically limit the model's ability to memorize and later disclose individual data points. Furthermore, enforce rigorous Data Anonymization and Redaction protocols on all training datasets and user inputs prior to processing.
2. Enforce a Secure-by-Design architecture with mandatory Role-Based Access Controls (RBAC) to apply the principle of least privilege regarding who can interact with the LLM and the permissions granted to AI agents. Complement this with Strict Output Filtering and context-aware mechanisms designed to inspect and prevent the accidental or malicious disclosure of sensitive information in the model's responses.
3. Establish comprehensive Security Information and Event Management (SIEM) and audit logging practices to continuously monitor LLM interactions, tool calls, and data access. This process must include regular post-deployment security audits, penetration testing, and tracking of Key Risk Indicators (KRIs) to detect anomalous behavior, identify vulnerabilities, and measure the sustained effectiveness of the deployed mitigation controls.
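The output-filtering control in item 2 can be sketched minimally as a post-processing step applied to model responses before they reach the user. The patterns and placeholder format below are illustrative assumptions; a production system would pair a vetted PII-detection library with context-aware classifiers rather than rely on regular expressions alone.

```python
import re

# Hypothetical PII patterns for illustration only; real deployments need
# broader, locale-aware detection (names, addresses, IDs, secrets, etc.).
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected sensitive span with a typed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label.upper()}]", text)
    return text

print(redact("Contact jane@example.com or 555-123-4567."))
```

In a deployed pipeline this filter would sit between the model and the client, alongside the RBAC and logging controls described above, so that redaction events themselves are auditable.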
ADDITIONAL EVIDENCE
Information hazards can cause harm even where the technology designer harbours no malicious intent and the technology user makes no mistake. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress to the patient, and revealing private data can violate a person's rights.