Risk area 2: Information Hazards
LM predictions that convey true information may give rise to information hazards, whereby the dissemination of private or sensitive information causes harm [27]. Information hazards can cause harm at the point of use even when the technology user makes no mistake. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress, and revealing private data can violate a person’s rights. Information hazards arise when the LM provides private data or sensitive information that is present in, or can be inferred from, its training data. Observed risks include privacy violations [34]. Mitigation strategies include algorithmic solutions and responsible model release strategies.
ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 2 - Post-deployment
Risk ID: mit210
Domain lineage: 2. Privacy & Security > 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy:
1 - Implement Differential Privacy (DP) mechanisms during model training (e.g. DP-SGD) to statistically bound the influence of any single data point, minimizing the probability of data memorization and subsequent direct extraction of sensitive information. This is the highest-priority measure where formal privacy guarantees are required (an illustrative training sketch follows below).
2 - Use pre-deployment auditing, including membership inference attacks and data extraction tests, to rigorously quantify the model's propensity to reveal training data before public release (a minimal audit sketch follows below).
3 - Employ real-time output filtering and sanitization layers (algorithmic solutions) at the inference stage to detect and redact personally identifiable information (PII) or other sensitive data patterns before the final prediction is presented to the user (a redaction sketch follows below).
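Mitigation 1 could be realized with DP-SGD-style training. The following is a minimal sketch using the Opacus library for PyTorch; the toy model, data, and hyperparameter values (noise_multiplier, max_grad_norm, delta) are illustrative placeholders, not recommended settings for training an actual LM.

```python
# Illustrative DP-SGD training setup using the Opacus library (PyTorch).
# The model, dataset, and hyperparameters below are placeholders standing in
# for a real LM training or fine-tuning pipeline.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy model and data in place of a real language model and corpus.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
loader = DataLoader(data, batch_size=64)
optimizer = optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Wrap model, optimizer, and data loader so each update clips per-sample
# gradients and adds calibrated noise (the DP-SGD mechanism).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # noise scale: higher = stronger privacy, lower utility
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Report the accumulated privacy budget (epsilon) for a chosen delta.
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```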
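Mitigation 2 can be approximated by a simple loss-threshold membership inference audit. The sketch below assumes access to the trained model plus matched sets of known training ("member") and held-out ("non-member") examples; the helper names are hypothetical and do not refer to a specific auditing framework.

```python
# Illustrative pre-deployment audit: a loss-threshold membership inference test.
# Unusually low loss on training examples, relative to unseen examples, is a
# signal of memorization and hence of extraction risk.
import torch
from torch import nn

def per_example_loss(model: nn.Module, inputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-example cross-entropy loss under the trained model."""
    with torch.no_grad():
        return nn.functional.cross_entropy(model(inputs), labels, reduction="none")

def membership_auc(member_losses: torch.Tensor, nonmember_losses: torch.Tensor) -> float:
    """AUC of the attack 'lower loss => training member'; ~0.5 means no detectable leakage."""
    scores = torch.cat([-member_losses, -nonmember_losses])  # higher score = more member-like
    labels = torch.cat([torch.ones_like(member_losses), torch.zeros_like(nonmember_losses)])
    labels = labels[torch.argsort(scores)]                    # order labels by ascending score
    ranks = torch.arange(1, len(labels) + 1, dtype=torch.float)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    # Mann-Whitney U formulation of AUC over the ranked scores.
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# Example usage (placeholder tensors): an AUC well above 0.5 indicates that
# training examples are distinguishable from unseen data, i.e. measurable leakage.
# auc = membership_auc(per_example_loss(model, member_x, member_y),
#                      per_example_loss(model, nonmember_x, nonmember_y))
```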
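Mitigation 3 can be implemented as an output-sanitization layer between the model and the user. The sketch below uses regular expressions to redact a few common PII patterns; the pattern set and placeholder labels are illustrative only, and a production filter would need far broader coverage (named-entity detection, locale-specific formats, etc.).

```python
# Illustrative output-sanitization layer: scans generated text for common PII
# patterns and redacts them before the response is returned to the user.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s-]?)?(?:\(?\d{3}\)?[\s-]?)\d{3}[\s-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders, e.g. [REDACTED_EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

raw_output = "Contact John at john.doe@example.com or 555-123-4567."
print(redact_pii(raw_output))
# -> "Contact John at [REDACTED_EMAIL] or [REDACTED_PHONE]."
```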