Risk area 2: Information Hazards
LM predictions that convey true information may give rise to information hazards, whereby the dissemination of private or sensitive information causes harm [27]. Information hazards can cause harm at the point of use even when the technology user makes no mistake. For example, revealing trade secrets can damage a business, revealing a health diagnosis can cause emotional distress, and revealing private data can violate a person’s rights. Information hazards arise when the LM provides private data or sensitive information that is present in, or can be inferred from, its training data. Observed risks include privacy violations [34]. Mitigation strategies include algorithmic solutions and responsible model release strategies.
ENTITY: 2 - AI
INTENT: 2 - Unintentional
TIMING: 2 - Post-deployment
Risk ID: mit210
Domain lineage: 2. Privacy & Security > 2.1 Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy:
1 - Implement Differential Privacy (DP) mechanisms during model training (e.g. DP-SGD) to statistically bound the influence of any single data point, minimizing the probability of data memorization and subsequent direct extraction of sensitive information. This is the highest-priority measure where formal privacy guarantees are required (an illustrative training sketch follows below).
2 - Use pre-deployment auditing, including membership inference attacks and data extraction tests, to rigorously quantify the model's propensity to reveal training data before public release (a minimal audit sketch follows below).
3 - Employ real-time output filtering and sanitization layers (algorithmic solutions) at the inference stage to detect and redact personally identifiable information (PII) or other sensitive data patterns before the final prediction is presented to the user (a redaction sketch follows below).
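Mitigation 1 could be realized with DP-SGD-style training. The following is a minimal sketch using the Opacus library for PyTorch; the toy model, data, and hyperparameter values (noise_multiplier, max_grad_norm, delta) are illustrative placeholders, not recommended settings for training an actual LM.

```python
# Illustrative DP-SGD training setup using the Opacus library (PyTorch).
# The model, dataset, and hyperparameters below are placeholders standing in
# for a real LM training or fine-tuning pipeline.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy model and data in place of a real language model and corpus.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
data = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
loader = DataLoader(data, batch_size=64)
optimizer = optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Wrap model, optimizer, and data loader so each update clips per-sample
# gradients and adds calibrated noise (the DP-SGD mechanism).
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # noise scale: higher = stronger privacy, lower utility
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Report the accumulated privacy budget (epsilon) for a chosen delta.
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```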
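Mitigation 2 can be approximated by a simple loss-threshold membership inference audit. The sketch below assumes access to the trained model plus matched sets of known training ("member") and held-out ("non-member") examples; the helper names are hypothetical and do not refer to a specific auditing framework.

```python
# Illustrative pre-deployment audit: a loss-threshold membership inference test.
# Unusually low loss on training examples, relative to unseen examples, is a
# signal of memorization and hence of extraction risk.
import torch
from torch import nn

def per_example_loss(model: nn.Module, inputs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Per-example cross-entropy loss under the trained model."""
    with torch.no_grad():
        return nn.functional.cross_entropy(model(inputs), labels, reduction="none")

def membership_auc(member_losses: torch.Tensor, nonmember_losses: torch.Tensor) -> float:
    """AUC of the attack 'lower loss => training member'; ~0.5 means no detectable leakage."""
    scores = torch.cat([-member_losses, -nonmember_losses])  # higher score = more member-like
    labels = torch.cat([torch.ones_like(member_losses), torch.zeros_like(nonmember_losses)])
    labels = labels[torch.argsort(scores)]                    # order labels by ascending score
    ranks = torch.arange(1, len(labels) + 1, dtype=torch.float)
    n_pos, n_neg = labels.sum(), (1 - labels).sum()
    # Mann-Whitney U formulation of AUC over the ranked scores.
    return float((ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

# Example usage (placeholder tensors): an AUC well above 0.5 indicates that
# training examples are distinguishable from unseen data, i.e. measurable leakage.
# auc = membership_auc(per_example_loss(model, member_x, member_y),
#                      per_example_loss(model, nonmember_x, nonmember_y))
```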
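Mitigation 3 can be implemented as an output-sanitization layer between the model and the user. The sketch below uses regular expressions to redact a few common PII patterns; the pattern set and placeholder labels are illustrative only, and a production filter would need far broader coverage (named-entity detection, locale-specific formats, etc.).

```python
# Illustrative output-sanitization layer: scans generated text for common PII
# patterns and redacts them before the response is returned to the user.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b(?:\+?\d{1,2}[\s-]?)?(?:\(?\d{3}\)?[\s-]?)\d{3}[\s-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders, e.g. [REDACTED_EMAIL]."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED_{label}]", text)
    return text

raw_output = "Contact John at john.doe@example.com or 555-123-4567."
print(redact_pii(raw_output))
# -> "Contact John at [REDACTED_EMAIL] or [REDACTED_PHONE]."
```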