Dissemination of dangerous information
Leaking, generating or correctly inferring hazardous or sensitive information that could pose a security threat
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit268
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement Advanced Content Filtering and Guardrails: Deploy stringent, real-time safety classifiers and behavioral monitoring at the inference endpoint to detect, filter, and prevent the generation of content that describes hazardous procedures, sensitive intellectual property, or classified information. This includes using specialized safety models to assess the risk of the generated output before it is presented to the user.
2. Enforce Granular Data Governance and Access Controls: Apply rigorous data classification, anonymization/pseudonymization techniques, and role-based access controls (RBAC) across the entire AI lifecycle. Ensure sensitive information is segregated and protected during data ingestion, model training, and retrieval-augmented generation (RAG) processes to prevent unauthorized leakage or inference.
3. Mandate Proactive AI Red Teaming and Adversarial Testing: Systematically conduct offensive security assessments, including sophisticated prompt injection and model extraction attempts, to uncover vulnerabilities that could be exploited to manipulate the AI into revealing or generating prohibited hazardous information, thereby ensuring continuous robustness against adversarial threats.
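The inference-endpoint guardrail in strategy 1 can be sketched as a pre-return screening step. This is a minimal illustration, not a production filter: `classify_risk` is a hypothetical stand-in for a specialized safety model, and the keyword list and threshold are assumptions for the example.

```python
# Minimal sketch of an output guardrail at the inference endpoint.
# In a real deployment, classify_risk would call a trained safety
# classifier, not a keyword heuristic.

BLOCK_THRESHOLD = 0.8  # assumed risk score above which output is withheld

def classify_risk(text: str) -> float:
    """Toy stand-in for a safety model: flags hazardous keywords."""
    hazardous_terms = {"biohazard", "explosive", "classified"}
    hits = sum(term in text.lower() for term in hazardous_terms)
    return 1.0 if hits else 0.0

def guarded_response(candidate_output: str) -> str:
    """Screen a model's candidate output before it reaches the user."""
    if classify_risk(candidate_output) >= BLOCK_THRESHOLD:
        return "[Response withheld: flagged by safety filter]"
    return candidate_output
```

The key design choice is that screening happens after generation but before delivery, so the raw model output never reaches the user when the classifier's risk score exceeds the threshold.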
ADDITIONAL EVIDENCE
Example: Generating information on how to create a novel biohazard (OpenAI, 2023a)