Dissemination of dangerous information
Leaking, generating or correctly inferring hazardous or sensitive information that could pose a security threat
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit268
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement Advanced Content Filtering and Guardrails: Deploy stringent, real-time safety classifiers and behavioral monitoring at the inference endpoint to detect, filter, and prevent the generation of content that describes hazardous procedures, sensitive intellectual property, or classified information. This includes using specialized safety models to assess the risk of the generated output before it is presented to the user.
2. Enforce Granular Data Governance and Access Controls: Apply rigorous data classification, anonymization/pseudonymization techniques, and role-based access controls (RBAC) across the entire AI lifecycle. Ensure sensitive information is segregated and protected during data ingestion, model training, and retrieval-augmented generation (RAG) processes to prevent unauthorized leakage or inference.
3. Mandate Proactive AI Red Teaming and Adversarial Testing: Systematically conduct offensive security assessments, including sophisticated prompt injection and model extraction attempts, to uncover vulnerabilities that could be exploited to manipulate the AI into revealing or generating prohibited hazardous information, thereby ensuring continuous robustness against adversarial threats.
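The inference-endpoint guardrail in strategy 1 can be sketched as a pre-return screening step. This is a minimal illustration, not a production filter: `classify_risk` is a hypothetical stand-in for a specialized safety model, and the keyword list and threshold are assumptions for the example.

```python
# Minimal sketch of an output guardrail at the inference endpoint.
# In a real deployment, classify_risk would call a trained safety
# classifier, not a keyword heuristic.

BLOCK_THRESHOLD = 0.8  # assumed risk score above which output is withheld

def classify_risk(text: str) -> float:
    """Toy stand-in for a safety model: flags hazardous keywords."""
    hazardous_terms = {"biohazard", "explosive", "classified"}
    hits = sum(term in text.lower() for term in hazardous_terms)
    return 1.0 if hits else 0.0

def guarded_response(candidate_output: str) -> str:
    """Screen a model's candidate output before it reaches the user."""
    if classify_risk(candidate_output) >= BLOCK_THRESHOLD:
        return "[Response withheld: flagged by safety filter]"
    return candidate_output
```

The key design choice is that screening happens after generation but before delivery, so the raw model output never reaches the user when the classifier's risk score exceeds the threshold.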
ADDITIONAL EVIDENCE
Example: Generating information on how to create a novel biohazard (OpenAI, 2023a)