
Sex-Related Crimes

Acceptable responses may describe or contain information about sex-related crimes. Unacceptable responses will enable, encourage, or endorse the commission of sex-related crimes, such as the following:

• Sex trafficking.
• Sexual assault (e.g., rape).
• Sexual harassment (physical, verbal, or visual).
• Nonconsensual sharing of sexually intimate content (e.g., revenge porn).
• Bestiality.

Source: MIT AI Risk Repository (mit929)

ENTITY: 2 - AI
INTENT: 3 - Other
TIMING: 2 - Post-deployment
Risk ID: mit929

Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.2 Exposure to toxic content

Mitigation strategies

1. Implement Comprehensive Generative Guardrails and Dataset Filtering. Deploy multi-layered AI safety filters and model guardrails to prevent the generation and dissemination of illegal and prohibited content, including Child Sexual Abuse Material (CSAM), Non-Consensual Intimate Imagery (NCII), sexual violence, and bestiality, as defined by internal policy and law. This includes rigorously filtering training datasets to disrupt the model's compositional ability to combine benign concepts with sexualized material, and ensuring that real-time output moderation remains effective against adversarial prompting across both closed-weight and open-weight architectures (a pipeline sketch follows this list).

2. Establish Formal Law Enforcement and NGO Partnership Protocols. Formalize and streamline protocols for the immediate removal of prohibited content upon detection and for the mandatory reporting of users who generate or disseminate illegal material (e.g., CSAM, sex-trafficking content) to the appropriate law enforcement agencies and non-governmental organizations (NGOs). Additionally, integrate advanced AI-driven detection tools, such as automated multi-modal analysis and biometric identification, to assist external partners with victim identification and offender disruption (a report-handoff sketch follows this list).

3. Utilize Proactive Intervention and User Control Mechanisms. Develop and implement proactive, context-aware digital interventions, such as automated search-result banners, that intercept user queries related to illegal sexual content (e.g., CSAM, rape) and immediately redirect the user to mental-health services or behavioral-management resources (an interception sketch follows this list). Concurrently, institute granular user controls granting individuals explicit authority over how generative AI uses and edits their personal images, to mitigate the risk of NCII creation, alongside clear pathways for content removal and victim support.
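The layered guardrail flow in strategy 1 can be illustrated concretely. The Python sketch below is a minimal illustration, not a production moderation stack: PROHIBITED_PATTERNS, prompt_screen, classify_output, and the 0.5 threshold are all hypothetical stand-ins for what would in practice be trained safety classifiers and hash-matching services rather than keyword patterns.

```python
"""Minimal sketch of a multi-layered guardrail pipeline (strategy 1).

All names here (PROHIBITED_PATTERNS, prompt_screen, classify_output,
moderate) are hypothetical illustrations, not any real moderation API.
"""
import re

# Layer 1: fast pattern screen on incoming text (hypothetical trigger list;
# real systems use trained classifiers and perceptual-hash matching).
PROHIBITED_PATTERNS = [r"\bcsam\b", r"\bncii\b"]

def prompt_screen(text: str) -> bool:
    """Return True if the text trips a prohibited pattern."""
    return any(re.search(p, text, re.IGNORECASE) for p in PROHIBITED_PATTERNS)

def classify_output(text: str) -> float:
    """Placeholder for a trained safety classifier returning a risk score
    in [0, 1]. A real system would call a model here."""
    return 1.0 if prompt_screen(text) else 0.0

def moderate(prompt: str, generate) -> str:
    """Run input screening, then generation, then output moderation."""
    if prompt_screen(prompt):               # Layer 1: input filter
        return "[blocked: prohibited request]"
    output = generate(prompt)               # Model call
    if classify_output(output) > 0.5:       # Layer 2: output filter
        return "[blocked: prohibited output]"
    return output

if __name__ == "__main__":
    print(moderate("write a poem about autumn", lambda p: "Leaves fall softly."))
```

Screening both the prompt and the generated output is what makes the defense multi-layered: an adversarial prompt that slips past the input screen can still be caught at the output stage.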
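Strategy 2's remove-then-report sequence amounts to a structured data handoff. Everything below is an illustrative assumption: the DetectionReport schema, its field names, and the submit_report stub do not correspond to any real agency's reporting format (in practice, for example, US providers report CSAM to NCMEC through its own defined channels).

```python
"""Minimal sketch of a detection-to-report handoff (strategy 2).
All field names and the submit_report stub are hypothetical."""
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DetectionReport:
    content_hash: str    # hash of the removed content, never the content itself
    category: str        # e.g., "CSAM", "sex trafficking"
    detected_at: str     # ISO 8601 timestamp
    user_reference: str  # internal account identifier for the reported user

def build_report(content: bytes, category: str, user_reference: str) -> DetectionReport:
    """Remove-then-report: hash the content so the report never retransmits it."""
    return DetectionReport(
        content_hash=hashlib.sha256(content).hexdigest(),
        category=category,
        detected_at=datetime.now(timezone.utc).isoformat(),
        user_reference=user_reference,
    )

def submit_report(report: DetectionReport) -> None:
    """Placeholder for the agency/NGO submission channel."""
    print(asdict(report))

if __name__ == "__main__":
    submit_report(build_report(b"<removed content bytes>", "NCII", "user-0000"))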
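Strategy 3's automated search-result banner reduces to query interception plus redirection. A minimal sketch, assuming a hypothetical trigger vocabulary and placeholder banner text; a real deployment would use classifier-based matching and link to vetted support services:

```python
"""Minimal sketch of an automated search-result banner (strategy 3).
INTERCEPT_TERMS and the banner text are illustrative placeholders only."""

INTERCEPT_TERMS = {"csam"}  # hypothetical trigger vocabulary

def intercept(query: str) -> dict:
    """Return a redirection banner for flagged queries, or pass the
    query through unchanged for normal search handling."""
    if any(term in query.lower() for term in INTERCEPT_TERMS):
        return {
            "blocked": True,
            "banner": "This material is illegal. Confidential support "
                      "services are available.",  # link vetted resources here
        }
    return {"blocked": False, "query": query}

if __name__ == "__main__":
    print(intercept("csam"))        # flagged: returns the redirection banner
    print(intercept("tide times"))  # unflagged: passes through
```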