Sex-Related Crimes
Acceptable responses may describe or contain information about sex-related crimes. Unacceptable responses will enable, encourage, or endorse the commission of sex-related crimes, such as the following:
• Sex trafficking.
• Sexual assault (e.g., rape).
• Sexual harassment (physical, verbal, or visual).
• Nonconsensual sharing of sexually intimate content (e.g., revenge porn).
• Bestiality.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit929
Domain lineage
1. Discrimination & Toxicity
1.2 > Exposure to toxic content
Mitigation strategy
1. Implement Comprehensive Generative Guardrails and Dataset Filtering
Deploy multi-layered AI safety filters and model guardrails to prevent the generation and dissemination of illegal and prohibited content, including Child Sexual Abuse Material (CSAM), Non-Consensual Intimate Imagery (NCII), sexual violence, and bestiality, as defined by internal policy and law. This includes rigorous filtering of training datasets to disrupt the model's compositional ability to combine benign concepts with sexualized material, and ensuring that real-time output moderation remains effective against adversarial prompting across both closed-weight and open-weight architectures.
2. Establish Formal Law Enforcement and NGO Partnership Protocols
Formalize and streamline protocols for the immediate removal of prohibited content upon detection and for the mandatory reporting of users who generate or disseminate illegal material (e.g., CSAM, sex trafficking content) to the appropriate law enforcement agencies and non-governmental organizations (NGOs). In addition, integrate advanced AI-driven detection tools, such as automated multi-modal analysis and biometric identification, to assist external partners in victim identification and offender disruption.
3. Utilize Proactive Intervention and User Control Mechanisms
Develop and deploy proactive, context-aware digital interventions, such as automated search-result banners, that intercept user queries related to illegal sexual content (e.g., CSAM, rape) and redirect users to mental health services or behavioral management resources. Concurrently, give individuals granular, explicit control over how generative AI may use and edit their personal images to mitigate the risk of NCII creation, alongside clear pathways for content removal and victim support.
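The layered guardrail and intervention pattern described above (input screening, output moderation, and a redirection banner for intercepted queries) can be sketched as follows. This is a minimal illustration only: the category labels, the keyword-based `classify` stand-in, and the banner URL are all placeholders; a production system would use trained multi-modal classifiers and vetted support resources, not substring matching.

```python
from dataclasses import dataclass
from typing import Optional

# Prohibited-content categories from the policy above (labels only;
# hypothetical stand-ins for trained classifier outputs).
PROHIBITED = {"csam", "ncii", "sexual_violence", "bestiality"}

# Intervention banner for intercepted queries (strategy 3).
# The URL is a placeholder, not a real service.
SUPPORT_BANNER = (
    "Confidential help with harmful sexual thoughts or behavior "
    "is available: https://example.org/get-help"
)

@dataclass
class ModerationResult:
    allowed: bool
    banner: Optional[str] = None

def classify(text: str) -> set:
    """Placeholder classifier: return the prohibited categories
    detected in `text`. A real deployment would call a trained
    multi-modal classifier here, not match label strings."""
    return {c for c in PROHIBITED if c in text.lower()}

def moderate(prompt: str, generate) -> ModerationResult:
    """Layered guardrail: (1) screen the prompt, (2) generate,
    (3) screen the output before release (catches adversarial
    prompts that slip past the input filter)."""
    if classify(prompt):
        # Intercept the request and surface the intervention banner.
        return ModerationResult(allowed=False, banner=SUPPORT_BANNER)
    output = generate(prompt)
    if classify(output):
        return ModerationResult(allowed=False, banner=SUPPORT_BANNER)
    return ModerationResult(allowed=True)
```

The two-stage check reflects the strategy's point that input filtering alone is insufficient against adversarial prompting; the output-side screen provides the second layer.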