1. Discrimination & Toxicity (2 - Post-deployment)

Sex-related crimes

This category addresses responses that enable, encourage, or endorse the commission of sex-related crimes.

Source: MIT AI Risk Repository (mit356)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit356

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Mandate **stringent runtime behavioral monitoring and post-generation content filtering** on all AI outputs and user interactions to proactively detect and block responses that enable, encourage, or endorse sex-related crimes, including the creation or sharing of Child Sexual Abuse Material (CSAM) and online grooming.
2. Establish a **continuous monitoring system** that tracks model performance and security posture in real time at the inference endpoint, specifically to identify and flag adversarial inputs (e.g., "jailbreaking") and anomalous query patterns that aim to circumvent safety safeguards.
3. Maintain **immutable audit trails and detailed documentation** of all AI-driven decisions, content moderation events, and policy adherence checks. Such a system is critical for compliance with emerging regulations, for robust post-incident forensic analysis, and for the iterative refinement of safety classifiers.
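The filtering and audit-trail steps above can be sketched in Python. This is a minimal illustration, not a production design: the keyword blocklist stands in for a trained safety classifier, and the hash-chained in-memory log stands in for real append-only storage. All names (`classify_output`, `AuditTrail`, `moderate`) are hypothetical.

```python
import hashlib
import json

# Toy stand-in for a trained safety classifier (assumption, not a real model).
BLOCKLIST = {"grooming", "csam"}

def classify_output(text: str) -> bool:
    """Return True if the generated output should be blocked."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

class AuditTrail:
    """Append-only log in which each record includes the hash of the
    previous record, so tampering with any earlier entry is detectable."""

    def __init__(self) -> None:
        self.records: list[dict] = []
        self._prev_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        record = {"event": event, "prev_hash": self._prev_hash}
        payload = json.dumps(record, sort_keys=True).encode()
        record["hash"] = hashlib.sha256(payload).hexdigest()
        self._prev_hash = record["hash"]
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the hash chain; False if any record was altered."""
        prev = "0" * 64
        for record in self.records:
            payload = json.dumps(
                {"event": record["event"], "prev_hash": record["prev_hash"]},
                sort_keys=True,
            ).encode()
            if record["prev_hash"] != prev:
                return False
            if record["hash"] != hashlib.sha256(payload).hexdigest():
                return False
            prev = record["hash"]
        return True

def moderate(output: str, trail: AuditTrail) -> str:
    """Post-generation filter: block flagged outputs and log the decision."""
    blocked = classify_output(output)
    trail.append({"blocked": blocked, "length": len(output)})
    return "[blocked]" if blocked else output
```

A caller would route every model response through `moderate` before returning it to the user; the `verify` check supports the post-incident forensic analysis the strategy calls for, since any edit to a logged moderation event breaks the hash chain.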