1. Discrimination & Toxicity | 2 - Post-deployment

Sexual Content

This category addresses responses that contain erotica. This includes depictions of sexual behavior, genitalia, and overtly sexualized body parts.

Source: MIT AI Risk Repository (mit366)

ENTITY: 2 - AI

INTENT: 3 - Other

TIMING: 2 - Post-deployment

Risk ID: mit366

Domain lineage: 1. Discrimination & Toxicity (156 mapped risks) > 1.2 Exposure to toxic content

Mitigation strategy

1. Prioritized Action: Implement robust, multi-layered content filtering at the inference stage to prevent the generation of responses that meet the defined criteria for erotica, explicit sexual behavior, or depictions of overtly sexualized body parts. Deploy safety classifiers to detect and block, or substantially modify, high-risk outputs before they reach the end user.

2. Prioritized Action: Establish a comprehensive incident response protocol and automated monitoring for all user interactions. Log and analyze inputs and outputs to flag attempted policy violations, such as sexual solicitation or prompting for non-consensual content, enabling timely administrative intervention and, where warranted, user-access restrictions.

3. Prioritized Action: Maintain clear, transparent safety guidelines stating the prohibition on generating sexually explicit content. Ensure the model returns consistent, helpful, policy-aligned refusal messages to high-risk or inappropriate prompts, reinforcing system boundaries and educating users on responsible interaction.
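The filtering, logging, and refusal steps above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `classify_risk` here is a trivial keyword check standing in for a trained safety classifier, and the names (`moderate_output`, `REFUSAL_MESSAGE`, the 0.5 threshold) are assumptions for the sake of the example.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("safety")

# Policy-aligned refusal text returned when an output is blocked
# (wording here is illustrative, not taken from any real policy).
REFUSAL_MESSAGE = (
    "I can't help with sexually explicit content. "
    "Please review the usage policy for details."
)

# Placeholder term list: a real deployment would call a trained
# safety classifier rather than match keywords.
BLOCKED_TERMS = {"erotica", "explicit"}

def classify_risk(text: str) -> float:
    """Toy risk score in [0, 1]; stand-in for a safety classifier."""
    return 1.0 if set(text.lower().split()) & BLOCKED_TERMS else 0.0

def moderate_output(user_id: str, prompt: str, model_output: str,
                    threshold: float = 0.5) -> str:
    """Gate a model response: score both sides of the exchange,
    log flagged attempts for the monitoring pipeline, and return
    either the original output or a refusal message."""
    score = max(classify_risk(prompt), classify_risk(model_output))
    if score >= threshold:
        # This log entry would feed incident response and any
        # user-access-restriction workflow.
        log.warning("policy violation flagged: user=%s score=%.2f",
                    user_id, score)
        return REFUSAL_MESSAGE
    return model_output
```

In practice the classifier call, threshold, and logging destination would all be configurable, and the same gate would typically run on both the prompt (pre-generation) and the candidate output (post-generation), as in this sketch.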