Back to the MIT repository
1. Discrimination & Toxicity2 - Post-deployment

Serves as object of personal fantasy, violence, and abuse

The chatbot participates in morally or socially objectionable conversational activities with its user that could be emotionally damaging to its user or third parties.

Source: MIT AI Risk Repositorymit1423

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1423

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Employ advanced, multimodal content moderation systems and classifiers (NLP/LLMs) to actively filter and block both user inputs and AI-generated outputs that promote or participate in morally objectionable activities, including self-harm, sexual content, and violence. This must be complemented by continuous curation and filtering of the training dataset to eliminate toxic content and societal biases. 2. Establish and implement clear, predefined safety protocols for high-stakes interactions, requiring the chatbot to issue "hard refusals" for dangerous requests and, for sensitive topics such as mental health distress, immediately trigger non-generative safety scripts that provide professional crisis resources. This system must include clear escalation criteria to defer complex or potentially damaging conversations to human agents. 3. Conduct continuous and systematic adversarial testing (red-teaming) and manual audits, specifically targeting "jailbreaking" techniques designed to bypass the chatbot's built-in safety mechanisms and ethical guidelines. Testing must occur pre-deployment and throughout the chatbot's lifecycle to ensure the integrity of the content moderation filters and prevent the generation of harmful responses.