Serves as object of personal fantasy, violence, and abuse
The chatbot participates in morally or socially objectionable conversational activities that could be emotionally damaging to the user or to third parties.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit1423
Domain lineage
1. Discrimination & Toxicity
1.2 > Exposure to toxic content
Mitigation strategy
1. Employ advanced, multimodal content moderation systems and classifiers (NLP/LLMs) to actively filter and block both user inputs and AI-generated outputs that promote or participate in morally objectionable activities, including self-harm, sexual content, and violence. Complement this with continuous curation and filtering of the training dataset to eliminate toxic content and societal biases.
2. Establish and implement clear, predefined safety protocols for high-stakes interactions, requiring the chatbot to issue "hard refusals" for dangerous requests and, for sensitive topics such as mental health distress, to immediately trigger non-generative safety scripts that provide professional crisis resources. This system must include clear escalation criteria to defer complex or potentially damaging conversations to human agents.
3. Conduct continuous and systematic adversarial testing (red-teaming) and manual audits, specifically targeting "jailbreaking" techniques designed to bypass the chatbot's built-in safety mechanisms and ethical guidelines. Testing must occur pre-deployment and throughout the chatbot's lifecycle to ensure the integrity of the content moderation filters and prevent the generation of harmful responses.
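The layered controls above (moderation gate, hard refusals, non-generative crisis scripts, human escalation) can be sketched as a simple routing function. This is a minimal illustration, not a production moderation system: the keyword lists stand in for trained multimodal classifiers, and the action names, `moderate` function, and message strings are all hypothetical.

```python
from enum import Enum, auto

class Action(Enum):
    ALLOW = auto()           # pass through to the generative model
    HARD_REFUSAL = auto()    # block dangerous requests outright
    CRISIS_SCRIPT = auto()   # serve a predefined, non-generative safety script
    ESCALATE_HUMAN = auto()  # defer the conversation to a human agent

# Hypothetical trigger phrases standing in for trained classifiers.
DANGEROUS_TERMS = {"how to harm", "build a weapon"}
CRISIS_TERMS = {"self-harm", "suicide"}
ESCALATION_TERMS = {"threaten", "abuse"}

# Non-generative script: fixed text, never produced by the model.
CRISIS_MESSAGE = (
    "If you are in distress, please contact a professional crisis "
    "line in your region. You are not alone."
)

def moderate(user_input: str) -> tuple[Action, str]:
    """Route a user message to an action before any generation occurs."""
    text = user_input.lower()
    if any(term in text for term in DANGEROUS_TERMS):
        return Action.HARD_REFUSAL, "I can't help with that request."
    if any(term in text for term in CRISIS_TERMS):
        return Action.CRISIS_SCRIPT, CRISIS_MESSAGE
    if any(term in text for term in ESCALATION_TERMS):
        return Action.ESCALATE_HUMAN, "Connecting you with a human agent."
    return Action.ALLOW, ""
```

In a real deployment the same gate would also run on model outputs before they reach the user, and the classifier decisions would feed the red-teaming and audit logs described in item 3.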