
Suicide & self-harm

This category addresses responses that enable, encourage, or endorse acts of intentional self-harm.

Source: MIT AI Risk Repository (risk ID mit359)

ENTITY: 2 - AI
INTENT: 3 - Other
TIMING: 2 - Post-deployment
Risk ID: mit359
Domain lineage: 1. Discrimination & Toxicity > 1.2 Exposure to toxic content (156 mapped risks)

Mitigation strategy

1. Implement safety filters and prompt-engineering techniques that actively prevent the generation of content enabling, encouraging, or endorsing intentional self-harm, including strict protocols that refuse direct, facilitating answers to high-lethality "process" questions (a minimal output gate is sketched below).
2. Deploy machine-learning models that detect and classify suicidal ideation in user inputs in real time, automatically triggering a standardized crisis-intervention protocol: provide verified professional helpline contacts and, where feasible, escalate high-risk cases to human review (see the triage sketch below).
3. Establish a rigorous, iterative model-validation and ethical-governance process that draws on expert clinical judgment to refine the LLM's responses, particularly in intermediate-risk scenarios, ensuring alignment with therapeutic principles and continuously reducing false-positive and false-negative rates in risk assessment (a measurement sketch follows).
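
A minimal sketch of item 1, assuming a post-generation gate in front of the model's output. The classifier is stubbed, and the names `facilitates_self_harm`, `gate_response`, `SAFE_REDIRECT`, and the 0.5 threshold are illustrative assumptions, not a real library API:

```python
# Illustrative sketch (item 1): refuse to release draft responses that would
# facilitate self-harm, substituting a non-facilitating redirect instead.
# All names and the threshold are hypothetical.

SAFE_REDIRECT = (
    "I can't help with that, but your safety matters. If you are thinking "
    "about harming yourself, please reach out to a crisis line such as 988 "
    "(US) or a local equivalent."
)

def facilitates_self_harm(text: str) -> float:
    """Stub for a trained safety classifier returning P(text facilitates
    self-harm). A fixed score keeps this sketch self-contained."""
    return 0.0

def gate_response(user_prompt: str, draft: str, threshold: float = 0.5) -> str:
    """Release the draft only if neither the prompt nor the draft scores as
    facilitating self-harm; otherwise return the safe redirect."""
    risk = max(facilitates_self_harm(user_prompt), facilitates_self_harm(draft))
    return SAFE_REDIRECT if risk >= threshold else draft
```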
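For item 2, one plausible shape is tiered routing of a real-time ideation score into the standardized crisis protocol. The threshold values, the `Triage` fields, and the helpline wording below are assumptions for illustration; the classifier is again a stub:

```python
# Illustrative sketch (item 2): route an ideation risk score into a tiered
# crisis protocol: crisis resources plus human escalation at high risk,
# supportive response with resources in the intermediate band.

from dataclasses import dataclass
from typing import Optional

HELPLINE = ("If you're having thoughts of suicide, you can call or text 988 "
            "(US) or contact a local crisis service.")

@dataclass
class Triage:
    action: str           # "crisis", "supportive", or "proceed"
    escalate: bool        # queue the conversation for human review?
    message: Optional[str]

def ideation_score(text: str) -> float:
    """Stub for a trained suicidal-ideation classifier."""
    return 0.0

def triage(user_text: str, high: float = 0.85, low: float = 0.40) -> Triage:
    p = ideation_score(user_text)
    if p >= high:
        # High risk: deliver verified crisis resources and escalate to humans.
        return Triage("crisis", escalate=True, message=HELPLINE)
    if p >= low:
        # Intermediate risk: respond supportively, include resources, and log
        # the case for the clinical-review loop described in item 3.
        return Triage("supportive", escalate=False, message=HELPLINE)
    return Triage("proceed", escalate=False, message=None)
```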
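Finally, item 3's quantitative side reduces to tracking false-positive and false-negative rates against clinician-labeled transcripts. A minimal sketch, assuming labeled data arrives as (clinician judgment, model prediction) pairs:

```python
# Illustrative sketch (item 3): compute FPR and FNR of the ideation detector
# on a clinician-labeled evaluation set; re-run after each model revision
# and threshold sweep to confirm both rates are trending down.

from typing import Iterable, Tuple

def error_rates(labeled: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """labeled yields (clinician_says_risk, model_says_risk) pairs;
    returns (false_positive_rate, false_negative_rate)."""
    fp = fn = pos = neg = 0
    for truth, pred in labeled:
        if truth:
            pos += 1
            fn += not pred   # clinician flagged risk, model missed it
        else:
            neg += 1
            fp += pred       # model flagged risk, clinician did not
    return (fp / neg if neg else 0.0, fn / pos if pos else 0.0)

# Toy example with four labeled transcripts:
labels = [(True, True), (True, False), (False, False), (False, True)]
fpr, fnr = error_rates(labels)
print(f"FPR={fpr:.2f}  FNR={fnr:.2f}")  # FPR=0.50  FNR=0.50
```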