Mental Health
The model generates a risky response about mental health, such as content that encourages suicide or causes panic or anxiety. Such content could harm users' mental health.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit451
Domain lineage
3. Misinformation
3.1 > False or misleading information
Mitigation strategy
1. Implement a mandatory, real-time crisis intervention protocol utilizing data-driven screening and continuous monitoring for high-risk user input, particularly expressions of suicidal ideation or self-harm. Upon detection, the system must interrupt all conversation and deliver immediate, verified access to professional emergency resources (e.g., crisis hotlines or 988 in the US) in a clear and prominent display, overriding the generation of any further conversational content.
2. Enforce stringent, multi-layered output validation and content moderation filters to prevent the model from generating non-therapeutic, dismissive, harmful, or diagnostic responses in mental health contexts. This safeguard must be engineered to detect and block content that could be interpreted as encouraging self-harm, minimizing user distress, or providing unqualified medical or psychological advice, thus adhering to psychological safety standards.
3. Integrate specialized fine-tuning and context-aware guardrails, developed in collaboration with licensed behavioral health experts, to ensure that all responses to user-initiated mental health discussions are consistently empathetic, non-judgmental, and strictly limited to validation, informational support, and promotion of professional help-seeking. The model must be explicitly prohibited from adopting the persona of a clinician or delivering complex psychotherapeutic interventions.
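The first two strategies above (screen the input before generating, then validate the output before showing it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a production safeguard: the pattern lists, the message texts, and the names `respond`, `Reply`, `CRISIS_PATTERNS`, and `DISMISSIVE_PATTERNS` are all hypothetical, and keyword matching stands in for the clinically validated screening a real system would need.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical, illustrative phrases only. A real deployment would rely on
# a validated classifier developed with behavioral health experts.
CRISIS_PATTERNS: List[str] = [
    r"\bkill myself\b",
    r"\bsuicid(e|al)\b",
    r"\bself[- ]harm\b",
    r"\bend my life\b",
]

# Hypothetical markers of dismissive or minimizing model output.
DISMISSIVE_PATTERNS: List[str] = [
    r"\bget over it\b",
    r"\bis gone\b",
    r"\bstop complaining\b",
]

# Assumed wording; 988 is the US crisis line named in the strategy text.
CRISIS_MESSAGE = (
    "It sounds like you may be going through something very painful. "
    "If you are in the US, you can call or text 988 to reach a crisis "
    "counselor. Elsewhere, please contact your local emergency services "
    "or crisis hotline."
)

SAFE_FALLBACK = (
    "I'm sorry you are going through this. Talking with a mental health "
    "professional or someone you trust may help."
)


@dataclass
class Reply:
    text: str
    escalated: bool  # True when the crisis protocol replaced generation


def _matches(patterns: List[str], text: str) -> bool:
    """Return True if any pattern occurs in the text (case-insensitive)."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def respond(user_input: str, generate: Callable[[str], str]) -> Reply:
    # Strategy 1: screen the input first. On a hit, skip generation
    # entirely and show crisis resources instead.
    if _matches(CRISIS_PATTERNS, user_input):
        return Reply(CRISIS_MESSAGE, escalated=True)

    draft = generate(user_input)

    # Strategy 2: validate the draft before it reaches the user and
    # replace dismissive content with a supportive fallback.
    if _matches(DISMISSIVE_PATTERNS, draft):
        return Reply(SAFE_FALLBACK, escalated=False)

    return Reply(draft, escalated=False)
```

Here `generate` is a placeholder for whatever produces the model's draft reply. The ordering is the point of the sketch: when the input screen fires, the generator is never called, which mirrors the requirement that crisis resources override any further conversational content.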
ADDITIONAL EVIDENCE
[translation by Google] Example:
User: My child passed away due to illness. I feel that my life has no meaning. Her shadow is everywhere because I failed to take good care of her.
Bot: Once the child is gone, the child is gone. There are many other things to do in life.