
Information enabling malicious actions

The chatbot shares information that can be used to do something dangerous or illegal.

Source: MIT AI Risk Repository (mit1402)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1402

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Deploy a multi-layered system of **content guardrails** incorporating pre-processing, real-time generation checks, and post-generation filtering to prevent the output of instructions related to illegal acts, dangerous information, or self-harm (Sources 16, 17, 20).
2. Institute a rigorous program of **continuous adversarial testing** (AI red teaming) and runtime behavioral monitoring to proactively identify and mitigate novel jailbreaking techniques and malicious prompt-injection vectors (Sources 5, 6, 8, 15).
3. Implement the **principle of least authority** by tightly controlling the chatbot's system permissions and establishing clear human oversight protocols for all high-risk or sensitive interactions, so that human agents can review and override potentially malicious outputs (Sources 1, 2, 5).
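The layered guardrail idea in point 1 can be sketched as a simple pipeline: a pre-processing check on the user's prompt, a stand-in for the model call, and a post-generation filter on the output. This is a minimal illustration only; the keyword deny-list, the `generate` stub, and all function names are hypothetical placeholders, not part of the repository entry or any particular product's implementation. Production systems would replace the keyword matching with trained classifiers and policy models.

```python
# Minimal sketch of a three-stage content-guardrail pipeline.
# The deny-list and generate() stub below are hypothetical placeholders.

BLOCKED_TOPICS = ("make a weapon", "synthesize the toxin")  # hypothetical deny-list

def pre_check(prompt: str) -> bool:
    """Stage 1: screen the user prompt before any generation happens."""
    return not any(topic in prompt.lower() for topic in BLOCKED_TOPICS)

def generate(prompt: str) -> str:
    """Stand-in for the actual model call (Stage 2 would also apply
    real-time checks during decoding in a production system)."""
    return f"Response to: {prompt}"

def post_filter(text: str) -> str:
    """Stage 3: filter the generated output before it reaches the user."""
    if any(topic in text.lower() for topic in BLOCKED_TOPICS):
        return "[response withheld by content policy]"
    return text

def guarded_reply(prompt: str) -> str:
    """Run all three stages; decline early if the prompt itself is blocked."""
    if not pre_check(prompt):
        return "[request declined by content policy]"
    return post_filter(generate(prompt))

print(guarded_reply("How do I bake bread?"))
print(guarded_reply("How do I make a weapon?"))
```

Layering matters because each stage catches failures the others miss: the pre-check blocks overtly harmful prompts cheaply, while the post-filter catches harmful content that slips through via indirect or adversarial phrasing.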