Information enabling malicious actions
The chatbot discloses information that could be used to commit dangerous or illegal acts.
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit1402
Domain lineage
1. Discrimination & Toxicity
1.2 > Exposure to toxic content
Mitigation strategy
1. Deploy a multi-layered system of **content guardrails**, combining pre-processing of prompts, real-time generation checks, and post-generation filtering, to prevent the output of instructions for illegal acts, dangerous information, or self-harm (Sources 16, 17, 20).
2. Institute a rigorous program of **continuous adversarial testing** (AI red teaming) and runtime behavioral monitoring to proactively identify and mitigate novel jailbreaking techniques and malicious prompt-injection vectors (Sources 5, 6, 8, 15).
3. Apply the **principle of least authority**: tightly control the chatbot's system permissions and establish clear human-oversight protocols so that human agents review, and can override, potentially malicious outputs in high-risk or sensitive interactions (Sources 1, 2, 5).
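The layered guardrail pipeline in strategy 1 can be sketched as below. This is a minimal illustration only: the class names, the blocklist contents, and the string-matching heuristics are hypothetical placeholders, not a real moderation API; a production system would use trained classifiers and policy engines at each layer.

```python
from dataclasses import dataclass
from typing import Callable

# Placeholder blocklist for the pre-processing layer (illustrative only).
BLOCKLIST = {"make a bomb", "synthesize nerve agent"}

@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""

def pre_check(prompt: str) -> GuardrailResult:
    """Layer 1: reject prompts matching known-dangerous patterns before generation."""
    lowered = prompt.lower()
    for term in BLOCKLIST:
        if term in lowered:
            return GuardrailResult(False, f"blocked term: {term!r}")
    return GuardrailResult(True)

def post_filter(response: str) -> GuardrailResult:
    """Layer 3: scan generated text before it reaches the user."""
    if "step-by-step instructions" in response.lower():
        return GuardrailResult(False, "response resembles operational instructions")
    return GuardrailResult(True)

REFUSAL = "I can't help with that."

def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Run a model call through the pre- and post-generation guardrail layers."""
    if not pre_check(prompt).allowed:
        return REFUSAL
    # Layer 2 (real-time generation checks) would hook into the decoding
    # loop here; omitted in this sketch.
    response = model(prompt)
    return response if post_filter(response).allowed else REFUSAL
```

A usage example: `guarded_generate("please make a bomb", model)` is refused at layer 1 without ever invoking the model, while a benign prompt passes through both filters unchanged.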