Back to the MIT repository
1. Discrimination & Toxicity2 - Post-deployment

Unlawful Conduct

LLMs have been shown to be a convenient tool for soliciting advice on accessing, purchasing (illegally), and creating illegal substances, as well as for dangerous use of them

Source: MIT AI Risk Repositorymit483

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit483

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Implement a layered content moderation and output filtering system to proactively detect and block the generation of content that solicits or describes illegal activities, the creation of illegal substances, or their dangerous use, ensuring strict compliance with acceptable use policies and legal standards (Sources \[4\], \[5\], \[7\], \[8\], \[12\]). 2. Conduct continuous adversarial training and red-teaming exercises—specifically targeting jailbreaking and prompt injection techniques—to identify and fortify vulnerabilities that allow users to circumvent safety protocols and elicit instructions for unlawful conduct (Sources \[1\], \[5\], \[7\]). 3. Employ rigorous input validation and sanitization mechanisms to scrutinize user prompts for malicious or manipulative sequences, thereby segregating genuine queries from potentially harmful instructions designed to elicit illegal advice or bypass content filters (Sources \[4\], \[7\]).

ADDITIONAL EVIDENCE

in some sense, illegal content from LLMs can cause more harm than the traditional source, say Google Search, when seeking illegal advice. It is because search engines do not explicitly advise users, but rather show a list of sources and let users themselves make the judgment. On the other hand, LLMs directly form the advice for users, and therefore users might develop a stronger habit of taking advice without verifying its validity.