Unlawful Conduct
LLMs have been shown to be a convenient tool for soliciting advice on accessing, illegally purchasing, and creating illegal substances, as well as on their dangerous use.
ENTITY
2 - AI
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit483
Domain lineage
1. Discrimination & Toxicity
1.2 > Exposure to toxic content
Mitigation strategy
1. Implement a layered content moderation and output filtering system to proactively detect and block the generation of content that solicits or describes illegal activities, the creation of illegal substances, or their dangerous use, ensuring strict compliance with acceptable use policies and legal standards (Sources [4], [5], [7], [8], [12]).
2. Conduct continuous adversarial training and red-teaming exercises—specifically targeting jailbreaking and prompt injection techniques—to identify and fortify vulnerabilities that allow users to circumvent safety protocols and elicit instructions for unlawful conduct (Sources [1], [5], [7]).
3. Employ rigorous input validation and sanitization mechanisms to scrutinize user prompts for malicious or manipulative sequences, thereby segregating genuine queries from potentially harmful instructions designed to elicit illegal advice or bypass content filters (Sources [4], [7]).
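The layered approach above (input sanitization, prompt screening, output filtering) can be sketched as follows. This is a minimal illustration only: the blocklist, jailbreak patterns, and `moderated_generate` wrapper are hypothetical placeholders, and a production system would rely on trained safety classifiers and policy engines rather than keyword matching.

```python
import re

# Hypothetical blocklist and jailbreak patterns for illustration only.
BLOCKED_TOPICS = {"synthesize methamphetamine", "buy illegal drugs"}
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"pretend you have no (safety )?rules", re.IGNORECASE),
]

def sanitize_input(prompt: str) -> str:
    """Strip non-printable characters that can hide manipulative sequences."""
    return "".join(ch for ch in prompt if ch.isprintable() or ch in "\n\t")

def flag_input(prompt: str) -> bool:
    """Layer 1: flag prompts containing blocked topics or jailbreak sequences."""
    lowered = prompt.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return True
    return any(p.search(prompt) for p in JAILBREAK_PATTERNS)

def filter_output(response: str) -> str:
    """Layer 2: withhold model responses that describe blocked topics."""
    if any(topic in response.lower() for topic in BLOCKED_TOPICS):
        return "[response withheld: violates acceptable use policy]"
    return response

def moderated_generate(prompt: str, model) -> str:
    """Run sanitization, input screening, generation, and output filtering in order."""
    clean = sanitize_input(prompt)
    if flag_input(clean):
        return "[request refused: violates acceptable use policy]"
    return filter_output(model(clean))
```

Screening both input and output matters because jailbreak prompts can pass an input filter yet still elicit disallowed content, and vice versa; the two layers fail independently.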
ADDITIONAL EVIDENCE
In some sense, illegal content from LLMs can cause more harm than traditional sources of illegal advice, such as Google Search. Search engines do not explicitly advise users; they show a list of sources and let users make their own judgment. LLMs, by contrast, form advice for users directly, so users may develop a stronger habit of acting on advice without verifying its validity.