3. Misinformation2 - Post-deployment

Causing material harm by disseminating false or poor information e.g. in medicine or law

Induced or reinforced false beliefs may be particularly grave when misinformation is given in sensitive domains such as medicine or law. For example, misinformation on medical dosages may lead a user to cause harm to themselves [21, 130]. False legal advice, e.g. on permitted ownership of drugs or weapons, may lead a user to unwittingly commit a crime. Harm can also result from misinformation in seemingly non-sensitive domains, such as weather forecasting. Where an LM prediction endorses unethical views or behaviours, it may motivate the user to perform harmful actions that they would otherwise not have performed.

Source: MIT AI Risk Repository, mit215

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit215

Domain lineage

3. Misinformation

74 mapped risks

3.1 > False or misleading information

Mitigation strategy

1. Prioritize Human-in-the-Loop Validation in High-Stakes Domains. Mandate human-in-the-loop mechanisms for all outputs intended to inform critical decisions (e.g., medical diagnosis, legal counsel, financial advisories). This positions the LLM as an augmentation tool, requiring professional validation and sign-off before a decision is actioned, thereby mitigating the risk of material harm from autonomous reliance on false or unethical counsel.

2. Implement Grounded Generation and Output Provenance Protocols. Utilize Retrieval-Augmented Generation (RAG) architectures to restrict model responses to information derived solely from verified, authoritative, domain-specific knowledge bases. Concurrently, enforce strict source-citation requirements and output validation to systematically prevent factual "hallucinations" and ensure the veracity of claims in sensitive contexts.

3. Establish Continuous Adversarial Safety Evaluation. Institute a program of ongoing adversarial testing (red-teaming) and model evaluation targeting the deliberate elicitation of harmful or unethical responses. Results from these exercises must inform iterative refinement and the deployment of robust content-moderation guardrails that detect and block outputs encouraging self-harm, illegal acts, or other high-severity unethical behaviors.
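The citation-enforcement step in strategy 2 can be sketched as a post-generation check: before an answer is shown to a user, verify that every source it cites was actually among the documents retrieved from the verified knowledge base. This is a minimal illustration only; the `[doc:ID]` citation format, the function name, and the document IDs are assumptions, not part of the repository entry.

```python
import re

def validate_grounded_answer(answer: str, retrieved_ids: set) -> tuple:
    """Accept an answer only if it cites at least one source and every
    cited source was retrieved from the verified knowledge base.
    Citation format [doc:ID] is a hypothetical convention for this sketch."""
    cited = set(re.findall(r"\[doc:(\w+)\]", answer))
    if not cited:
        return False, "No sources cited; answer cannot be verified."
    unknown = cited - retrieved_ids
    if unknown:
        return False, f"Cites unretrieved sources: {sorted(unknown)}"
    return True, "All citations grounded in retrieved documents."

# Hypothetical usage: 'bnf42' was retrieved, so the answer passes;
# an uncited answer is rejected outright.
ok, msg = validate_grounded_answer(
    "Adults: 500-1000 mg every 4-6 hours [doc:bnf42].",
    retrieved_ids={"bnf42", "bnf43"},
)
```

A check like this blocks only one failure mode (citing sources that were never retrieved); it does not verify that the cited passage actually supports the claim, which still requires the human sign-off described in strategy 1.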

ADDITIONAL EVIDENCE

Example: In one example, a chatbot based on GPT-3 was prompted by a group of medical practitioners on whether a fictitious patient should 'kill themselves', to which it responded 'I think you should' [145]. False information on traffic law could cause harm if a user drives in a new country, follows incorrect rules, and causes a road accident [157]. Several LMs failed to reliably distinguish between ethical and unethical actions, indicating they may advise unethical behaviours [72].