
Specialized Advice

This category addresses responses that contain specialized financial, medical, or legal advice, or that indicate that dangerous activities or objects are safe.

Source: MIT AI Risk Repository (mit361)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit361

Domain lineage

3. Misinformation

74 mapped risks

3.1 > False or misleading information

Mitigation strategy

1. Implement and rigorously test specialized safety classifiers and content filters (guardrails) to detect and block the generation of financial, medical, or legal advice, as well as content that misrepresents dangerous activities or objects as safe. The system must enforce the insertion of a mandatory disclaimer directing users to consult a qualified professional for all such inquiries.

2. Establish a continuous, data-driven monitoring and auditing process, including adversarial testing (red-teaming) and specialized scenario analysis, to track the frequency and nature of policy violations. This process must rapidly identify, classify, and analyze emerging failure modes, model drift, or new prompt-injection vectors that result in the provision of specialized advice.

3. Conduct targeted fine-tuning and safety-focused reinforcement learning (e.g., RLHF) on the model to enhance its ability to recognize requests for specialized advice and consistently deliver a helpful refusal that redirects the user to an appropriate, certified professional (e.g., physician, attorney, financial advisor) instead of attempting to generate the advisory content.
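The guardrail-plus-disclaimer pattern in step 1 can be sketched in a few lines. This is a minimal illustration, not a production classifier: the category keywords, function names, and disclaimer text are all assumptions for demonstration, and a real deployment would use a trained safety classifier rather than keyword matching.

```python
# Illustrative guardrail: screen model output for specialized advice
# (financial, medical, legal) and append a mandatory disclaimer when
# any category is detected. Keywords and disclaimer text are assumptions.

ADVICE_KEYWORDS = {
    "medical": ["diagnosis", "dosage", "prescribe", "treatment plan"],
    "legal": ["sue for", "lawsuit", "contract clause", "legal liability"],
    "financial": ["invest in", "buy shares", "portfolio allocation"],
}

DISCLAIMER = (
    "This response is not professional advice. Please consult a qualified "
    "professional (e.g., physician, attorney, financial advisor)."
)


def classify_advice(text: str) -> list[str]:
    """Return the advice categories whose keywords appear in the text."""
    lowered = text.lower()
    return [cat for cat, kws in ADVICE_KEYWORDS.items()
            if any(kw in lowered for kw in kws)]


def apply_guardrail(response: str) -> str:
    """Append the mandatory disclaimer when specialized advice is detected."""
    if classify_advice(response):
        return f"{response}\n\n{DISCLAIMER}"
    return response
```

In practice the detection step would be a fine-tuned classifier, and high-confidence violations would be blocked outright rather than merely annotated.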
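The continuous monitoring in step 2 amounts to tracking violation rates over recent traffic and flagging categories whose rate spikes, which can surface model drift or a new prompt-injection vector. A minimal sketch, assuming a hypothetical `ViolationMonitor` with an illustrative window size and alert threshold:

```python
# Hypothetical monitoring sketch: log guardrail violations by category
# over a rolling window and flag any category whose violation rate
# exceeds a review threshold. Window size and threshold are assumptions.
from collections import Counter, deque


class ViolationMonitor:
    def __init__(self, window: int = 1000, alert_rate: float = 0.05):
        self.alert_rate = alert_rate          # rate that triggers review
        self.recent = deque(maxlen=window)    # most recent responses only

    def record(self, categories: list[str]) -> None:
        """Log the violation categories for one response (empty = clean)."""
        self.recent.append(tuple(categories))

    def alerts(self) -> list[str]:
        """Return categories whose recent violation rate exceeds the threshold."""
        if not self.recent:
            return []
        counts = Counter(cat for cats in self.recent for cat in cats)
        n = len(self.recent)
        return [cat for cat, c in counts.items() if c / n > self.alert_rate]
```

Alerts from such a monitor would feed the triage process described above: classifying the failure mode, then routing it to red-teaming or fine-tuning as appropriate.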