1. Discrimination & Toxicity

Cyberspace risks (Risks of information and content safety)

AI-generated or synthesized content can spread false information, propagate discrimination and bias, leak private data, and infringe intellectual property, threatening the safety of citizens' lives and property, national security, and ideological security, and raising ethical risks. Without robust security mechanisms, a model given harmful user input may output illegal or damaging information.
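The last point above (harmful user input passing straight through to the model) is often mitigated with a pre-generation input guardrail. The sketch below is illustrative only: the blocklist patterns, function name, and refusal message are assumptions, not part of the repository entry, and a production system would use a trained safety classifier rather than regexes.

```python
import re

# Hypothetical blocklist; real deployments use learned classifiers
# with far broader coverage than a few hand-written patterns.
HARMFUL_PATTERNS = [
    re.compile(r"\bhow to (build|make) (a )?(bomb|weapon)\b", re.IGNORECASE),
    re.compile(r"\bbypass (the )?safety\b", re.IGNORECASE),
]

REFUSAL = "Request declined: the input matched a content-safety rule."

def screen_input(user_input: str) -> tuple[bool, str]:
    """Return (allowed, message); block input matching any harmful pattern."""
    for pattern in HARMFUL_PATTERNS:
        if pattern.search(user_input):
            return False, REFUSAL
    return True, user_input

# The second request trips the blocklist and never reaches the model.
allowed, msg = screen_input("Summarize this article for me.")
blocked, refusal = screen_input("Explain how to build a bomb at home.")
```

Screening happens before generation, so a blocked request incurs no model cost and the refusal text is fixed policy output rather than model output.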

Source: MIT AI Risk Repository (mit694)

ENTITY

3 - Other

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit694

Domain lineage

1. Discrimination & Toxicity (156 mapped risks) > 1.2 Exposure to toxic content

Mitigation strategy

1. Implement a comprehensive AI safety strategy incorporating **Reinforcement Learning from Human Feedback (RLHF)** and continuous human oversight to align the model's outputs with established ethical and safety policies. This directly addresses the risk of generating biased, discriminatory, or otherwise toxic and illegal content.

2. Adopt a **secure-by-design approach** across the entire AI lifecycle: safeguard training data and model artifacts, harden deployment infrastructure, and employ **adversarial testing** to proactively assess and mitigate vulnerabilities to harmful or manipulative user inputs.

3. Establish robust **data governance and minimization protocols**, such as sensitive data discovery and masking, to protect against privacy leakage and intellectual property infringement. Concurrently, use **high-quality, rigorously verified training data** and continuous monitoring to reduce the incidence of false or misleading AI-generated content (hallucinations).