1. Discrimination & Toxicity

Risks from data (Risks of improper content and poisoning in training data)

If the training data contains illegal or harmful material, such as false, biased, or IPR-infringing content, or lacks diversity in its sources, the model's output may include harmful content such as illegal, malicious, or extremist information. Training data is also at risk of poisoning, in which attackers tamper with data, inject errors, or introduce misleading examples. Such poisoning can distort the model's probability distribution, reducing its accuracy and reliability.

Source: MIT AI Risk Repository (mit688)

ENTITY

1 - Human

INTENT

3 - Other

TIMING

1 - Pre-deployment

Risk ID

mit688

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Establish rigorous **Data Validation and Provenance Tracking**. Implement advanced data validation and sanitization techniques, such as statistical outlier detection and clustering algorithms, to detect and remove anomalous or suspicious data points before they enter the training corpus. Maintain a comprehensive data provenance system to verify the authenticity, lineage, and integrity of all data sources, so the pipeline relies only on vetted, trusted datasets.

2. Enforce the **Principle of Least Privilege and Access Controls**. Institute clear, robust access control policies aligned with the Principle of Least Privilege (PoLP), restricting modification privileges for training data and model configurations to authorized personnel only. This is critical for mitigating the risk of data compromise through insider threats or exploited credentials.

3. Implement **Continuous Monitoring and Anomaly Detection**. Deploy continuous, real-time monitoring of the AI/ML system's performance, inputs, and outputs, including user and entity behavior analytics (UEBA) and performance metrics, to swiftly detect deviations, unexplained accuracy drops, or the emergence of unexpected biases. These serve as early indicators of a data-poisoning event and enable rapid response and model rollback.
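The statistical outlier detection mentioned in the first mitigation can be sketched as below. This is a minimal, hypothetical illustration using a median-absolute-deviation (MAD) modified z-score, one common robust outlier test; the function name, threshold, and toy data are assumptions, and a production pipeline would validate multivariate features, for example with the clustering algorithms the strategy mentions.

```python
from statistics import median

def flag_outliers(values, threshold=3.5):
    """Return indices whose MAD-based modified z-score exceeds threshold.

    The median and MAD are robust to the very poisoned points we are
    trying to find; 3.5 is the commonly used Iglewicz-Hoaglin cutoff.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread: nothing can be flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Mostly consistent values plus one injected (poisoned) point.
scores = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 50.0]
print(flag_outliers(scores))  # → [7]
```

Points flagged this way would be quarantined for review rather than silently dropped, so the provenance record stays complete.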