
Choice of untrustworthy data source

Choosing a trustworthy data source is the first prerequisite for meeting data quality requirements. This applies especially when third-party data sources are used to develop the AI system.

Source: MIT AI Risk Repository (mit999)

ENTITY

1 - Human

INTENT

2 - Unintentional

TIMING

1 - Pre-deployment

Risk ID

mit999

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.0 > AI system safety, failures, & limitations

Mitigation strategy

1. Implement a comprehensive Third-Party Risk Management (TPRM) and due diligence protocol for all external data sources and AI vendors. This requires a systematic assessment of data provenance, collection methodologies, and licensing agreements to confirm data reliability, ethical sourcing, and compliance with data protection regulations prior to ingestion or model procurement.

2. Establish a formalized Data Governance framework that mandates robust, automated data verification procedures, including data quality checks, validation techniques, and cleaning processes. These controls are to be executed against all datasets, irrespective of source, to ensure consistency, accuracy, and completeness before the data is utilized for model training.

3. Enforce strict, centralized access controls via a machine identity management system and Role-Based Access Control (RBAC) to limit data access and modification rights solely to authorized personnel. This measure minimizes the surface area for unauthorized data injection or tampering, a critical step when integrating third-party datasets or open-source models into the AI pipeline.
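
As an illustration of the provenance assessment and automated verification described in the first two mitigation steps, the following is a minimal sketch of a pre-ingestion gate. It is not part of the repository entry. The names `REQUIRED_PROVENANCE`, `QualityReport`, and `check_dataset`, as well as the specific provenance fields, are hypothetical choices made for this example, and a real pipeline would likely add schema, range, and licensing checks suited to its own data.

```python
import json
from dataclasses import dataclass, field

# Hypothetical provenance fields a team might require before ingestion.
REQUIRED_PROVENANCE = ("source", "license", "collection_method")


@dataclass
class QualityReport:
    """Collects human-readable issues found during verification."""
    issues: list = field(default_factory=list)

    @property
    def passed(self):
        return not self.issues


def check_dataset(records, required_fields, provenance):
    """Run provenance, completeness, and duplicate checks on dict records."""
    report = QualityReport()

    # Provenance: each required metadata field must be present and non-empty.
    for key in REQUIRED_PROVENANCE:
        if not provenance.get(key):
            report.issues.append(f"missing provenance: {key}")

    # Completeness: each record needs a non-null value for every required field.
    for i, rec in enumerate(records):
        for name in required_fields:
            if rec.get(name) is None:
                report.issues.append(f"record {i}: missing {name}")

    # Consistency: flag records that exactly duplicate an earlier one.
    seen = set()
    for i, rec in enumerate(records):
        fingerprint = json.dumps(rec, sort_keys=True, default=str)
        if fingerprint in seen:
            report.issues.append(f"record {i}: duplicate")
        seen.add(fingerprint)

    return report
```

A caller could refuse to pass a dataset to training unless `check_dataset(...).passed` is true, and log `report.issues` for review. The same gate is meant to run on every dataset regardless of source, in line with the second mitigation step.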