7. AI System Safety, Failures, & Limitations

Data drift

Data drift is a phenomenon in which the distribution of operational input data departs from the distribution of the data used during training. This shift can degrade model performance.

Source: MIT AI Risk Repository (mit1015)

ENTITY

3 - Other

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1015

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Implement continuous, automated monitoring to track key statistical properties and performance metrics of input and output data against the training baseline (e.g., Population Stability Index, the Kolmogorov-Smirnov test, and accuracy). This enables early detection of distributional shifts and triggers timely alerts.

2. Establish a robust retraining pipeline that is both scheduled and event-triggered. Models should be periodically updated with new, high-quality data and automatically retrained when monitoring alerts indicate that drift has exceeded predefined tolerance thresholds.

3. Employ adaptive learning techniques, such as online or incremental learning, or use ensemble methods to improve robustness. These approaches let the model adjust continuously, or leverage multiple models, to mitigate the impact of ongoing or mixed types of data drift.
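The monitoring step above can be sketched in a few lines. The following is a minimal, illustrative example (not part of the repository entry) that compares a live feature sample against the training baseline using the two metrics the strategy names: the Population Stability Index and the two-sample Kolmogorov-Smirnov test from `scipy`. The threshold values (`PSI_THRESHOLD`, `ALPHA`) and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

def population_stability_index(baseline, current, bins=10):
    """PSI between a baseline sample and a current sample.

    Common rule of thumb: PSI < 0.1 no drift, 0.1-0.25 moderate,
    > 0.25 significant drift.
    """
    edges = np.histogram_bin_edges(baseline, bins=bins)
    # Clip so values outside the baseline range land in the edge bins.
    base_counts, _ = np.histogram(np.clip(baseline, edges[0], edges[-1]), bins=edges)
    curr_counts, _ = np.histogram(np.clip(current, edges[0], edges[-1]), bins=edges)
    eps = 1e-6  # guard against empty bins (log of zero)
    base_pct = np.maximum(base_counts / base_counts.sum(), eps)
    curr_pct = np.maximum(curr_counts / curr_counts.sum(), eps)
    return float(np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct)))

rng = np.random.default_rng(0)
training = rng.normal(loc=0.0, scale=1.0, size=5000)  # training baseline
live = rng.normal(loc=0.8, scale=1.0, size=5000)      # drifted operational data

psi = population_stability_index(training, live)
stat, p_value = ks_2samp(training, live)

# Illustrative tolerance settings; real values are application-specific.
PSI_THRESHOLD, ALPHA = 0.25, 0.01
drift_alert = psi > PSI_THRESHOLD or p_value < ALPHA
```

In production this comparison would run on a schedule against each monitored feature, with `drift_alert` feeding the retraining trigger described in step 2.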