Dataset shift
The term dataset shift was first used by Quiñonero-Candela et al. [35] to characterize the situation where the training data and the testing data (or data in runtime) of an AI/ML model demonstrate different distributions [36].
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit334
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement continuous monitoring of both model performance (e.g., accuracy, discrimination, and calibration metrics) and data stream statistical properties (e.g., $P(X)$ distribution comparison) to promptly detect the onset and specific type of dataset shift (e.g., covariate shift or concept drift). 2. Employ Importance Reweighting techniques, such as Inverse Probability Weighting, to statistically correct for detected covariate shifts by reweighting instances in the training or validation set, thereby minimizing distribution mismatch without requiring full model retraining. 3. Establish a formal protocol for Model Retraining and Updating, which includes either scheduled refitting on the most recent, representative data or the use of online/adaptive learning methods to maintain model validity in non-stationary environments and mitigate concept drift.