Dataset shift

The term dataset shift was first used by Quiñonero-Candela et al. [35] to characterize the situation where the training data and the testing data (or the data encountered at runtime) of an AI/ML model follow different distributions [36].

Source: MIT AI Risk Repository (mit334)
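
In common notation, the definition above and the two shift types named under the mitigation strategy can be written as follows. This is an illustrative formalization, not taken from the source: the entry itself only uses $P(X)$, so the label variable $Y$ and the train/test subscripts are assumptions introduced here.

$$
\begin{aligned}
\text{dataset shift:} \quad & P_{\text{train}}(X, Y) \neq P_{\text{test}}(X, Y) \\
\text{covariate shift:} \quad & P_{\text{train}}(X) \neq P_{\text{test}}(X), \quad P_{\text{train}}(Y \mid X) = P_{\text{test}}(Y \mid X) \\
\text{concept drift:} \quad & P_{\text{train}}(Y \mid X) \neq P_{\text{test}}(Y \mid X)
\end{aligned}
$$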

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

3 - Other

Risk ID

mit334

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Implement continuous monitoring of both model performance (e.g., accuracy, discrimination, and calibration metrics) and the statistical properties of the incoming data stream (e.g., comparing the input distribution $P(X)$ against the training data) to promptly detect the onset and specific type of dataset shift (e.g., covariate shift or concept drift).

2. Employ importance reweighting techniques, such as inverse probability weighting, to statistically correct for detected covariate shift by reweighting instances in the training or validation set, reducing distribution mismatch without requiring full model retraining.

3. Establish a formal protocol for model retraining and updating, which includes either scheduled refitting on the most recent, representative data or the use of online/adaptive learning methods, to maintain model validity in non-stationary environments and mitigate concept drift.
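
The first two strategies can be illustrated with a minimal, standard-library-only sketch for a single numeric feature. It is not the repository's method. The function names (`ks_statistic`, `ks_threshold`, `importance_weights`) and the synthetic Gaussian data are hypothetical. The histogram density ratio is a deliberately simple stand-in for inverse probability weighting, which would more typically estimate the same weights with a classifier that separates training data from runtime data.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, live):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(live)
    gap = 0.0
    for v in ref + cur:
        cdf_ref = bisect_right(ref, v) / len(ref)
        cdf_cur = bisect_right(cur, v) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def ks_threshold(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test at level alpha."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def importance_weights(reference, live, bins=10, floor=1e-3):
    """Histogram estimate of w(x) = P_live(x) / P_ref(x) for each reference point.

    Bin probabilities are floored (an assumption of this sketch) so that
    sparsely populated bins do not produce zero or exploding weights.
    """
    lo = min(min(reference), min(live))
    hi = max(max(reference), max(live))
    width = (hi - lo) / bins or 1.0

    def bin_of(x):
        return min(int((x - lo) / width), bins - 1)

    def histogram(sample):
        counts = [0] * bins
        for x in sample:
            counts[bin_of(x)] += 1
        return [max(c / len(sample), floor) for c in counts]

    p_ref, p_live = histogram(reference), histogram(live)
    return [p_live[bin_of(x)] / p_ref[bin_of(x)] for x in reference]


# Synthetic example: the runtime feature has drifted by one standard deviation.
rng = random.Random(0)
reference = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # training-time feature
live = [rng.gauss(1.0, 1.0) for _ in range(2000)]       # runtime feature

# Strategy 1: monitor P(X) and flag a shift when the KS gap exceeds the threshold.
d = ks_statistic(reference, live)
shift_detected = d > ks_threshold(len(reference), len(live))

# Strategy 2: reweight the reference sample so it resembles the runtime data.
weights = importance_weights(reference, live)
raw_mean = sum(reference) / len(reference)
weighted_mean = sum(w * x for w, x in zip(weights, reference)) / sum(weights)

print(f"KS statistic: {d:.3f}, shift detected: {shift_detected}")
print(f"reference mean: {raw_mean:.3f}, reweighted mean: {weighted_mean:.3f}")
```

In this sketch the reweighted mean of the reference sample moves toward the runtime mean, which is the effect importance reweighting relies on. The same weights could be passed to a weighted loss or a weighted validation metric. With many features, a per-feature histogram no longer captures the joint distribution, which is one reason classifier-based weight estimates are usually preferred in practice.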