Benchmarking (Annotation contamination)
Annotation contamination refers to scenarios where the model is exposed to the benchmark labels during training [170]. This type of contamination can make the model learn the distribution of acceptable outputs. When combined with raw data contamination of the test split, any evaluation made with the benchmark is invalidated, because the entire test split has essentially been leaked to the model.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1120
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Implement rigorous proactive mechanisms to prevent benchmark data exposure, including the use of encrypted evaluation datasets distributed under "No Derivatives" licenses, and ensuring test data is strictly isolated from the training and hyperparameter tuning pipelines.
2. Deploy advanced contamination detection methodologies (e.g., white-box, gray-box, or black-box approaches) to quantify the overlap between pre-training corpora and benchmark data, followed by remedial actions such as annotating new, uncompromised prompts or employing data rewriting techniques.
3. Enforce strict evaluation and validation protocols, such as utilizing a distinct holdout validation set for model selection and hyperparameter tuning, and employing dynamic benchmarks or cross-dataset validation to reliably measure model generalization on truly unseen data.
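The overlap quantification in mitigation 2 can be sketched as a simple word-level n-gram check between a training corpus and benchmark items. This is a minimal illustration, not any specific tool's method; the function names, the n-gram size, and the flagging threshold are all illustrative assumptions.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_items: Iterable[str],
                       training_corpus: Iterable[str],
                       n: int = 8,
                       threshold: float = 0.5) -> float:
    """Fraction of benchmark items flagged as contaminated.

    An item is flagged when the share of its n-grams that also appear
    in the training corpus meets or exceeds `threshold`.
    """
    # Pool every n-gram seen anywhere in the training corpus.
    corpus_ngrams: Set[Tuple[str, ...]] = set()
    for doc in training_corpus:
        corpus_ngrams |= ngrams(doc, n)

    items = list(benchmark_items)
    flagged = 0
    for item in items:
        item_ngrams = ngrams(item, n)
        if not item_ngrams:
            continue  # item shorter than n words; cannot score it
        overlap = len(item_ngrams & corpus_ngrams) / len(item_ngrams)
        if overlap >= threshold:
            flagged += 1
    return flagged / len(items) if items else 0.0
```

A flagged item would then be remediated as described above, e.g., replaced with a newly annotated prompt or rewritten before the benchmark is used for evaluation.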