Benchmarking (Annotation contamination)
Annotation contamination refers to scenarios where the model is exposed to the benchmark labels during training [170]. This type of contamination can make the model learn the distribution of acceptable outputs. When combined with raw data contamination of the test split, any evaluation made with the benchmark is invalidated, because the entire test split has essentially been leaked to the model.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
1 - Pre-deployment
Risk ID
mit1120
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Implement rigorous proactive mechanisms to prevent benchmark data exposure, including the use of encrypted evaluation datasets distributed under "No Derivatives" licenses, and ensuring test data is strictly isolated from the training and hyperparameter tuning pipelines.
2. Deploy advanced contamination detection methodologies (e.g., white-box, gray-box, or black-box approaches) to quantify the overlap between pre-training corpora and benchmark data, followed by remedial actions such as annotating new, uncompromised prompts or employing data rewriting techniques.
3. Enforce strict evaluation and validation protocols, such as utilizing a distinct holdout validation set for model selection and hyperparameter tuning, and employing dynamic benchmarks or cross-dataset validation to reliably measure model generalization on truly unseen data.
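The overlap quantification in mitigation 2 can be sketched as a simple word-level n-gram check between a training corpus and benchmark items. This is a minimal illustration, not any specific tool's method; the function names, the n-gram size, and the flagging threshold are all illustrative assumptions.

```python
from typing import Iterable, Set, Tuple


def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams in a text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def contamination_rate(benchmark_items: Iterable[str],
                       training_corpus: Iterable[str],
                       n: int = 8,
                       threshold: float = 0.5) -> float:
    """Fraction of benchmark items flagged as contaminated.

    An item is flagged when the share of its n-grams that also appear
    in the training corpus meets or exceeds `threshold`.
    """
    # Pool every n-gram seen anywhere in the training corpus.
    corpus_ngrams: Set[Tuple[str, ...]] = set()
    for doc in training_corpus:
        corpus_ngrams |= ngrams(doc, n)

    items = list(benchmark_items)
    flagged = 0
    for item in items:
        item_ngrams = ngrams(item, n)
        if not item_ngrams:
            continue  # item shorter than n words; cannot score it
        overlap = len(item_ngrams & corpus_ngrams) / len(item_ngrams)
        if overlap >= threshold:
            flagged += 1
    return flagged / len(items) if items else 0.0
```

A flagged item would then be remediated as described above, e.g., replaced with a newly annotated prompt or rewritten before the benchmark is used for evaluation.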