Benchmarking (Post-deployment contamination)
Once a model is deployed, it can be exposed to benchmark data provided by users [95, 170]. The model may then be further trained on user inputs that contain benchmark data.
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1121
Domain lineage
6. Socioeconomic and Environmental
6.5 > Governance failure
Mitigation strategy
1. Adopt a Dynamic, Contamination-Controlled Evaluation Framework. Implement dynamic benchmarks and generative test-evolution protocols (e.g., Knowledge-enhanced Benchmark Evolution) in which evaluation samples are continuously updated or synthetically generated. This keeps the test data temporally and syntactically novel, preventing its inclusion in post-deployment user feedback or subsequent fine-tuning corpora.
2. Implement Robust Data Filtering and Sanitization for User Feedback. Establish a rigorous data-handling pipeline for all user inputs and post-deployment data streams. This pipeline must incorporate real-time filtering mechanisms, such as n-gram overlap detection or membership inference, to identify and quarantine any data that matches known public or proprietary benchmark samples before it can be used for model fine-tuning or re-training (a minimal filtering sketch follows this list).
3. Conduct Continuous and Quantitative Contamination Audits. Mandate the regular application of formal contamination-detection metrics, such as the Kernel Divergence Score, to continuously quantify the degree of benchmark leakage. These audits provide an objective measure of evaluation integrity and should serve as a mandatory gate for any model version release or performance claim (a simplified audit sketch also follows the list).
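A minimal sketch of the n-gram overlap check from item 2 is given below, assuming word-level 8-grams, a 0.3 overlap threshold, and a hypothetical loader for known benchmark texts; none of these values come from the cited sources, and a production pipeline would likely pair this filter with membership-inference checks.

```python
from typing import Iterable, Set, Tuple

NGRAM_SIZE = 8           # assumed window size; tune per corpus
OVERLAP_THRESHOLD = 0.3  # assumed fraction of shared n-grams that triggers quarantine

def ngrams(text: str, n: int = NGRAM_SIZE) -> Set[Tuple[str, ...]]:
    """Lower-cased word n-grams of a text sample."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_benchmark_index(benchmark_samples: Iterable[str]) -> Set[Tuple[str, ...]]:
    """Union of n-grams over all known benchmark samples."""
    index: Set[Tuple[str, ...]] = set()
    for sample in benchmark_samples:
        index |= ngrams(sample)
    return index

def is_contaminated(user_input: str, benchmark_index: Set[Tuple[str, ...]]) -> bool:
    """Flag a user input whose n-gram overlap with the benchmark index is high."""
    grams = ngrams(user_input)
    if not grams:
        return False
    overlap = len(grams & benchmark_index) / len(grams)
    return overlap >= OVERLAP_THRESHOLD

# Example: quarantine flagged inputs before they reach the fine-tuning corpus.
# benchmark_index = build_benchmark_index(load_benchmark_texts())  # hypothetical loader
# clean_inputs = [x for x in user_inputs if not is_contaminated(x, benchmark_index)]
```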
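For the audits in item 3, the Kernel Divergence Score is defined in the contamination-detection literature and is not reproduced here; the sketch below instead uses a simpler perplexity-gap heuristic (verbatim benchmark samples versus paraphrased controls) to illustrate what a quantitative audit gate might look like. The model name, the 0.5 nats-per-token threshold, and the control set are assumptions.

```python
import torch
from statistics import mean
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "my-org/deployed-model"  # hypothetical checkpoint under audit
GAP_THRESHOLD = 0.5                   # assumed NLL gap (nats/token) treated as suspicious

def mean_nll(model, tokenizer, text: str) -> float:
    """Mean per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def contamination_audit(benchmark_texts, control_texts) -> dict:
    """Compare loss on verbatim benchmark samples vs. paraphrased controls.

    A large positive gap (controls much harder than verbatim samples) suggests
    the benchmark text was memorized via post-deployment training data.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()
    bench_nll = mean(mean_nll(model, tokenizer, t) for t in benchmark_texts)
    control_nll = mean(mean_nll(model, tokenizer, t) for t in control_texts)
    gap = control_nll - bench_nll
    return {"benchmark_nll": bench_nll, "control_nll": control_nll,
            "gap": gap, "flagged": gap >= GAP_THRESHOLD}
```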