Non-decommissionability of models with open weights
If a model's parameter weights are released publicly or leaked in a security breach, the model cannot be decommissioned: the developer no longer controls the copies in circulation or how they are used, so access revocation, updates and other management controls stop being effective for an open-sourced or leaked model. Models with publicly available weights are also easier to reconfigure (for example by fine-tuning away safety behaviour), which enables misuse [178].
ENTITY
1 - Human
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit1160
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1. Implement rigorous **Training Data Curation** and pre-release **Full-Access Audits** to filter content related to high-risk capabilities (e.g., cyber-offense) and to measure potential misuse risk accurately, informing a **tiered release decision** based on the model's capability level.
2. Develop and integrate **Tamper-Resistant Fine-Tuning** techniques that suppress harmful model capabilities and ensure that model-based safety alignment cannot be trivially sidestepped or removed by adversaries after release.
3. Establish comprehensive **Model Provenance** mechanisms, such as **watermarking** the model weights, so that derivatives can be identified and traced throughout the open-weight ecosystem, mitigating the difficulty of attributing malicious use.