2. Privacy & Security | 2 - Post-deployment

Non-decommissionability of models with open weights

If a model's weights are released, or leaked in a security breach, the model cannot be decommissioned because the developer no longer controls the publicly available copies or how they are used. This prevents effective management and control of an open-sourced or leaked model. Models with publicly available weights are also easier to reconfigure, enabling misuse [178].

Source: MIT AI Risk Repository (mit1160)

ENTITY

1 - Human

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1160

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement rigorous **Training Data Curation** and pre-release **Full-Access Audits** to filter content related to high-risk capabilities (e.g., cyber-offense) and accurately measure potential misuse risk, informing a **tiered release decision** based on capability level.
2. Develop and integrate **Tamper-Resistant Fine-Tuning** techniques to actively suppress harmful inherent model capabilities and ensure that model-based safety alignment cannot be trivially sidestepped or circumvented by adversaries post-release.
3. Establish comprehensive **Model Provenance** mechanisms, such as **watermarking** the model weights, to enable the identification and tracing of derivatives throughout the open-weight ecosystem, thereby mitigating the difficulty in attributing malicious use.
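
The tiered release decision in item 1 can be illustrated with a minimal sketch. Everything here is a hypothetical assumption for illustration, not something the source specifies: the tier names, the `AuditResult` type, the 0.0 to 1.0 risk score, and both thresholds. The only idea taken from the text is that audit results for high-risk capabilities feed a release decision graded by capability level.

```python
from dataclasses import dataclass
from enum import Enum


class ReleaseTier(Enum):
    # Hypothetical tiers; the source does not name specific tiers.
    OPEN_WEIGHTS = "open weights"
    STAGED_ACCESS = "staged access"
    WITHHOLD = "withhold weights"


@dataclass(frozen=True)
class AuditResult:
    capability: str    # e.g. "cyber-offense"
    risk_score: float  # hypothetical 0.0-1.0 score from a pre-release audit


def decide_release_tier(results, staged_threshold=0.3, withhold_threshold=0.7):
    """Return the most restrictive tier implied by the worst audited capability.

    The thresholds are illustrative placeholders, not recommended values.
    """
    if not results:
        # Releasing weights is irreversible, so refuse to decide without audits.
        raise ValueError("at least one audit result is required")
    worst = max(r.risk_score for r in results)
    if worst >= withhold_threshold:
        return ReleaseTier.WITHHOLD
    if worst >= staged_threshold:
        return ReleaseTier.STAGED_ACCESS
    return ReleaseTier.OPEN_WEIGHTS
```

Taking the maximum score rather than an average reflects the risk described above: once weights are public the release cannot be reversed, so a single high-risk capability is enough to justify the stricter tier.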