2. Privacy & Security2 - Post-deployment

Model weight leak

Model weights or access to them can be leaked when initial access is granted only to a select group of individuals, such as institutional researchers [209]. This risk can increase as more people gain access, and identifying the source of the leak becomes more difficult. The availability of leaked model weights makes various attacks on systems that use the leaked AI model easier to implement, such as finding adversarial examples, elicitation of dangerous capabilities, and extraction of confidential information present in the training data. The avail- ability of model weights might also enable the misuse of the AI system using the leaked model to produce harmful or illegal content [67].

Source: MIT AI Risk Repositorymit1165

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1165

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. **Implement the Principle of Least Privilege and Strict Access Control** Centralize all model weights to a limited number of rigorously monitored and access-controlled systems. The number of individuals authorized to access the weights must be reduced to the absolute minimum necessary, following the principle of least privilege (RBAC), and supplemented by a robust insider threat program to continuously audit personnel activity and permissions. 2. **Employ Cryptographic and Confidential Computing Techniques** Encrypt model weights both at rest and in transit using industry-standard algorithms. For enhanced security during inference, investigate and implement confidential computing environments to secure the weights and their usage against internal system compromises, thereby significantly reducing the attack surface for exfiltration. 3. **Establish Proactive Defense Validation and Monitoring** Conduct regular, advanced third-party red-teaming and penetration testing focused specifically on weight exfiltration vectors. Concurrently, implement a high-fidelity monitoring and audit trail system to log all access and query patterns, utilizing anomaly detection to flag and immediately investigate any unusual data upload rates or access attempts indicative of a potential leak.