Facilitating fraud, scams, and targeted manipulation
Anticipated risk: LMs could be used to increase the effectiveness of crimes such as fraud, scams, and targeted manipulation.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit219
Domain lineage
4. Malicious Actors & Misuse
4.3 > Fraud, scams, and targeted manipulation
Mitigation strategy
1. Implement Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA) to govern access to sensitive fine-tuning data and the resulting model parameters. This is the primary defense: it strictly limits who may interact with or download the components that enable high-fidelity impersonation.
2. Apply Differential Privacy (DP-SGD) or robust data anonymization during fine-tuning. Injecting calibrated noise, or redacting Personally Identifiable Information (PII) from the training data (e.g., an individual's past speech data), provably bounds the model's ability to memorize and reproduce the unique patterns needed for identity theft.
3. Establish external content-filtering guardrails and perform continuous input/output validation to detect and reject adversarial prompts. This defense-in-depth measure blocks malicious inputs that attempt to override the model's safety instructions and elicit fraudulent or impersonating content at inference time.
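Mitigation 1 above (RBAC plus MFA gating access to fine-tuning artifacts) can be sketched as a simple permission check. The role names, actions, and the `mfa_verified` flag are illustrative assumptions, not part of the source entry; a real deployment would back this with an identity provider that enforces MFA before issuing a session.

```python
# Hypothetical role-based access check for fine-tuning artifacts.
# Roles and permitted actions below are illustrative only.
ROLE_PERMISSIONS = {
    "admin": {"read_data", "write_data", "download_weights"},
    "researcher": {"read_data"},
    "auditor": {"read_logs"},
}

def is_authorized(role: str, action: str, mfa_verified: bool) -> bool:
    """Permit an action only for a known role whose session passed MFA."""
    if not mfa_verified:
        # Deny everything without a completed MFA challenge.
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

The key design point is deny-by-default: unknown roles and unverified sessions fall through to `False`, so only the explicitly listed role/action pairs can reach the impersonation-enabling artifacts.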
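The PII-redaction step in mitigation 2 can be illustrated with a minimal scrubber applied to training text before fine-tuning. The regex patterns here are simplistic assumptions for illustration; a production pipeline would use a vetted NER-based anonymization tool, and DP-SGD itself would additionally require a framework such as Opacus.

```python
import re

# Hypothetical minimal PII scrubber: redacts emails and US-style phone
# numbers from training text before it reaches the fine-tuning pipeline.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Usage: `redact_pii("Call 555-123-4567 or mail a@b.com")` yields `"Call [PHONE] or mail [EMAIL]"`, so the model never sees the raw identifiers it could later reproduce.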
ADDITIONAL EVIDENCE
Example: LMs could be fine-tuned on an individual's past speech data to impersonate that individual in cases of identity theft.