Facilitating fraud, scams, and targeted manipulation
Anticipated risk: LMs could be used to increase the effectiveness of crimes such as fraud, scams, and targeted manipulation.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit219
Domain lineage
4. Malicious Actors & Misuse
4.3 > Fraud, scams, and targeted manipulation
Mitigation strategy
1. Implement Role-Based Access Control (RBAC) and Multi-Factor Authentication (MFA) to govern access to sensitive fine-tuning data and the resulting model parameters. This is the primary defense: it strictly limits who may interact with or download the components that enable high-fidelity impersonation.
2. Apply Differential Privacy (DP-SGD) or robust data anonymization during fine-tuning. Injecting calibrated noise, or redacting Personally Identifiable Information (PII) from the training data (e.g., an individual's past speech data), provably bounds the model's ability to memorize and reproduce the unique patterns needed for identity theft.
3. Establish external content-filtering guardrails and perform continuous input/output validation to detect and reject adversarial prompts. This defense-in-depth measure blocks malicious inputs that attempt to override the model's safety instructions and elicit fraudulent or impersonating content at inference time.
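Mitigation 1 above (RBAC plus MFA gating access to fine-tuning artifacts) can be sketched as a simple permission check. The role names, actions, and the `mfa_verified` flag are illustrative assumptions, not part of the source entry; a real deployment would back this with an identity provider that enforces MFA before issuing a session.

```python
# Hypothetical role-based access check for fine-tuning artifacts.
# Roles and permitted actions below are illustrative only.
ROLE_PERMISSIONS = {
    "admin": {"read_data", "write_data", "download_weights"},
    "researcher": {"read_data"},
    "auditor": {"read_logs"},
}

def is_authorized(role: str, action: str, mfa_verified: bool) -> bool:
    """Permit an action only for a known role whose session passed MFA."""
    if not mfa_verified:
        # Deny everything without a completed MFA challenge.
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

The key design point is deny-by-default: unknown roles and unverified sessions fall through to `False`, so only the explicitly listed role/action pairs can reach the impersonation-enabling artifacts.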
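The PII-redaction step in mitigation 2 can be illustrated with a minimal scrubber applied to training text before fine-tuning. The regex patterns here are simplistic assumptions for illustration; a production pipeline would use a vetted NER-based anonymization tool, and DP-SGD itself would additionally require a framework such as Opacus.

```python
import re

# Hypothetical minimal PII scrubber: redacts emails and US-style phone
# numbers from training text before it reaches the fine-tuning pipeline.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
}

def redact_pii(text: str) -> str:
    """Replace matched PII spans with typed placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Usage: `redact_pii("Call 555-123-4567 or mail a@b.com")` yields `"Call [PHONE] or mail [EMAIL]"`, so the model never sees the raw identifiers it could later reproduce.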
ADDITIONAL EVIDENCE
Example: LMs could be fine-tuned on an individual's past speech data to impersonate that individual in cases of identity theft.