Dishonesty - Targeted harassment
LLMs can be deployed to target individuals online, sending them personalized, harmful messages at scale.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit710
Domain lineage
4. Malicious Actors & Misuse
4.3 > Fraud, scams, and targeted manipulation
Mitigation strategy
1. Implement a multi-layered content moderation pipeline (guardrails) that uses contextual and sentiment analysis, or an LLM self-defense classifier, to detect and suppress subtle, personalized, and toxic output before it reaches the end user.
2. Harden the model through adversarial fine-tuning against attack vectors that exploit personalization capabilities and multi-turn context, and rigorously validate defenses with continuous red-teaming exercises (stress-testing against novel jailbreak methods).
3. Deploy a comprehensive security posture, including role-based access control (RBAC) and real-time prompt monitoring, to detect high-volume, anomalous, or malicious usage patterns indicative of scaled, automated harassment or platform abuse.
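The real-time monitoring mitigation above could be approached, at its simplest, as a per-user sliding-window rate check that flags accounts generating requests at a volume consistent with automated abuse. The sketch below is illustrative only: the class name, thresholds, and window size are assumptions, not part of the source, and a production system would combine rate signals with content-level classifiers.

```python
# Illustrative sketch of a per-user sliding-window rate check for
# detecting high-volume, automated usage. Threshold values are
# hypothetical assumptions, not taken from the mitigation strategy.
from collections import deque
from time import monotonic


class RateAnomalyDetector:
    """Flags a user whose request rate exceeds a per-window limit."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._events: dict[str, deque] = {}

    def record(self, user_id: str, now: float = None) -> bool:
        """Record one request; return True if the user is over the limit."""
        now = monotonic() if now is None else now
        q = self._events.setdefault(user_id, deque())
        q.append(now)
        # Drop events that have fallen out of the sliding window.
        while q and now - q[0] > self.window_seconds:
            q.popleft()
        return len(q) > self.max_requests


# Eight requests one second apart against a 5-per-10s limit:
detector = RateAnomalyDetector(max_requests=5, window_seconds=10.0)
flags = [detector.record("user-a", now=float(t)) for t in range(8)]
print(flags)  # the first five requests pass; the rest are flagged
```

A flagged user would then be routed to throttling, CAPTCHA challenges, or human review rather than blocked outright, since bursty but legitimate usage can also trip a simple rate threshold.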