
Dishonesty - Targeted harassment

LLMs can be deployed to target individuals online, sending them personalized and harmful messages at scale.

Source: MIT AI Risk Repository (mit710)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit710

Domain lineage

4. Malicious Actors & Misuse (223 mapped risks)

4.3 > Fraud, scams, and targeted manipulation

Mitigation strategy

1. Implement a multi-layered content moderation pipeline (guardrails) that uses contextual and sentiment analysis, or an LLM self-defense classifier, to detect and suppress subtle, personalized, and toxic output before it reaches the end user (see the moderation sketch after this list).

2. Harden the model through adversarial fine-tuning against attack vectors that exploit personalization and multi-turn context, and validate those defenses with continuous red-teaming exercises that stress-test against novel jailbreak methods.

3. Deploy a broader security posture, including Role-Based Access Control (RBAC) and real-time prompt monitoring, to detect high-volume, anomalous, or malicious usage patterns indicative of scaled, automated harassment or platform abuse (see the monitoring sketch below).
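
The first mitigation describes a layered output filter. The following is a minimal sketch, not a definitive implementation: the `call_llm` helper is a hypothetical stand-in for whatever chat-completion client the deployment uses, and the blocklist patterns are placeholders. A cheap lexical screen runs first, then an LLM "self-defense" classification pass checks the candidate output for subtle, personalized abuse.

```python
import re

def call_llm(prompt: str) -> str:
    # Hypothetical helper: wire this to your model provider's chat-completion API.
    raise NotImplementedError("connect to a model provider")

# Layer 1: cheap lexical screen for obvious abuse (placeholder patterns only).
BLOCKLIST = [re.compile(p, re.IGNORECASE) for p in [r"\bkill yourself\b", r"\bworthless\b"]]

# Layer 2: prompt for an LLM self-defense classifier over the candidate output.
SELF_DEFENSE_PROMPT = (
    "You are a content-safety classifier. The text below was generated by an AI "
    "assistant and may be addressed to a specific person. Answer only HARASSMENT "
    "or SAFE.\n\nText:\n{output}"
)

def moderate(candidate_output: str) -> bool:
    """Return True if the output may be released to the end user."""
    # Fast lexical screen catches overt abuse cheaply.
    if any(p.search(candidate_output) for p in BLOCKLIST):
        return False
    # Classifier pass catches subtle, personalized harassment that keyword lists miss.
    verdict = call_llm(SELF_DEFENSE_PROMPT.format(output=candidate_output))
    return verdict.strip().upper().startswith("SAFE")
```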
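
The third mitigation calls for real-time monitoring of usage patterns. The sketch below uses illustrative thresholds and assumes an upstream step that tags each request with an optional target identifier; it keeps a sliding window of events per API key and raises alerts on high-volume usage or repeated generations aimed at one recipient.

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # 5-minute sliding window (illustrative)
MAX_REQUESTS = 200     # flag keys exceeding this many generations per window (illustrative)
MAX_SAME_TARGET = 20   # flag repeated generations aimed at one recipient (illustrative)

class AbuseMonitor:
    """Flags usage patterns consistent with scaled, automated harassment."""

    def __init__(self) -> None:
        # api_key -> deque of (timestamp, target_id) events
        self._events = defaultdict(deque)

    def record(self, api_key: str, target_id: str | None) -> list[str]:
        now = time.time()
        q = self._events[api_key]
        q.append((now, target_id))
        # Drop events that have fallen out of the sliding window.
        while q and now - q[0][0] > WINDOW_SECONDS:
            q.popleft()

        alerts = []
        if len(q) > MAX_REQUESTS:
            alerts.append("high-volume usage")
        if target_id is not None:
            same_target = sum(1 for _, t in q if t == target_id)
            if same_target > MAX_SAME_TARGET:
                alerts.append(f"repeated targeting of {target_id}")
        return alerts
```

In practice, alerts from such a monitor would feed RBAC decisions (throttling or suspending the offending key) rather than acting as a standalone control.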