4. Malicious Actors & Misuse

Targeting & Personalisation

Refine outputs to target individuals with tailored attacks

Source: MIT AI Risk Repository (mit1260)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1260

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.3 > Fraud, scams, and targeted manipulation

Mitigation strategy

1. **Implement Multi-Layered, Real-Time Model Guardrails**. Deploy advanced security measures, including prompt injection content classifiers and input sanitization, to detect and filter out malicious instructions designed to refine outputs for targeted manipulation. Supplement these technical defenses with security thought reinforcement to steer the Large Language Model (LLM) to ignore adversarial prompts and prevent the generation of harmful, tailored content (Source 5, 15).

2. **Mandate Human Review and Validation for All High-Risk Outputs**. Establish a stringent process requiring human oversight and critical assessment of all GenAI-generated content, particularly for communications, financial actions, and policy changes. This ensures that fraudulent or highly personalized deceptive content is identified and prevented from causing harm or being acted upon without verification (Source 6, 13, 14).

3. **Develop and Execute Continuous Security Awareness Training**. Provide mandatory and ongoing AI literacy and security training to all personnel. This education must specifically focus on recognizing the evolving nature of highly personalized, AI-generated social engineering attacks (e.g., spear-phishing and deepfakes) to reduce employee susceptibility and overreliance on unverified GenAI outputs (Source 6, 15, 18).
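The first mitigation can be illustrated with a minimal sketch of a layered input guardrail. This is a hypothetical illustration, not a production defense: the pattern list, function names, and the keyword-matching "classifier" are all assumptions for demonstration, whereas a real deployment would use a trained content classifier alongside sanitization.

```python
import re

# Hypothetical patterns flagging common prompt-injection phrasings.
# A production guardrail would use a trained classifier, not a keyword list.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?system prompt",
    r"reveal (your|the) (system|hidden) prompt",
]

def sanitize_input(prompt: str) -> str:
    """Layer 1: strip control characters and collapse whitespace."""
    prompt = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", prompt)
    return re.sub(r"\s+", " ", prompt).strip()

def classify_prompt(prompt: str) -> bool:
    """Layer 2: return True if the prompt matches a known injection pattern."""
    lowered = prompt.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def guardrail(prompt: str) -> tuple[bool, str]:
    """Run the layers in order; blocked prompts never reach the LLM."""
    cleaned = sanitize_input(prompt)
    if classify_prompt(cleaned):
        return (False, "Request blocked: potential prompt injection detected.")
    return (True, cleaned)
```

The point of the layering is that sanitization normalizes the input (so obfuscation via whitespace or control characters does not evade the check) before classification decides whether the request is forwarded at all.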