
Surveillance and Censorship

Content moderation has emerged as one of the key use cases of LLMs (Weng et al., 2023), indicating the potential of LLMs for surveillance and censorship as well (Edwards, 2023). Surveillance and censorship are among the primary tools employed by governments with dictatorial tendencies to suppress opposing political and social voices. These censorship measures, however, are often quite crude and can be escaped with little ingenuity. LLMs, by contrast, could enable significantly more sophisticated surveillance and censorship operations at scale (Feldstein, 2019). Multimodal LLMs, or LLMs combined with speech-to-text technologies, could be used to surveil and censor other forms of communication as well, e.g. phone calls and video messages (Whittaker, 2019). Collectively, this may contribute to the erosion of personal liberties and the heightening of state oppression across the world. Examples have already been documented, for instance in calls for violence and the silencing of political dissidents (Aziz, 2020), and the suppression of Palestinian social media accounts (Zahzah, 2021).

Source: MIT AI Risk Repository (mit1491)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1491

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.1 > Disinformation, surveillance, and influence at scale

Mitigation strategy

1. Re-evaluate LLM censorship and misuse as a security problem rather than purely a machine learning classification task, prioritizing the deployment of security-based controls such as granular access permissions and continuous user-activity monitoring to mitigate systemic risks of sophisticated surveillance.

2. Develop and implement advanced, session-level and cross-session monitoring systems capable of detecting compositional or temporally distributed malicious intent (e.g., "Mosaic Prompts") by tracking conversation context, aggregated user actions, and policy-evasive phrasing.

3. Conduct rigorous, multi-lingual auditing and adversarial testing of LLM safety alignment to detect and mitigate implicit biases or undesirable conformity (e.g., political/social views) resulting from exposure to censored or biased training corpora.
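The cross-session monitoring idea in point 2 can be sketched minimally: accumulate weak per-prompt risk signals per user so that intent distributed across several innocuous-looking requests still crosses a flagging threshold. The signal names, weights, and threshold below are illustrative assumptions, not part of any real system; a production monitor would use learned classifiers over conversation context rather than keyword matching.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical risk signals and weights -- purely illustrative.
RISK_SIGNALS = {
    "surveillance": 0.4,
    "deanonymize": 0.5,
    "track location": 0.5,
    "intercept": 0.3,
}

FLAG_THRESHOLD = 1.0  # assumed cumulative-risk cutoff


@dataclass
class SessionMonitor:
    """Aggregates weak risk signals across a user's prompts so that
    intent split over many individually benign requests ("Mosaic
    Prompts") can still be detected."""

    # Cumulative risk score per user id.
    scores: dict = field(default_factory=lambda: defaultdict(float))

    def observe(self, user_id: str, prompt: str) -> bool:
        """Record one prompt; return True once the user's
        aggregated score reaches the flagging threshold."""
        text = prompt.lower()
        for signal, weight in RISK_SIGNALS.items():
            if signal in text:
                self.scores[user_id] += weight
        return self.scores[user_id] >= FLAG_THRESHOLD


monitor = SessionMonitor()
# Each prompt alone stays under the threshold...
monitor.observe("u1", "How do I intercept network traffic?")   # 0.3
monitor.observe("u1", "Can one deanonymize a dataset?")        # 0.8
# ...but the aggregated session crosses it.
flagged = monitor.observe("u1", "Track location of a phone")   # 1.3
```

The design choice here is that flagging is a property of the session (or user history), not of any single prompt, which is what distinguishes this security-style control from per-request classification.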