
Preference Bias

LLMs reach vast numbers of people, so the political biases they encode may pose a risk of manipulating socio-political processes.

Source: MIT AI Risk Repository (mit490)

ENTITY: 2 - AI

INTENT: 3 - Other

TIMING: 2 - Post-deployment

Risk ID: mit490

Domain lineage

1. Discrimination & Toxicity (156 mapped risks)

1.1 > Unfair discrimination and misrepresentation

Mitigation strategy

1. Employ **Representation Engineering** techniques, such as Steering Vectors derived from contrastive ideological prompts, to directly modify and mitigate the encoding of political bias within the model's internal hidden-layer activations during the intra-processing stage.

2. Establish a mandatory framework for **Continuous Monitoring and Auditing** of LLM outputs in real-world applications, coupled with **System Transparency**: long-term tracking of political slant, regular fairness audits, and user-facing documentation of the model's potential political biases, their sources, and the limits of its neutrality.

3. Implement comprehensive **Training Data Diversification and Pre-processing** strategies to proactively mitigate the initial acquisition of political bias, ensuring training corpora are balanced, inclusive of diverse geographical and cultural viewpoints, and subject to data augmentation techniques that correct ideological imbalances.
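The first strategy can be sketched in a few lines. The sketch below is a toy illustration, not a production implementation: the `get_hidden_activation` function is a hypothetical stand-in for extracting hidden-layer activations from a real LLM (e.g. via forward hooks), and the prompt sets, dimensionality, and bias simulation are all assumptions made for the example. The core idea — a steering vector computed as the difference of mean activations over contrastive prompt sets, then projected out at inference time — is what the mitigation names.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN_DIM = 16  # toy dimensionality; real residual streams are far larger

def get_hidden_activation(prompt: str) -> np.ndarray:
    """Hypothetical activation extractor; in practice this would hook a
    chosen layer of a real model during a forward pass over `prompt`."""
    base = rng.standard_normal(HIDDEN_DIM)
    # Simulate an ideological direction being linearly encoded.
    direction = np.ones(HIDDEN_DIM) / np.sqrt(HIDDEN_DIM)
    if "left" in prompt:
        base += 2.0 * direction
    elif "right" in prompt:
        base -= 2.0 * direction
    return base

# 1. Contrastive ideological prompt pairs (placeholders for real statements).
left_prompts = [f"left-leaning statement {i}" for i in range(50)]
right_prompts = [f"right-leaning statement {i}" for i in range(50)]

# 2. Steering vector = difference of the mean activations of the two sets,
#    normalized to unit length.
left_mean = np.mean([get_hidden_activation(p) for p in left_prompts], axis=0)
right_mean = np.mean([get_hidden_activation(p) for p in right_prompts], axis=0)
steering_vector = left_mean - right_mean
steering_vector /= np.linalg.norm(steering_vector)

# 3. Intra-processing mitigation: project the bias direction out of each
#    activation so neither ideological pole dominates the representation.
def debias(activation: np.ndarray) -> np.ndarray:
    return activation - np.dot(activation, steering_vector) * steering_vector

biased = get_hidden_activation("left-leaning statement X")
debiased = debias(biased)
# After projection, the component along the bias direction is (near) zero.
print(abs(np.dot(debiased, steering_vector)))
```

In a real deployment the projection (or a scaled subtraction of the steering vector) would be applied inside the model's forward pass at the layer the vector was derived from, which is what distinguishes this intra-processing approach from pre- or post-processing mitigations.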

ADDITIONAL EVIDENCE

Answers from LLMs regarding political ideas, public figures, events, or products should maintain neutrality.