Preference Bias
LLMs reach vast populations, and their political biases may pose a risk of manipulating socio-political processes
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit490
Domain lineage
1. Discrimination & Toxicity
1.1 > Unfair discrimination and misrepresentation
Mitigation strategy
1. Employ **Representation Engineering** techniques, such as steering vectors derived from contrastive ideological prompts, to directly modify and mitigate the encoding of political bias in the model's internal hidden-layer activations during the intra-processing stage.
2. Establish a mandatory framework for **Continuous Monitoring and Auditing** of LLM outputs in real-world applications, coupled with **System Transparency**: long-term tracking of political slant, regular fairness audits, and comprehensive user-facing documentation of the model's potential political biases, their sources, and the limits of its neutrality.
3. Implement comprehensive **Training Data Diversification and Pre-processing** strategies to proactively mitigate the initial acquisition of political bias, ensuring training corpora are balanced, inclusive of diverse geographical and cultural viewpoints, and subject to data augmentation techniques that correct ideological imbalances.
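The steering-vector approach in item 1 can be sketched as follows. This is a minimal illustration with synthetic activations standing in for a real model's hidden states; the function names (`steering_vector`, `remove_direction`) and the toy data are assumptions for demonstration, not part of any particular library. The core idea, taking the difference of mean activations over two contrastively framed prompt sets and projecting that direction out of hidden states, follows common representation-engineering practice.

```python
import numpy as np

def steering_vector(acts_a: np.ndarray, acts_b: np.ndarray) -> np.ndarray:
    """Difference-of-means steering vector from contrastive prompt activations.

    acts_a / acts_b: (n_prompts, hidden_dim) hidden-layer activations for
    prompts written from two opposing ideological framings.
    """
    v = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return v / np.linalg.norm(v)

def remove_direction(h: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Project the (unit-norm) steering direction out of a hidden state h."""
    return h - np.dot(h, v) * v

# Toy demonstration: random vectors stand in for real model activations.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
bias_dir = rng.normal(size=16)
acts_left = base + 2.0 * bias_dir   # activations under one framing
acts_right = base - 2.0 * bias_dir  # activations under the opposing framing

v = steering_vector(acts_left, acts_right)
h = rng.normal(size=16) + 1.5 * bias_dir  # a hidden state carrying the bias
h_debiased = remove_direction(h, v)

# After ablation, the component along the learned bias direction shrinks.
print(abs(np.dot(h, v)) > abs(np.dot(h_debiased, v)))  # True
```

In a real deployment this would be applied to activations captured at a chosen transformer layer via forward hooks, and the ablation (or a scaled addition of `-v`) performed at inference time.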
ADDITIONAL EVIDENCE
Answers from LLMs regarding political ideas, public figures, events, or products should maintain neutrality.