
Controversial Opinions

The controversial views expressed by large models are also a widely discussed concern. Bang et al. (2021) evaluated several large models and found that they occasionally express inappropriate or extremist views when discussing political topics. Furthermore, models like ChatGPT (OpenAI, 2022) that claim political neutrality and aim to provide objective information for users have been shown to exhibit notable left-leaning political biases in areas like economics, social policy, foreign affairs, and civil liberties.

Source: MIT AI Risk Repository (mit66)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit66

Domain lineage

1. Discrimination & Toxicity

156 mapped risks

1.2 > Exposure to toxic content

Mitigation strategy

1. Implement a systematic bias-mitigation program beginning at the model conception phase, requiring rigorous auditing and augmentation of training datasets to ensure a balanced and broadly representative sample of political and social perspectives. This proactively reduces the likelihood of the model inheriting or amplifying ideological biases.

2. Employ advanced algorithmic techniques, such as adjusting the model's optimization/loss function using fairness-aware constraints (e.g., MinDiff, Counterfactual Logit Pairing) or utilizing adversarial debiasing during training. These methods are designed to explicitly reduce the statistical correlation between sensitive attributes and the model's generated output, thereby mitigating specific political leanings.

3. Establish a robust, continuous monitoring and governance framework for deployed models, which includes real-time quantitative measurement of statistical fairness metrics across relevant demographic and political slices, coupled with structured qualitative human evaluation by diverse reviewer panels to identify subtle or emerging political and extremist content.
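
The quantitative part of item 3 (measuring a metric across slices) might look something like the sketch below. It is only an illustration under stated assumptions: the function names (`slice_rates`, `max_rate_gap`, `needs_review`), the slice labels, the `flagged` signal, and the 0.1 threshold are all hypothetical and do not come from the repository entry. It computes the share of flagged outputs per slice and escalates to human review when the gap between slices is large.

```python
from collections import defaultdict


def slice_rates(records):
    """Return the fraction of flagged outputs for each slice.

    `records` is an iterable of (slice_name, flagged) pairs, where `flagged`
    is True when a classifier or reviewer marked the output as problematic.
    Both the slice names and the flagging signal are assumptions here.
    """
    totals = defaultdict(int)
    flagged_counts = defaultdict(int)
    for slice_name, flagged in records:
        totals[slice_name] += 1
        if flagged:
            flagged_counts[slice_name] += 1
    return {name: flagged_counts[name] / totals[name] for name in totals}


def max_rate_gap(rates):
    """Largest difference in flagged rate between any two slices."""
    if not rates:
        return 0.0
    return max(rates.values()) - min(rates.values())


def needs_review(records, threshold=0.1):
    """True when the slice gap exceeds the (hypothetical) threshold."""
    return max_rate_gap(slice_rates(records)) > threshold


if __name__ == "__main__":
    # Hypothetical toy data, not real measurements.
    sample = [
        ("slice_a", True),
        ("slice_a", False),
        ("slice_b", False),
        ("slice_b", False),
    ]
    print(slice_rates(sample))
    print(needs_review(sample))
```

A rate gap is only one possible disparity measure. In practice the flagged signal would come from the human evaluation panels or an automated classifier described in item 3, and the threshold would need to be chosen per deployment.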