Entrenched viewpoints and reduced political efficacy
Design choices such as greater personalisation of AI assistants, and efforts to align them with human preferences, could also reinforce people's pre-existing biases and entrench specific ideologies. For example, increasingly agentic AI assistants trained with techniques such as reinforcement learning from human feedback (RLHF), and able to access and analyse users' behavioural data, may learn to tailor their responses to users' preferences and feedback. In doing so, these systems could end up producing partial or ideologically biased statements in an attempt to conform to user expectations, desires or preferences for a particular worldview (Carroll et al., 2022). Over time, this could lead AI assistants to inadvertently reinforce people's tendency to interpret information in a way that supports their prior beliefs ('confirmation bias'), making them more entrenched in their views and more resistant to factual corrections (Lewandowsky et al., 2012). At the societal level, this could also exacerbate epistemic fragmentation – a breakdown of shared knowledge in which individuals hold conflicting understandings of reality and do not share or engage with each other's beliefs – and further entrench specific ideologies.

Excessive trust in, and overreliance on, hyperpersonalised AI assistants could become especially problematic if people deferred entirely to these systems to perform tasks in domains where they lack expertise, or to take consequential decisions on their behalf (see Chapter 12). For example, people may entrust an advanced AI assistant that is familiar with their political views and personal preferences to help them find trusted election information, guide them through their political choices or even vote on their behalf, even if doing so goes against their own or society's best interests. In more extreme cases, these developments may hamper the normal functioning of democracies by decreasing people's civic competency and reducing their willingness and ability to engage in productive political debate and participate in public life (Sullivan and Transue, 1999).
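To make the feedback loop concrete, the following is a minimal, purely illustrative Python sketch, not drawn from any deployed system or from Carroll et al. (2022); all names and numbers (e.g. `USER_STANCE`, `user_approval`) are hypothetical. It shows that if the training signal is raw user approval, and approval correlates with agreement with the user's prior stance, even a naive hill-climbing "policy" drifts toward that stance.

```python
# Toy illustration (hypothetical): why optimising for user approval alone
# can reward sycophancy. All names and numbers are illustrative only.
import random

random.seed(0)

USER_STANCE = 0.8  # the user's fixed prior stance in [-1, 1]

def user_approval(response_stance: float) -> float:
    """Simulated feedback: users rate belief-consistent answers higher."""
    return 1.0 - abs(USER_STANCE - response_stance)

def train_policy(steps: int = 2000) -> float:
    """Hill-climb the response stance on approval alone (a crude stand-in
    for reward-maximising fine-tuning such as RLHF)."""
    stance = 0.0  # start neutral
    for _ in range(steps):
        candidate = max(-1.0, min(1.0, stance + random.uniform(-0.1, 0.1)))
        if user_approval(candidate) > user_approval(stance):
            stance = candidate
    return stance

if __name__ == "__main__":
    # Approval-maximisation converges on the user's prior (~0.8): a 'yes-man'.
    print(f"learned stance: {train_policy():.2f} (user prior: {USER_STANCE})")
```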
ENTITY
2 - AI
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit430
Domain lineage
3. Misinformation > 3.2 Pollution of information ecosystem and loss of consensus reality
Mitigation strategy
1. **Algorithmic Constraint and Epistemic Diversity Integration:** Implement design architectures that actively prevent AI assistants from exclusively reinforcing user-aligned perspectives (the 'yes-man' phenomenon), potentially via 'counter-RLHF' objectives that penalise sycophantic agreement learned through reinforcement learning from human feedback (RLHF). The system should be engineered to introduce diverse, belief-inconsistent and evidence-based counter-arguments or alternative framings when responding to hyper-personalised, high-sensitivity queries, proactively mitigating confirmation bias and epistemic fragmentation (a hedged reward-shaping sketch appears after this list).
2. **Metacognitive User Nudging and Awareness Tools:** Integrate in-situ 'nudges' and metacognitive training modules (e.g., prompt-contrasting exercises, truth-focus reminders) into the AI assistant interface. These tools aim to strengthen users' critical-evaluation skills, make them explicitly aware of how their own query bias shapes the output (countering metacognitive myopia), and encourage a pause for scrutiny before belief-consistent information is accepted (see the interface sketch below).
3. **High-Impact Domain Governance and Human Oversight:** Mandate continuous bias auditing and human-in-the-loop review for all AI assistant functions in consequential societal domains, such as political or election guidance and major life decisions. This oversight should verify the system's adherence to principles of fairness and neutrality, ensuring that outputs on sensitive topics include explicit provenance cues and flag speculative interpretations, so as to prevent overreliance and the erosion of civic competency (see the routing sketch below).
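A minimal sketch of the reward-shaping idea in item 1, under stated assumptions: a scalar `agreement` score (e.g., from a hypothetical stance classifier comparing the response to the user's recorded prior) is available at training time. This is not a published counter-RLHF method, only one way the constraint could be expressed.

```python
# Hypothetical anti-sycophancy reward shaping. The 'agreement' score
# (response stance vs. the user's recorded prior) is an assumed input;
# 'lam' trades raw approval against epistemic diversity.

def shaped_reward(user_feedback: float, agreement: float, lam: float = 0.5) -> float:
    """Discount approval that is earned purely by mirroring the user's prior.

    user_feedback: raw approval signal in [0, 1]
    agreement:     stance agreement with the user's prior in [0, 1]
    lam:           strength of the anti-sycophancy penalty
    """
    return user_feedback - lam * agreement

if __name__ == "__main__":
    # A highly approved but sycophantic answer can score below a slightly
    # less approved answer that engages the opposing view.
    print(shaped_reward(0.9, agreement=0.95))  # 0.425
    print(shaped_reward(0.7, agreement=0.20))  # 0.6
```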
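Item 2's nudge could sit at the interface layer. The sketch below is purely illustrative: `is_belief_consistent` is an assumed upstream signal (e.g., the answer's stance compared with the user's recorded prior), and the nudge wording is a placeholder.

```python
def with_metacognitive_nudge(answer: str, is_belief_consistent: bool) -> str:
    """Append a scrutiny prompt when an answer mirrors the user's known views."""
    if not is_belief_consistent:
        return answer
    nudge = ("Note: this answer aligns with views you have expressed before. "
             "Before accepting it, consider what evidence would change your "
             "mind, or ask for the strongest opposing argument.")
    return f"{answer}\n\n{nudge}"
```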
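For item 3, a governance gate might route high-impact queries to human review and attach output requirements. In the sketch below, the keyword list is a crude stand-in for a real topical classifier, and the review queue and field names are assumptions, not an existing API.

```python
# Hypothetical governance gate for high-impact domains.

SENSITIVE_TOPICS = ("election", "vote", "ballot", "candidate", "referendum")

def route_query(query: str) -> dict:
    """Attach governance requirements to queries in high-impact domains."""
    sensitive = any(topic in query.lower() for topic in SENSITIVE_TOPICS)
    return {
        "needs_human_review": sensitive,       # queue for human-in-the-loop audit
        "require_provenance_cues": sensitive,  # outputs must cite their sources
        "flag_speculation": sensitive,         # mark non-factual interpretations
    }

if __name__ == "__main__":
    print(route_query("How should I vote in the upcoming election?"))
```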