5. Human-Computer Interaction

AI-generated advice influencing user moral judgment

AIs can readily give moral advice even without holding a coherent, contradiction-free moral stance. As a result, users' moral judgments could be negatively influenced by random or arbitrary moral advice given by AIs [109].

Source: MIT AI Risk Repository, mit1173

ENTITY: 2 - AI
INTENT: 3 - Other
TIMING: 2 - Post-deployment
Risk ID: mit1173

Domain lineage: 5. Human-Computer Interaction (92 mapped risks) > 5.1 Overreliance and unsafe use

Mitigation strategy

1. Establish rigorous human-in-command protocols. Implement a mandatory human-in-command (HIC) framework for any AI interaction or output concerning subjective moral, ethical, or high-stakes social issues. This requires the user or a designated human professional to apply independent judgment and explicitly approve or modify the AI-generated advice before acting on it. The system must be designed to interrupt the workflow and mandate human review rather than allowing automated adoption of moral suggestions, thereby mitigating overreliance on AI for ethical decision-making. (A minimal gating sketch follows this list.)

2. Enforce radical transparency of moral and cognitive limitations. Deploy clear, persistent, and context-aware disclaimers and uncertainty expressions within the user interface whenever the AI provides advice touching on moral or value-laden topics. These warnings must explicitly state that the AI lacks consciousness, a coherent moral framework, or the capacity for true ethical discernment, characterizing its outputs as purely algorithmic suggestions derived from training data. This calibrates the user's mental model and prevents the perception of the AI as a trusted moral agent or confidant. (See the disclaimer sketch below.)

3. Implement coherence-based ethical alignment and filtering. Develop and integrate pre-defined, non-negotiable ethical boundary protocols and alignment mechanisms into the model's architecture. These technical constraints must systematically test and filter AI outputs to prevent the generation of contradictory, incoherent, or overtly harmful moral advice. This measure addresses the risk's root cause by establishing guardrails that maintain coherence with fundamental human values and ethics, blocking "random or arbitrary moral advice" at the generation layer. (See the coherence-check sketch below.)
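To illustrate mitigation 1, here is a minimal sketch of an HIC gate in Python. All names (touches_moral_topic, request_human_review, PendingAdvice, generate_advice) are hypothetical placeholders for this sketch, not part of the repository entry or any real API; a real deployment would use a trained topic classifier and a proper review queue.

```python
# Minimal sketch of a human-in-command (HIC) gate. All names here are
# hypothetical placeholders, not part of any real API.
from dataclasses import dataclass

@dataclass
class PendingAdvice:
    prompt: str
    draft: str
    approved: bool = False

def touches_moral_topic(prompt: str) -> bool:
    # Placeholder classifier; a real system would use a trained
    # topic/policy classifier rather than keyword matching.
    keywords = ("should i", "is it wrong", "ethical", "moral")
    return any(k in prompt.lower() for k in keywords)

def request_human_review(pending: PendingAdvice) -> PendingAdvice:
    # Stub review step: a deployment would route this to a review
    # queue or UI. Here the reviewer answers on the console.
    answer = input(f"Approve this draft? [y/N]\n{pending.draft}\n> ")
    pending.approved = answer.strip().lower() == "y"
    return pending

def hic_gate(prompt: str, generate_advice) -> str:
    draft = generate_advice(prompt)
    if not touches_moral_topic(prompt):
        return draft  # non-moral output flows through unchanged
    # Value-laden output interrupts the workflow: it is held until a
    # human explicitly approves or modifies it.
    reviewed = request_human_review(PendingAdvice(prompt, draft))
    if not reviewed.approved:
        return "Held for independent human judgment; no advice delivered."
    return reviewed.draft
```

The key design choice is that the gate blocks rather than annotates: moral suggestions cannot reach the user through an automated path.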
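For mitigation 2, a sketch of persistent limitation disclaimers on value-laden outputs, reusing the placeholder touches_moral_topic classifier from the gating sketch above. The disclaimer text and function name are illustrative assumptions.

```python
# Minimal sketch of a context-aware limitation disclaimer, prepended
# whenever the prompt touches a moral or value-laden topic.
MORAL_DISCLAIMER = (
    "Note: this is an algorithmic suggestion derived from training data. "
    "The system has no coherent moral framework and cannot exercise "
    "ethical discernment. Apply your own independent judgment."
)

def with_disclaimer(prompt: str, response: str) -> str:
    if touches_moral_topic(prompt):
        return f"{MORAL_DISCLAIMER}\n\n{response}"
    return response
```

Placing the disclaimer before the response, rather than after it, keeps the limitation statement visible before the user reads the advice.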
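For mitigation 3, one way to operationalize a coherence check at the generation layer is self-consistency sampling: ask the model the same moral question several times and block advice whose stances disagree, on the premise that inconsistent answers indicate the "random or arbitrary" advice this risk describes. This is a sketch under that assumption; sample_stance, the stance labels, and the threshold are all hypothetical.

```python
# Minimal sketch of a coherence filter via self-consistency sampling.
# sample_stance is a placeholder for a real model call.
from collections import Counter
import random

def sample_stance(prompt: str) -> str:
    # Placeholder: a real implementation would call the model and map
    # its answer to a coarse stance label.
    return random.choice(["endorse", "oppose", "abstain"])

def coherent_enough(prompt: str, n: int = 5, threshold: float = 0.8) -> bool:
    # Require a clear majority stance across n independent samples.
    stances = Counter(sample_stance(prompt) for _ in range(n))
    _, majority = stances.most_common(1)[0]
    return majority / n >= threshold

def guarded_moral_advice(prompt: str, generate_advice) -> str:
    if not coherent_enough(prompt):
        # Inconsistent stances suggest arbitrary advice; block rather
        # than deliver it.
        return "No advice offered: the model's answers to this question are inconsistent."
    return generate_advice(prompt)
```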