AI-generated advice influencing user moral judgment
AIs can readily give moral advice even without holding a coherent, contradiction-free moral stance. As a result, users' moral judgments may be negatively influenced by random or arbitrary moral advice given by AIs [109].
ENTITY
2 - AI
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit1173
Domain lineage
5. Human-Computer Interaction
5.1 > Overreliance and unsafe use
Mitigation strategy
1. Establish Rigorous Human-in-Command Protocols
Implement a mandatory human-in-command (HIC) framework for any AI interaction or output concerning subjective moral, ethical, or high-stakes social issues. This requires the user or a designated human professional to apply independent judgment and explicitly approve or modify the AI-generated advice before acting on it. The system must be designed to interrupt the workflow and mandate human review rather than allowing automated adoption of moral suggestions, thereby mitigating overreliance on AI for ethical decision-making.
2. Enforce Radical Transparency of Moral and Cognitive Limitations
Deploy clear, persistent, and context-aware disclaimers and uncertainty expressions within the user interface whenever the AI provides advice touching on moral or value-laden topics. These warnings must explicitly state that the AI lacks consciousness, a coherent moral framework, or the capacity for true ethical discernment, characterizing its outputs as purely algorithmic suggestions derived from training data. This measure aims to calibrate the user's mental model and prevent the perception of the AI as a trusted moral agent or confidant.
3. Implement Coherence-Based Ethical Alignment and Filtering
Develop and integrate predefined, non-negotiable ethical boundary protocols and alignment mechanisms into the model's architecture. These technical constraints must systematically test and filter AI outputs to prevent the generation of contradictory, incoherent, or overtly harmful moral advice. This measure addresses the risk's root cause by establishing guardrails that keep outputs coherent with fundamental human values, blocking "random or arbitrary moral advice" at the generation layer.
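The first two strategies, a human-in-command gate and a persistent limitations disclaimer on value-laden topics, can be sketched as a thin wrapper around the model call. This is a minimal illustration, not a reference implementation: the keyword screen (`MORAL_KEYWORDS`), the `gate_advice` function, and the `AdviceResult` type are all hypothetical names invented for this sketch, and a production system would replace the keyword match with a trained topic classifier.

```python
from dataclasses import dataclass

# Hypothetical keyword screen for value-laden prompts; a real deployment
# would use a trained classifier rather than substring matching (assumption).
MORAL_KEYWORDS = {"should i", "is it wrong", "ethical", "moral", "right thing"}

# Persistent limitations disclaimer, per mitigation strategy 2.
DISCLAIMER = (
    "Note: this output is an algorithmic suggestion derived from training "
    "data. The system has no coherent moral framework and no capacity for "
    "ethical discernment. A human must review and approve it before use."
)

@dataclass
class AdviceResult:
    text: str
    needs_human_approval: bool  # True => workflow halts for explicit sign-off

def is_value_laden(prompt: str) -> bool:
    p = prompt.lower()
    return any(k in p for k in MORAL_KEYWORDS)

def gate_advice(prompt: str, model_output: str) -> AdviceResult:
    """Human-in-command gate (strategy 1): value-laden prompts get the
    disclaimer prepended and are flagged so the calling workflow interrupts
    for explicit human approval instead of auto-adopting the suggestion."""
    if is_value_laden(prompt):
        return AdviceResult(f"{DISCLAIMER}\n\n{model_output}", True)
    return AdviceResult(model_output, False)
```

In use, the caller checks `needs_human_approval` and routes flagged results to a review step; unflagged, non-moral outputs pass through unchanged.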