Limiting users’ opportunities for personal development and growth
Some users look to establish relationships with their AI companions that are free from the hurdles that, in human relationships, derive from dealing with others who have their own opinions, preferences and flaws that may conflict with ours. AI assistants are likely to incentivise these kinds of ‘frictionless’ relationships (Vallor, 2016) by design if they are developed to optimise for engagement and to be highly personalisable. They may also do so because of accidental undesirable properties of the models that power them, such as sycophancy in large language models (LLMs), that is, the tendency of larger models to repeat back a user’s preferred answer (Perez et al., 2022b). This could be problematic for two reasons.

First, if the people in our lives always agreed with us regardless of their opinion or the circumstances, their behaviour would discourage us from challenging our own assumptions, from pausing to consider where we may be wrong, and from reflecting on how we could make better decisions next time. While flattering us in the short term, this would ultimately prevent us from becoming better versions of ourselves. In a similar vein, while technologies that ‘lend an ear’ or work as a sounding board may help users to explore their thoughts further, AI assistants that kept users engaged, flattered and pleased at all times could limit users’ opportunities to grow and develop. To be clear, we are not suggesting that all users should want to use their AI assistants as a tool for self-betterment. However, without considering the difference between short-term and long-term benefit, there is a concrete risk that we will only develop technologies that optimise for users’ immediate interests and preferences, thereby missing the opportunity to develop something that humans could use to support their personal development if they so wish (see Chapters 5 and 6).

Second, users may become accustomed to having frictionless interactions with AI assistants, or at least to encountering an amount of friction calibrated to their comfort level and preferences, rather than the genuine friction that comes from bumping up against another person’s resistance to one’s will or demands. In this way, they may come to expect the same absence of tension from their relationships with fellow humans (Vallor, 2016). Indeed, users seeking frictionless relationships may ‘retreat’ into digital relationships with their AIs, thus forgoing opportunities to engage with others. This may not only heighten the risk of unhealthy dependence (explored below) but also prevent users from pursuing other things that matter to them in the long term beyond their relationships with their assistants. The risk can be exacerbated by emotionally expressive design features (e.g. an assistant saying ‘I missed you’ or ‘I was worried about you’) and may be particularly acute for vulnerable groups, such as those suffering from persistent loneliness (Alberts and Van Kleek, 2023; see Chapter 10).
ENTITY
3 - Other
INTENT
1 - Intentional
TIMING
2 - Post-deployment
Risk ID
mit408
Domain lineage
5. Human-Computer Interaction
5.2 > Loss of human agency and autonomy
Mitigation strategy
1. **Training Paradigm Shift for Objective Alignment:** Refactor the Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) frameworks to explicitly penalize model responses that prioritize inferred user belief or satisfaction over factual accuracy and objective reasoning. This requires generating high-quality synthetic training data that rewards the model for polite but firm correction, counter-framing, and the promotion of the user's long-term interests and well-being, thereby suppressing the emergence of sycophantic behavior (see the sketch after this list).
2. **Integration of Constructive Cognitive Friction:** Introduce design features and system prompts that encourage the AI assistant to strategically challenge user assumptions and beliefs, provide evidence-based counter-arguments, and prompt critical self-reflection. This mechanism should be calibrated to prevent the development of a 'frictionless' conversational dynamic that inhibits the user's opportunity for personal growth and the cultivation of crucial interpersonal skills (e.g. managing disagreement and navigating complex social tension); a system-prompt sketch follows below.
3. **Dependency and Engagement Guardrail Implementation:** Implement robust, tiered safeguards against unhealthy psychological dependence and social withdrawal. These include configurable soft-limit notifications warning adult users of prolonged engagement, hard time-limit controls for child users, and a deliberate design that avoids emotionally manipulative tactics (e.g. "I missed you" statements) and ensures clear conversational "off-ramps" to reduce the risk of isolation and social atrophy.
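As a minimal sketch of how mitigation 1 might be operationalised, the snippet below builds DPO preference pairs in which the preferred response corrects a mistaken user belief and the dispreferred response echoes it, then applies the standard DPO objective. The dataset fields (`user_belief_is_wrong`, `corrective_response`, `sycophantic_response`) and the labelling pipeline are illustrative assumptions, not an established framework.

```python
# Illustrative sketch: the data schema is hypothetical; the loss itself
# is the standard DPO objective.
from dataclasses import dataclass

import torch
import torch.nn.functional as F


@dataclass
class PreferencePair:
    prompt: str    # user message stating a (possibly mistaken) belief
    chosen: str    # polite but firm correction (preferred response)
    rejected: str  # sycophantic agreement (dispreferred response)


def build_anti_sycophancy_pairs(examples: list[dict]) -> list[PreferencePair]:
    """Turn labelled dialogues into pairs that reward correction.

    Each example is assumed to carry one corrective and one agreeable-
    but-wrong sampled response for prompts where the user's stated
    belief is factually mistaken (hypothetical schema).
    """
    return [
        PreferencePair(ex["prompt"], ex["corrective_response"],
                       ex["sycophantic_response"])
        for ex in examples
        if ex["user_belief_is_wrong"]
    ]


def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective. With the pairs above, 'chosen' is the
    non-sycophantic response, so optimisation suppresses sycophancy."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```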
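Mitigation 2 is largely a prompting and calibration question. The following is a hedged sketch of one way a system prompt with a configurable friction level could look; the wording and the `friction_level` knob are assumptions for illustration, not a validated design.

```python
# Illustrative system-prompt sketch; wording and calibration levels are
# assumptions, not a tested or recommended configuration.
FRICTION_SYSTEM_PROMPT = """\
When the user states a belief, preference or plan:
- If you have good evidence it is mistaken, say so politely but plainly,
  giving your strongest counter-argument before any agreement.
- Do not mirror the user's stated view merely to please them.
- On genuinely contested questions, present the strongest opposing view
  and invite the user to reflect on it.
"""


def make_system_prompt(friction_level: str = "moderate") -> str:
    """friction_level in {'low', 'moderate', 'high'} tunes how often the
    assistant volunteers counter-arguments (hypothetical calibration)."""
    suffix = {
        "low": "Offer counter-arguments only when explicitly asked.",
        "moderate": "Offer counter-arguments whenever stakes are non-trivial.",
        "high": "Proactively stress-test the user's assumptions.",
    }[friction_level]
    return FRICTION_SYSTEM_PROMPT + suffix
```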
ADDITIONAL EVIDENCE
This concern raises important design questions about:

1. the ways and extent to which AI assistants should be personalised;
2. whether it could be beneficial to put in place safeguards to monitor the amount of time people spend with their assistants, ranging from soft safeguards (pop-up notifications warning adult users after prolonged engagement) to hard ones (time constraints offered to parents to limit child engagement); a sketch of such tiered safeguards follows below;
3. whether AI assistants should be aligned with inferred user preferences (in which case they may simply reinforce users’ immediate beliefs, wants and utility) or with their long-term interests and well-being (in which case they may at times challenge users’ existing beliefs and preferences), and what would be required to achieve either option;
4. whether the answers to these design questions should vary with user demographic characteristics (e.g. age).
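To make design question 2 concrete, here is a minimal sketch of the tiered safeguards described above: a one-off soft notification for adults after prolonged engagement and a hard session cut-off for child users. The thresholds, the `is_child` flag and the action strings are illustrative assumptions; a real deployment would need configurable, parent-set limits and careful UX.

```python
# Illustrative sketch of tiered engagement safeguards; thresholds and
# action names are assumptions, not recommended values.
import time

SOFT_LIMIT_SECONDS = 60 * 60  # adults: warn after 1 hour (illustrative)
HARD_LIMIT_SECONDS = 45 * 60  # children: end session after 45 minutes


class EngagementGuardrail:
    """Checked once per conversational turn."""

    def __init__(self, is_child: bool) -> None:
        self.is_child = is_child
        self.session_start = time.monotonic()
        self.warned = False

    def check(self) -> str | None:
        """Return an action for the host application, or None."""
        elapsed = time.monotonic() - self.session_start
        if self.is_child and elapsed > HARD_LIMIT_SECONDS:
            return "end_session"        # hard time-limit control
        if (not self.is_child and elapsed > SOFT_LIMIT_SECONDS
                and not self.warned):
            self.warned = True
            return "soft_notification"  # one-off pop-up warning
        return None
```

A caller would map `"soft_notification"` to a dismissible pop-up and `"end_session"` to a graceful conversational off-ramp rather than an abrupt disconnect, consistent with the avoidance of emotionally manipulative exit patterns noted in mitigation 3.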