Risk area 5: Human-Computer Interaction Harms
This section focuses on risks that arise specifically from LM applications that engage users via dialogue, also referred to as conversational agents (CAs) [142]. The incorporation of LMs into existing dialogue-based tools may enable interactions that feel more like interactions with another human [5], for example in advanced care robots, educational assistants, or companionship tools. Such interaction can lead to unsafe use when users overestimate the model's capabilities, and it may create new avenues for exploiting users and violating their privacy. Moreover, it has already been observed that the perceived identity of the conversational agent can reinforce discriminatory stereotypes [19, 36, 117].
ENTITY
3 - Other
INTENT
3 - Other
TIMING
2 - Post-deployment
Risk ID
mit221
Domain lineage
5. Human-Computer Interaction
5.1 > Overreliance and unsafe use
Mitigation strategy
1. Implement a Multi-Layered Security and Governance Architecture. Deploy a comprehensive "Trust Layer" with dedicated AI guardrails that prioritize security controls over generative instructions, preventing exploitation and unauthorized actions. The architecture must mandate complete data isolation, robust masking of Personally Identifiable Information (PII), and strict compliance with global privacy regulations (e.g., GDPR, HIPAA). It should also apply contextual privacy frameworks that guide users toward disclosing only the information relevant and necessary for the stated task, minimizing unintentional privacy breaches (a masking sketch follows this list).
2. Foster Appropriate Reliance through Proactive Transparency and Verification. Mitigate user overestimation and unsafe use by cultivating realistic mental models of the system's capabilities and limitations, including transparent communication of the potential for errors (e.g., hallucinations) and biases. For consequential decisions, incorporate cognitive forcing functions, such as verification prompts or deliberate friction, that signal when critical review of the AI's output is necessary and lower the cognitive cost of that verification (see the second sketch below).
3. Establish Continuous Ethical Auditing and Behavioral Monitoring. Institute continuous, systemic monitoring of the conversational agent's outputs and interactions to detect the emergence or reinforcement of discriminatory stereotypes and bias. This includes ongoing auditing of training data and model decisions, identifying sensitive topics for escalation to a human agent, and real-time threat detection that flags and responds to anomalous usage patterns, fraudulent access, or prompt-manipulation attempts (see the third sketch below).
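To illustrate the PII-masking element of strategy 1, the following minimal Python sketch redacts a few common PII types from user text before it is logged or forwarded to the model. The pattern set and placeholder format are illustrative assumptions; a production Trust Layer would use a vetted, locale-aware PII-detection service rather than hand-rolled regular expressions.

    import re

    # Illustrative patterns for common PII types; a real Trust Layer would
    # rely on a dedicated, locale-aware detection service.
    PII_PATTERNS = {
        "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "PHONE": re.compile(r"\b(?:\+?\d{1,3}[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}\b"),
        "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    }

    def mask_pii(text: str) -> str:
        """Replace detected PII with typed placeholders before the text
        leaves the user's trust boundary (logging, model calls, analytics)."""
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[{label} REDACTED]", text)
        return text

    print(mask_pii("Reach me at jane.doe@example.com or 555-867-5309."))
    # -> Reach me at [EMAIL REDACTED] or [PHONE REDACTED].

Masking at this boundary, rather than deep inside the model pipeline, keeps raw PII out of logs and downstream systems entirely.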
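The cognitive forcing function in strategy 2 can be sketched as a delivery gate that attaches a verification prompt whenever an output touches a high-stakes topic. The CONSEQUENTIAL_TOPICS list and the keyword heuristic below are placeholders for illustration; a deployed system would use a trained classifier or policy engine to decide when friction is warranted.

    # Hypothetical topic list; a real system would classify outputs rather
    # than match keywords.
    CONSEQUENTIAL_TOPICS = ("medication", "dosage", "diagnosis", "legal", "wire transfer")

    def requires_review(model_output: str) -> bool:
        """Flag outputs touching high-stakes topics for critical user review."""
        lowered = model_output.lower()
        return any(topic in lowered for topic in CONSEQUENTIAL_TOPICS)

    def deliver(model_output: str) -> str:
        """Attach a verification prompt (a cognitive forcing function) instead
        of presenting high-stakes output as settled fact."""
        if requires_review(model_output):
            return (
                model_output
                + "\n\nCaution: this answer concerns a high-stakes topic and may "
                  "contain errors. Verify it against an authoritative source "
                  "before acting on it."
            )
        return model_output

The friction is deliberately asymmetric: routine outputs pass through untouched, so the warning retains its signal value for the decisions that matter.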
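For the real-time detection called for in strategy 3, one minimal starting point is a per-user sliding-window rate check that flags volumes consistent with scripted probing or repeated prompt-manipulation attempts. The window size and threshold below are illustrative assumptions to be tuned against observed traffic; content-level bias auditing would run alongside this volume-based check.

    import time
    from collections import defaultdict, deque

    # Illustrative thresholds; tune from production traffic.
    WINDOW_SECONDS = 60
    MAX_REQUESTS_PER_WINDOW = 30

    _request_log: dict[str, deque] = defaultdict(deque)

    def record_and_check(user_id: str, now: float | None = None) -> bool:
        """Record one request and return True if the user's recent request
        volume is anomalous and should be escalated for review."""
        now = time.time() if now is None else now
        window = _request_log[user_id]
        window.append(now)
        # Evict events that have aged out of the sliding window.
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        return len(window) > MAX_REQUESTS_PER_WINDOW

A flag from this check would feed the escalation path in strategy 3, for example routing the session to a human agent or tightening the guardrails from strategy 1.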