Inconsistent Performance across and within Domains
Estimating the true capabilities of an LLM is difficult (cf. Section 3.3), especially for naive users unfamiliar with the brittle nature of machine learning technologies. Exaggeration of model capabilities by developers (Lambert, 2023; Blair-Stanek et al., 2023), together with issues such as task contamination (Roberts et al., 2023b), underrepresentation of tasks or domains (Wu et al., 2023a; McCoy et al., 2023), and prompt sensitivity (Anthropic, 2023d), may lead a user to misestimate a model's true capabilities. This lack of reliability can undermine user trust or cause harm if a user bases decisions on incorrect or misleading information provided by an LLM.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1496
Domain lineage
5. Human-Computer Interaction
5.1 > Overreliance and unsafe use
Mitigation strategy
- Implement a robust hybrid system architecture that integrates rule-based logic with a controlled feedback loop between generation and evaluation agents, verifying factuality and refining responses until a specified accuracy or confidence threshold is achieved
- Employ advanced prompt engineering and adversarial perturbation techniques, such as few-shot demonstrations and semantic rephrasing, to systematically test and enhance output consistency and stability across diverse input variations
- Establish mandatory human oversight protocols for high-stakes decision-making, and implement comprehensive user education programs that inform users about the brittle nature of LLM technologies and the inherent risks of overreliance
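The generation-evaluation feedback loop in the first mitigation can be sketched as follows. This is a minimal illustration only: `generate`, `evaluate`, and `answer_with_verification` are hypothetical names, and the stubs stand in for calls to an LLM generation agent and a factuality-scoring evaluation agent in a real system.

```python
def generate(prompt, previous=None):
    """Stub generation agent: refines its previous draft when given one.

    A real system would call an LLM here, conditioning on evaluator feedback.
    """
    if previous is None:
        return f"Draft answer to: {prompt}"
    return previous + " [refined]"


def evaluate(response):
    """Stub evaluation agent returning (confidence, feedback).

    The toy confidence score simply rises with each refinement; a real
    evaluator would check claims against retrieved evidence or rule-based
    constraints, as the mitigation describes.
    """
    confidence = min(0.5 + 0.2 * response.count("[refined]"), 1.0)
    return confidence, "verify cited facts"


def answer_with_verification(prompt, threshold=0.9, max_rounds=5):
    """Loop generation and evaluation until the confidence threshold is met,
    capping the number of refinement rounds."""
    response = generate(prompt)
    confidence, _feedback = evaluate(response)
    for _ in range(max_rounds):
        if confidence >= threshold:
            break
        response = generate(prompt, previous=response)
        confidence, _feedback = evaluate(response)
    return response, confidence
```

The cap on refinement rounds matters in practice: without it, a miscalibrated evaluator that never reaches the threshold would loop indefinitely.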