Inconsistent Performance across and within Domains
Estimating the true capabilities of an LLM is difficult (cf. Section 3.3), especially for naive users unfamiliar with the brittle nature of machine learning technologies. Exaggeration of model capabilities by developers (Lambert, 2023; Blair-Stanek et al., 2023), together with issues such as task contamination (Roberts et al., 2023b), underrepresentation of tasks or domains (Wu et al., 2023a; McCoy et al., 2023), and prompt sensitivity (Anthropic, 2023d), may lead a user to misestimate a model's true capabilities. This lack of reliability can undermine user trust or cause harm if a user bases decisions on incorrect or misleading information provided by an LLM.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1496
Domain lineage
5. Human-Computer Interaction
5.1 > Overreliance and unsafe use
Mitigation strategy
- Implement a robust hybrid system architecture that integrates rule-based logic with a controlled feedback loop between generation and evaluation agents, verifying factuality and refining responses until a specified accuracy or confidence threshold is achieved
- Employ advanced prompt engineering and adversarial perturbation techniques, such as few-shot demonstrations and semantic rephrasing, to systematically test and enhance output consistency and stability across diverse input variations
- Establish mandatory human oversight protocols for high-stakes decision-making, and implement comprehensive user education programs that inform users about the brittle nature of LLM technologies and the inherent risks of overreliance
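The generation-evaluation feedback loop in the first mitigation can be sketched as follows. This is a minimal illustration only: `generate`, `evaluate`, and `answer_with_verification` are hypothetical names, and the stubs stand in for calls to an LLM generation agent and a factuality-scoring evaluation agent in a real system.

```python
def generate(prompt, previous=None):
    """Stub generation agent: refines its previous draft when given one.

    A real system would call an LLM here, conditioning on evaluator feedback.
    """
    if previous is None:
        return f"Draft answer to: {prompt}"
    return previous + " [refined]"


def evaluate(response):
    """Stub evaluation agent returning (confidence, feedback).

    The toy confidence score simply rises with each refinement; a real
    evaluator would check claims against retrieved evidence or rule-based
    constraints, as the mitigation describes.
    """
    confidence = min(0.5 + 0.2 * response.count("[refined]"), 1.0)
    return confidence, "verify cited facts"


def answer_with_verification(prompt, threshold=0.9, max_rounds=5):
    """Loop generation and evaluation until the confidence threshold is met,
    capping the number of refinement rounds."""
    response = generate(prompt)
    confidence, _feedback = evaluate(response)
    for _ in range(max_rounds):
        if confidence >= threshold:
            break
        response = generate(prompt, previous=response)
        confidence, _feedback = evaluate(response)
    return response, confidence
```

The cap on refinement rounds matters in practice: without it, a miscalibrated evaluator that never reaches the threshold would loop indefinitely.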