Robustness
Resilience against adversarial attacks and distribution shift
ENTITY
2 - AI
INTENT
3 - Other
TIMING
3 - Other
Risk ID
mit505
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Establish a comprehensive data provenance and validation framework, using techniques such as automated anomaly and outlier detection and strict access controls, to mitigate the risk of data and model poisoning across the entire AI lifecycle.
2. Institute a rigorous adversarial robustness program incorporating defense mechanisms such as Adversarial Training (AT) and ensemble methods to fortify the model's resilience against adversarial prompt engineering, input perturbations, and transfer attacks.
3. Deploy a continuous monitoring and evaluation system to track performance across diverse distributional scenarios, and leverage adaptive learning strategies, such as online/incremental updates, to maintain accuracy and reliability in the presence of real-world distribution shift.
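As an illustration of the automated anomaly and outlier detection named in step 1, the following is a minimal sketch of a median-absolute-deviation screen over per-record scores. The function name, threshold, and toy data are assumptions for illustration, not part of the source strategy:

```python
import statistics

def flag_outliers(values, z_thresh=3.5):
    """Flag records whose modified z-score (based on the median absolute
    deviation, which resists being masked by the outliers themselves)
    exceeds z_thresh -- a crude screen for poisoned or corrupted samples."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # All values identical: nothing to flag.
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > z_thresh for v in values]

# Example: one injected extreme value among otherwise benign scores.
scores = [0.98, 1.02, 0.99, 1.01, 0.97, 42.0]
print(flag_outliers(scores))  # only the last record is flagged
```

In a real pipeline such a screen would be one filter among several (provenance checks, access controls, deduplication); on its own it only catches gross numeric anomalies.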
ADDITIONAL EVIDENCE
There are multiple reasons why an LLM might not perform as desired once deployed. Errors in a prompt can cause the model to answer a question incorrectly. Malicious actors can attack the system by probing the LLM with deliberately altered prompts. The usefulness of a particular set of answers may change over time (e.g., which state collects the highest state income tax). Finally, LLMs are trained on massive data collected from the Internet, where anyone, including attackers, can post content and thereby influence the training data, leaving LLMs vulnerable to poisoning attacks.
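The last failure mode above motivates the continuous-monitoring mitigation: live inputs can drift away from the distribution the model was validated on. A minimal sketch of one common drift metric, the Population Stability Index over equal-width bins, is shown below; the function name, bin count, and the 0.25 alert threshold are assumptions (0.25 is a common industry rule of thumb, not a value from the source):

```python
import math

def psi(reference, live, bins=5):
    """Population Stability Index between a reference sample and a live
    sample, using equal-width bins over their combined range.
    Rule of thumb (an assumption here): PSI > 0.25 signals major shift."""
    lo = min(min(reference), min(live))
    hi = max(max(reference), max(live))
    width = (hi - lo) / bins or 1.0  # guard against a degenerate range
    def hist(data):
        counts = [0] * bins
        for x in data:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        n = len(data)
        # Smooth empty bins so the logarithm is defined.
        return [max(c / n, 1e-6) for c in counts]
    r, l = hist(reference), hist(live)
    return sum((li - ri) * math.log(li / ri) for ri, li in zip(r, l))

validation_inputs = [i / 100 for i in range(100)]
live_inputs = [x + 0.5 for x in validation_inputs]  # simulated shift
if psi(validation_inputs, live_inputs) > 0.25:
    print("distribution shift detected: trigger re-evaluation / retraining")
```

A monitor like this would run on a schedule over recent traffic and, on alert, trigger the re-evaluation or online-update step described in the mitigation strategy.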