7. AI System Safety, Failures, & Limitations

Robustness

Resilience against adversarial attacks and distribution shift

Source: MIT AI Risk Repository (mit505)

ENTITY

2 - AI

INTENT

3 - Other

TIMING

3 - Other

Risk ID

mit505

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.3 > Lack of capability or robustness

Mitigation strategy

1. Establish a comprehensive data provenance and validation framework, using techniques such as automated anomaly and outlier detection and strict access controls, to mitigate the risk of data and model poisoning across the entire AI lifecycle.

2. Institute a rigorous adversarial robustness program incorporating defense mechanisms such as Adversarial Training (AT) and ensemble methods to fortify the model's resilience against adversarial prompt engineering, input perturbations, and transfer attacks.

3. Deploy a continuous monitoring and evaluation system to track performance across diverse distributional scenarios, and leverage adaptive learning strategies, such as online/incremental updates, to maintain accuracy and reliability in the presence of real-world distribution shift.
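To make point 2 concrete, here is a minimal, self-contained sketch of Adversarial Training on a toy logistic-regression classifier, using the Fast Gradient Sign Method (FGSM) to generate input perturbations. The dataset, hyperparameters, and function names are all illustrative assumptions, not part of the repository entry; production AT would target the deployed model and attack surface.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb each input in the direction that maximizes logistic loss."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w  # dL/dx for logistic loss
    return x + eps * np.sign(grad_x)

def adversarial_train(x, y, eps=0.1, lr=0.1, epochs=200):
    """Train on a mix of clean and FGSM-perturbed inputs (basic AT)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_adv = fgsm(x, y, w, b, eps)
        for batch in (x, x_adv):
            p = sigmoid(batch @ w + b)
            w -= lr * (batch.T @ (p - y)) / len(y)
            b -= lr * np.mean(p - y)
    return w, b

# Toy linearly separable data: two Gaussian blobs (illustrative only).
x = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
w, b = adversarial_train(x, y)
acc = np.mean((sigmoid(x @ w + b) > 0.5) == y)
print(f"clean accuracy after adversarial training: {acc:.2f}")
```

The key design choice is training on both clean and perturbed batches each epoch, so robustness to small worst-case perturbations is gained without sacrificing clean accuracy.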

ADDITIONAL EVIDENCE

An LLM may fail to perform as desired after deployment for several reasons. Errors in a prompt can cause the model to answer a question incorrectly. Malicious actors can attack the system by probing the LLM with adversarially altered prompts. The correctness of a given answer may also drift over time (e.g., which state collects the highest state income tax). Finally, LLMs are trained on massive datasets collected from the Internet, where anyone, including attackers, can post content and thereby influence the training data, leaving LLMs vulnerable to poisoning attacks.
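The poisoning vulnerability described above is what the automated outlier screening in mitigation point 1 targets. As a hedged sketch, assuming injected poison points sit far from the clean data distribution, a Mahalanobis-distance filter can flag them; the dataset and threshold below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def mahalanobis_outliers(data, threshold=3.0):
    """Flag rows whose Mahalanobis distance from the sample mean
    exceeds `threshold` (potential poisoned training examples)."""
    mean = data.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(data, rowvar=False))
    diff = data - mean
    d2 = np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
    return np.sqrt(d2) > threshold

clean = rng.normal(0, 1, (200, 3))       # in-distribution training data
poison = rng.normal(8, 0.5, (5, 3))      # injected out-of-distribution points
data = np.vstack([clean, poison])
mask = mahalanobis_outliers(data)
print(f"flagged {mask.sum()} of {len(data)} examples")
```

Distance-based screening like this only catches poison that is statistically anomalous; stealthier attacks that mimic the clean distribution require complementary defenses such as provenance tracking and access controls.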