Fine-tuning related (Catastrophic forgetting due to continual instruction fine-tuning)
Catastrophic forgetting occurs when a model loses previously learned capabilities (or factual knowledge) after being trained on new tasks. In language models, this can arise from continual instruction tuning, and the effect may become more pronounced as model size increases [127].
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1108
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement Rehearsal Strategies. Incorporate a small, strategically selected buffer of data from prior tasks into the fine-tuning process. This experience replay (real or synthetically generated) is essential for reinforcing previously acquired knowledge alongside the new task learning, mitigating the destructive weight drift towards the new data distribution.
2. Utilize Parameter-Efficient Continual Fine-Tuning (PECFT) Architectures. Employ techniques such as Low-Rank Adaptation (LoRA) or adapters to isolate and constrain parameter updates to a small, task-specific subset of the model. This structural constraint minimizes destructive interference with the foundational knowledge stored in the frozen majority of the pre-trained weights.
3. Apply Regularization-based Weight Protection. Integrate regularization methods, such as Elastic Weight Consolidation (EWC), into the loss function during fine-tuning. These techniques dynamically identify and penalize substantial changes to model weights deemed critical for the performance of previous tasks, thereby preserving historical capabilities while accommodating new learning.
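As a minimal sketch of the rehearsal idea (strategy 1), the following illustrates a small reservoir-sampled buffer of prior-task examples mixed into each new-task batch. The class name, buffer capacity, and replay fraction are illustrative choices, not part of any specific framework:

```python
import random

class RehearsalBuffer:
    """Fixed-size buffer of prior-task examples, filled via reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling keeps a uniform sample of all examples seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mix(self, new_batch, replay_fraction=0.25):
        # Replace a fraction of each fine-tuning batch with replayed examples,
        # so gradients keep reinforcing prior-task behavior.
        k = min(int(len(new_batch) * replay_fraction), len(self.buffer))
        replayed = self.rng.sample(self.buffer, k)
        return new_batch[: len(new_batch) - k] + replayed

# Hypothetical prior-task instruction pairs fill the buffer...
buf = RehearsalBuffer(capacity=4)
for i in range(100):
    buf.add(("old_prompt_%d" % i, "old_answer_%d" % i))

# ...and each new-task batch is blended with replayed examples.
batch = [("new_prompt_%d" % i, "new_answer_%d" % i) for i in range(8)]
mixed = buf.mix(batch, replay_fraction=0.25)
```

In practice the replay fraction and buffer size are tuned per workload; even a few percent of replayed data can substantially reduce drift toward the new distribution.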
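The structural constraint behind LoRA (strategy 2) can be sketched with plain NumPy: the pre-trained weight matrix stays frozen, and only a low-rank correction is trained. The dimensions, rank, and scaling factor below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # illustrative sizes; rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Frozen path plus low-rank correction, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer initially matches the frozen layer,
# so fine-tuning starts from exactly the pre-trained behavior.
```

Only A and B receive gradient updates (here 28 parameters versus 48 in W); at realistic model scales the trainable fraction is far smaller, which is what limits interference with the frozen weights.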
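The EWC loss term (strategy 3) is a quadratic penalty anchoring each parameter to its prior-task value, weighted by an importance estimate (typically the diagonal of the Fisher information). A minimal sketch, with made-up parameter and Fisher values for illustration:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2:
    # deviations on important weights (large F_i) are penalized heavily.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -0.5, 2.0])  # weights after the previous task
fisher = np.array([10.0, 0.1, 5.0])      # diagonal Fisher importance estimates
theta = np.array([1.2, 0.5, 2.0])        # weights during new-task training

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
# total loss during fine-tuning would be: new_task_loss + penalty
```

The penalty is added to the new-task loss, so gradient descent trades off new-task fit against movement of weights the prior task depended on; lam controls how strongly old capabilities are protected.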