Fine-tuning related (Catastrophic forgetting due to continual instruction fine-tuning)
Catastrophic forgetting occurs when a model loses previously learned capabilities (or factual knowledge) after being trained on new tasks. In language models, this can arise from continual instruction tuning, and the effect may become more pronounced as model size increases [127].
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit1108
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Implement Rehearsal Strategies. Incorporate a small, strategically selected buffer of data from prior tasks into the fine-tuning process. This experience replay (real or synthetically generated) is essential for reinforcing previously acquired knowledge alongside the new task learning, mitigating the destructive weight drift towards the new data distribution.
2. Utilize Parameter-Efficient Continual Fine-Tuning (PECFT) Architectures. Employ techniques such as Low-Rank Adaptation (LoRA) or adapters to isolate and constrain parameter updates to a small, task-specific subset of the model. This structural constraint minimizes destructive interference with the foundational knowledge stored in the frozen majority of the pre-trained weights.
3. Apply Regularization-based Weight Protection. Integrate regularization methods, such as Elastic Weight Consolidation (EWC), into the loss function during fine-tuning. These techniques dynamically identify and penalize substantial changes to model weights deemed critical for the performance of previous tasks, thereby preserving historical capabilities while accommodating new learning.
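As a minimal sketch of the rehearsal idea (strategy 1), the following illustrates a small reservoir-sampled buffer of prior-task examples mixed into each new-task batch. The class name, buffer capacity, and replay fraction are illustrative choices, not part of any specific framework:

```python
import random

class RehearsalBuffer:
    """Fixed-size buffer of prior-task examples, filled via reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        # Reservoir sampling keeps a uniform sample of all examples seen so far.
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def mix(self, new_batch, replay_fraction=0.25):
        # Replace a fraction of each fine-tuning batch with replayed examples,
        # so gradients keep reinforcing prior-task behavior.
        k = min(int(len(new_batch) * replay_fraction), len(self.buffer))
        replayed = self.rng.sample(self.buffer, k)
        return new_batch[: len(new_batch) - k] + replayed

# Hypothetical prior-task instruction pairs fill the buffer...
buf = RehearsalBuffer(capacity=4)
for i in range(100):
    buf.add(("old_prompt_%d" % i, "old_answer_%d" % i))

# ...and each new-task batch is blended with replayed examples.
batch = [("new_prompt_%d" % i, "new_answer_%d" % i) for i in range(8)]
mixed = buf.mix(batch, replay_fraction=0.25)
```

In practice the replay fraction and buffer size are tuned per workload; even a few percent of replayed data can substantially reduce drift toward the new distribution.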
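The structural constraint behind LoRA (strategy 2) can be sketched with plain NumPy: the pre-trained weight matrix stays frozen, and only a low-rank correction is trained. The dimensions, rank, and scaling factor below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 6, 8, 2, 4  # illustrative sizes; rank r << min(d_out, d_in)

W = rng.standard_normal((d_out, d_in))      # frozen pre-trained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized

def lora_forward(x):
    # Frozen path plus low-rank correction, scaled by alpha / r.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapted layer initially matches the frozen layer,
# so fine-tuning starts from exactly the pre-trained behavior.
```

Only A and B receive gradient updates (here 28 parameters versus 48 in W); at realistic model scales the trainable fraction is far smaller, which is what limits interference with the frozen weights.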
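The EWC loss term (strategy 3) is a quadratic penalty anchoring each parameter to its prior-task value, weighted by an importance estimate (typically the diagonal of the Fisher information). A minimal sketch, with made-up parameter and Fisher values for illustration:

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    # (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2:
    # deviations on important weights (large F_i) are penalized heavily.
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

theta_star = np.array([1.0, -0.5, 2.0])  # weights after the previous task
fisher = np.array([10.0, 0.1, 5.0])      # diagonal Fisher importance estimates
theta = np.array([1.2, 0.5, 2.0])        # weights during new-task training

penalty = ewc_penalty(theta, theta_star, fisher, lam=2.0)
# total loss during fine-tuning would be: new_task_loss + penalty
```

The penalty is added to the new-task loss, so gradient descent trades off new-task fit against movement of weights the prior task depended on; lam controls how strongly old capabilities are protected.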