Training-related (Poor model confidence calibration)
Models can be affected by poor confidence calibration [85], where the predicted probabilities do not accurately reflect the true likelihood of ground-truth correctness. This miscalibration makes it difficult to interpret the model's predictions reliably, as high accuracy does not guarantee that the confidence levels are meaningful. This can cause overconfidence in incorrect predictions or underconfidence in correct ones.
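As an illustrative sketch (not part of the original entry), the gap between confidence and accuracy can be quantified with Expected Calibration Error, a metric referenced in the mitigation strategy below. All names and the toy data here are hypothetical:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average |mean confidence - accuracy| over
    equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # assign to a bin
        bins[idx].append((c, y))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            conf = sum(c for c, _ in b) / len(b)  # mean confidence
            acc = sum(y for _, y in b) / len(b)   # empirical accuracy
            ece += (len(b) / n) * abs(conf - acc)
    return ece

# An overconfident model: ~90%+ confidence but only 50% accuracy,
# so ECE is large (near the confidence-accuracy gap).
conf = [0.95, 0.9, 0.92, 0.88, 0.97, 0.91]
hits = [1, 0, 1, 0, 1, 0]
print(expected_calibration_error(conf, hits))
```

A perfectly calibrated model (e.g. 50% confidence with 50% accuracy) would yield an ECE of zero under this definition.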
ENTITY
3 - Other
INTENT
2 - Unintentional
TIMING
3 - Other
Risk ID
mit1101
Domain lineage
7. AI System Safety, Failures, & Limitations
7.3 > Lack of capability or robustness
Mitigation strategy
1. Apply Post-hoc Calibration Techniques: Utilize established methods such as Temperature Scaling, Isotonic Regression, or Platt Scaling on a dedicated validation dataset to adjust the model's raw probability outputs, thereby enforcing moderate calibration where the predicted risk aligns with the observed event rate across different confidence bins.
2. Implement Continuous Calibration Monitoring: Establish a monitoring pipeline to track calibration quality in the deployed environment using metrics such as Expected Calibration Error (ECE) or reliability diagrams. This system should trigger alerts or automated recalibration procedures when significant degradation or drift in the model's calibration performance is detected.
3. Employ Calibration-Aware Training Methods: Incorporate loss functions (e.g., Log Loss/Cross-Entropy, Brier Score, or other proper scoring rules) and training regularization techniques that explicitly encourage well-calibrated predictions. For imbalanced datasets, employ specialized methods to prevent skewed probabilities toward majority classes.
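A minimal sketch of the first mitigation, temperature scaling fitted on a held-out validation set. This version substitutes a simple grid search for the usual gradient-based optimization of the temperature; the function names and toy logits are illustrative, not from the original entry:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over one logit vector."""
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def nll(logits_batch, labels, T):
    """Average negative log-likelihood at temperature T."""
    total = 0.0
    for logits, y in zip(logits_batch, labels):
        total -= math.log(softmax(logits, T)[y])
    return total / len(labels)

def fit_temperature(val_logits, val_labels):
    """Pick the temperature minimizing validation NLL; a coarse
    grid search stands in for gradient-based fitting."""
    grid = [0.5 + 0.05 * i for i in range(91)]  # T in 0.5 .. 5.0
    return min(grid, key=lambda T: nll(val_logits, val_labels, T))

# Overconfident toy model: large logit margins, but one of four
# validation labels disagrees with the confident prediction.
val_logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
val_labels = [0, 1, 1, 0]
T = fit_temperature(val_logits, val_labels)
# T > 1 softens the probabilities, shrinking overconfident scores
# toward the observed accuracy without changing the argmax class.
```

Because dividing all logits by a single positive T preserves their ordering, temperature scaling never changes the predicted class or the model's accuracy; it only adjusts how confident the reported probabilities are.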