
Training-related (Robustness certificates can be exploited to attack the models)

Knowledge of robustness certificates, including the extent of the region within which model predictions are certified to be robust, can be used by an adversary to efficiently craft attacks that succeed just outside the certified region [53].

Source: MIT AI Risk Repository (mit1100)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

3 - Other

Risk ID

mit1100

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

1. Implement strict information governance: treat exact certified radii and the internal parameters of the certification mechanism as confidential, non-public information, depriving adversaries of the precise knowledge needed to efficiently craft attacks that succeed just outside the provably robust region.

2. Adopt advanced certified-defense methodologies, such as randomized smoothing or formal verification, to maximize the provable certified radius (r*) and keep the certified bounds as tight as possible, minimizing the exploitable gap between the theoretical and empirical robustness boundaries.

3. Integrate run-time monitoring that couples the certified prediction with a mechanism for detecting inputs falling just outside the certified region, so the system can reject the prediction or engage a fallback (e.g., human review) when a high-confidence but uncertified prediction occurs close to the certified boundary.
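Mitigations 2 and 3 can be sketched together in a few lines: a toy randomized-smoothing wrapper (in the style of Cohen et al.'s certified-radius formula r* = σ·Φ⁻¹(p_A)) that votes over Gaussian-perturbed inputs and abstains when the certified radius is too small, i.e. when the input sits close to the boundary of the certified region. The classifier, noise level σ, sample count, and abstention threshold below are illustrative assumptions, not part of the repository entry.

```python
import random
from statistics import NormalDist

def toy_classifier(x):
    # Hypothetical stand-in for a real model: classify a scalar by sign.
    return 1 if x >= 0.0 else 0

def smoothed_predict(classifier, x, sigma=0.5, n=1000, seed=0):
    # Randomized smoothing: vote over Gaussian-perturbed copies of the input.
    rng = random.Random(seed)
    votes = [0, 0]
    for _ in range(n):
        votes[classifier(x + rng.gauss(0.0, sigma))] += 1
    top = max(range(2), key=votes.__getitem__)
    # Clamp the empirical top-class probability away from 1.0 so the
    # inverse normal CDF stays finite.
    p_a = min(votes[top] / n, 1.0 - 1.0 / n)
    # Certified radius r* = sigma * Phi^{-1}(p_A), valid only when p_A > 1/2.
    radius = sigma * NormalDist().inv_cdf(p_a) if p_a > 0.5 else 0.0
    return top, radius

def guarded_predict(classifier, x, min_radius=0.1, **kw):
    # Mitigation 3: abstain (defer to a fallback such as human review)
    # when the certified radius is too small, i.e. the input lies too
    # close to the boundary of the certified region.
    label, radius = smoothed_predict(classifier, x, **kw)
    if radius < min_radius:
        return None, radius
    return label, radius
```

An input far from the decision boundary (e.g. `guarded_predict(toy_classifier, 5.0)`) yields a label with a comfortably large certified radius, while an input near the boundary (e.g. `0.01`) produces a tiny radius and triggers abstention rather than an uncertified high-confidence prediction.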