7. AI System Safety, Failures, & Limitations

Heterogeneous Attacks

Heterogeneous Attacks. A closely related risk is the possibility of multiple agents combining different affordances to overcome safeguards, for which there is already preliminary evidence (Jones et al., 2024; see also Case Study 12). In this case, it is not the sheer number of agents that enables the novel attack method, but the combination of their different abilities. These combined affordances might include the agents' lack of individual safeguards, the tasks they have specialised to complete, the systems or information they have access to (either directly or via training), or other incidental features such as their geographic location(s). The inherent difficulty of attributing responsibility for security breaches in diffuse, heterogeneous networks of agents further complicates timely defence and recovery (Skopik & Pahi, 2020).

Source: MIT AI Risk Repository, risk ID mit1244

ENTITY

2 - AI

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit1244

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Deploy Resilient Decentralized Protocols with Dynamic Trust Frameworks. Implement protocols that maintain collective functionality despite individual agent compromise (e.g., the SHARK protocol), and integrate dynamic trust models that continuously assess peer reliability, thereby mitigating inter-agent collusion and identity spoofing.

2. Implement Granular Access Control and Architectural Micro-segmentation. Strictly enforce the principle of least privilege using role-based or attribute-based access control to confine each agent's permissions. Use micro-segmentation and task segmentation to prevent the lateral spread of threats and to limit the combinatorial power agents gain by chaining their diverse capabilities.

3. Establish Continuous, Multi-Vector Anomaly Detection via Ensemble Learning. Use real-time online ensemble models (e.g., Adaptive Random Forests) that are robust to concept drift and variable attack mixtures. These models should continuously monitor agent telemetry and communication patterns to detect coordinated, multi-vector anomalous behavior indicative of a heterogeneous attack.
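The least-privilege enforcement described in the second mitigation can be sketched as a deny-by-default permission check: each agent role is granted only the narrow set of actions its task requires, so no single compromised agent can contribute capabilities outside its remit to a combined attack. This is a minimal illustration, not the repository's reference implementation; the role names, action strings, and policy table below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentRole:
    """A role confining an agent to an explicit set of allowed actions."""
    name: str
    allowed_actions: frozenset

# Hypothetical policy table: each agent role lists only the actions
# its specialised task requires (least privilege).
POLICY = {
    "retrieval-agent": AgentRole("retrieval-agent",
                                 frozenset({"read:docs"})),
    "planner-agent":   AgentRole("planner-agent",
                                 frozenset({"read:docs", "write:plan"})),
}

def is_permitted(role_name: str, action: str) -> bool:
    """Deny by default: unknown roles and unlisted actions are refused."""
    role = POLICY.get(role_name)
    return role is not None and action in role.allowed_actions
```

Under this scheme, a retrieval agent asking to write a plan is refused (`is_permitted("retrieval-agent", "write:plan")` returns `False`), which is the property that limits agents from chaining affordances they were never individually granted.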