7. AI System Safety, Failures, & Limitations

Selection Pressures

Selection pressures (Section 3.3): some aspects of training and selection by those deploying and using AI agents can lead to undesirable behaviour.

Source: MIT AI Risk Repository (mit1225)

ENTITY: 1 - Human

INTENT: 2 - Unintentional

TIMING: 1 - Pre-deployment

Risk ID: mit1225

Domain lineage: 7. AI System Safety, Failures, & Limitations (375 mapped risks) > 7.6 Multi-agent risks

Mitigation strategy

1. Redesign agent incentive structures and fitness functions to directly penalize strategies that optimize for competitive advantage over other agents (e.g., deception, information hoarding) while diverging from the system's global, human-aligned objective (a minimal sketch follows this list).
2. Implement continuous performance monitoring and real-time anomaly detection to proactively identify long-term behavioural drift in agent-to-agent interactions before undesirable strategies are cemented by the selection environment.
3. Establish mandatory human-in-the-loop gates and clear escalation paths for high-autonomy agents and critical decisions, ensuring that the final selection pressure remains a human veto or approval.
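
As a minimal sketch of the first mitigation, the snippet below shows one way an agent's fitness score could be shaped so that gains from misaligned competitive tactics (such as flagged deception or information hoarding) are penalized relative to a shared, human-aligned objective. All names (AgentEpisode, global_objective_score, the weights and penalty) are illustrative assumptions, not part of the repository entry.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentEpisode:
    """Illustrative record of one agent's behaviour in an evaluation episode."""
    task_reward: float             # reward on the agent's own private task
    global_objective_score: float  # contribution to the shared, human-aligned objective
    deception_events: int = 0      # e.g., messages to other agents flagged as misleading
    hoarded_info_items: int = 0    # e.g., requested facts the agent withheld


def shaped_fitness(ep: AgentEpisode,
                   w_global: float = 1.0,
                   w_task: float = 0.5,
                   penalty_per_event: float = 2.0) -> float:
    """Fitness used for selection: weight the shared objective above the
    private task reward and subtract a penalty for each competitive tactic
    that diverges from the global objective. Weights are assumptions to be
    tuned per deployment."""
    penalty = penalty_per_event * (ep.deception_events + ep.hoarded_info_items)
    return w_global * ep.global_objective_score + w_task * ep.task_reward - penalty


def select_for_next_generation(population: List[AgentEpisode], k: int) -> List[AgentEpisode]:
    """Keep the top-k episodes by shaped fitness, so the selection environment
    no longer rewards deception or information hoarding."""
    return sorted(population, key=shaped_fitness, reverse=True)[:k]
```

In a deployment following mitigation 3, the call to select_for_next_generation would sit behind a human approval gate for high-autonomy agents, so that selection outcomes for critical decisions still require an explicit human sign-off.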