7. AI System Safety, Failures, & Limitations2 - Post-deployment

Emergent Goals

Emergent Goals. Ascribing goals to a system is not always straightforward. For our present purposes, it will suffice to adopt a Dennetian perspective (Dennett, 1971), ascribing goals and intentions only when it is useful (i.e., predictive) to do so.51 While it might not be helpful to describe individual narrow AI tools as having goals, their combination may act as a (seemingly) goal-directed collective. For example, a group of moderation bots on a major social networking site could subtly but systematically manipulate the overall political perspectives of the user population, even though, individually, each agent is programmed to simply increase user engagement or filter out dis-preferred content.

Source: MIT AI Risk Repositorymit1241

ENTITY

2 - AI

INTENT

2 - Unintentional

TIMING

2 - Post-deployment

Risk ID

mit1241

Domain lineage

7. AI System Safety, Failures, & Limitations

375 mapped risks

7.6 > Multi-agent risks

Mitigation strategy

1. Implement a Hierarchical Multi-Agent Governance Framework: Establish explicit protocols for role-based access control, zero-trust communication, and isolated memory spaces (context contamination) to prevent the unintended, cascading reinforcement of collective behavior among agents. 2. Conduct Continuous, Real-Time Behavioral Analytics on Collective Output: Deploy real-time monitoring layers to analyze system-level outputs for subtle manipulation, emergent reinforcement loops, and deviations from intended functions, thereby detecting the manifestation of unintentional, goal-directed collective action. 3. Enforce Granular Capability Scoping and Orchestrator Hardening: Mandate the least-privilege principle for all agent permissions to prevent "capability bleed" and minimize the potential impact of an emergent goal. Simultaneously, secure the central orchestrator, as its compromise could allow systemic manipulation of the multi-agent collective.