4. Malicious Actors & Misuse

Authoritarian Surveillance, Censorship, and Use: Delegation of Decision-Making Authority to Malicious Actors

Finally, the principal value proposition of AI assistants is that they can enhance or automate people's decision-making, lowering its cost and increasing its accuracy for their users. However, benefiting from this enhancement necessarily means delegating some degree of agency away from a human and toward an automated decision-making system, which motivates research fields such as value alignment. This introduces a whole new form of malicious use, one that never trips the wire of what one might call an 'attack' (social engineering, offensive cyber operations, adversarial AI, jailbreaks, prompt injection, exfiltration attacks, and so on). When people delegate their decision-making to an AI assistant, they also subordinate that decision-making to the wishes of the agent's actual controller. If that controller is malicious, they can attack a user, perhaps subtly, simply by nudging the user's decisions in a problematic direction.

Fully documenting the myriad ways that people seeking help with their decisions may delegate decision-making authority to AI assistants, and subsequently come under malicious influence, is outside the scope of this paper. As motivation for future work, however, scholars should investigate the forms of networked influence that could arise in this way. With more advanced AI assistants, it may become logistically possible for one assistant, or a few, to guide or control the behavior of many others. If this happens, malicious actors could subtly shape the decision-making of large numbers of people who rely on assistants for advice or other functions. Such malicious use might not be illegal, would not necessarily violate terms of service, and may be difficult even to recognize. Nonetheless, it could generate new forms of vulnerability, and for that reason it needs to be better understood ahead of time.

Source: MIT AI Risk Repository (mit390)

ENTITY

3 - Other

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit390

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.1 > Disinformation, surveillance, and influence at scale

Mitigation strategy

1. Prioritize AI value alignment research and implementation. Embed core human values and ethical principles into the AI system's design and objectives, utilizing techniques such as Reinforcement Learning from Human/AI Feedback (RLHF/RLAIF) to ensure the agent's goals remain aligned with human intentions and do not drift toward unintended or malicious objectives.

2. Establish a governance and oversight framework for decision delegation. Define clear "red lines" that prohibit delegating high-impact, ethical, or legally binding decisions entirely to AI. For all delegated decisions, mandate human-in-the-loop oversight, transparent and explainable AI outputs, clear accountability structures, and continuous performance monitoring with de-delegation mechanisms.

3. Develop and enforce ethical and regulatory countermeasures against covert influence. Implement stringent ethical safeguards and regulatory frameworks to prevent AI systems from engaging in covert influence, manipulation, or the exploitation of user cognitive biases. This includes mandatory audits of AI decision-making processes to detect subtle nudges or behaviors that undermine user autonomy.
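The second mitigation (red lines plus human-in-the-loop oversight with de-delegation) can be sketched in code. The sketch below is illustrative only: the impact categories, the red-line set, and the `DelegationGate` class are all hypothetical names introduced here, not anything defined by the repository entry. It shows one way a system might route decisions so that red-line categories are never fully delegated, keep an audit trail, and support a de-delegation switch that a monitoring process could trip.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Hypothetical impact taxonomy; real systems would need a far richer
# classification than these four buckets.
class Impact(Enum):
    ROUTINE = auto()
    HIGH_IMPACT = auto()
    LEGALLY_BINDING = auto()
    ETHICAL = auto()

# Assumed "red-line" categories that must never be fully delegated to AI.
RED_LINES = {Impact.HIGH_IMPACT, Impact.LEGALLY_BINDING, Impact.ETHICAL}

@dataclass
class DelegationGate:
    """Routes each decision to 'ai', 'human_review', or 'human', and logs it."""
    audit_log: list = field(default_factory=list)
    delegation_enabled: bool = True  # flipped off by the de-delegation mechanism

    def route(self, decision_id: str, impact: Impact) -> str:
        if not self.delegation_enabled:
            outcome = "human"          # de-delegated: everything goes to a person
        elif impact in RED_LINES:
            outcome = "human_review"   # AI may draft, but a human must approve
        else:
            outcome = "ai"             # routine decisions may be automated
        # Every routing decision is logged so auditors can detect drift or
        # systematic nudging after the fact.
        self.audit_log.append((decision_id, impact.name, outcome))
        return outcome

    def de_delegate(self) -> None:
        """Monitoring hook: revoke delegation when drift or manipulation is suspected."""
        self.delegation_enabled = False
```

For example, `gate.route("d1", Impact.ROUTINE)` returns `"ai"`, while a legally binding decision is routed to `"human_review"`; after `gate.de_delegate()`, even routine decisions return `"human"`. The design choice worth noting is that de-delegation is a blanket switch rather than a per-category one, reflecting the mitigation's premise that once a controller's intent is in doubt, no delegation is safe.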