2. Privacy & Security
2 - Post-deployment

Exploiting External Tools for Attacks

Adversarial tool providers can embed malicious instructions in APIs or prompts [84], leading LLMs to leak sensitive information memorized from the training data or from users' prompts (CVE-2023-32786). Because LLMs lack control over their outputs, sensitive information can be disclosed to external tool providers. In addition, attackers can easily manipulate public data to launch targeted attacks, generating specific malicious outputs tailored to user inputs. Furthermore, feeding information from external tools into LLMs may enable injection attacks [61]; for example, unverified inputs may result in arbitrary code execution (CVE-2023-29374).
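The injection path described above runs through tool output that is forwarded into the model's context unchecked. A minimal sketch of a guard at that boundary is shown below; the pattern list, function names, and refusal message are all hypothetical illustrations, not a production detector (real deployments would use a maintained classifier rather than a fixed regex list):

```python
import re

# Hypothetical patterns that often signal injected instructions in tool output.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) (prior )?instructions", re.I),
    re.compile(r"system prompt", re.I),
    re.compile(r"reveal .*(password|api key|secret)", re.I),
]

def looks_injected(tool_output: str) -> bool:
    """Flag tool output that appears to carry adversarial instructions."""
    return any(p.search(tool_output) for p in INJECTION_PATTERNS)

def safe_tool_response(tool_output: str) -> str:
    """Withhold suspicious tool output instead of forwarding it to the LLM."""
    if looks_injected(tool_output):
        return "[tool output withheld: possible prompt injection]"
    return tool_output
```

The key design point is that the check sits between the external tool and the model context, so a compromised tool response never reaches the LLM verbatim.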

Source: MIT AI Risk Repository (mit30)

ENTITY

1 - Human

INTENT

1 - Intentional

TIMING

2 - Post-deployment

Risk ID

mit30

Domain lineage

2. Privacy & Security

186 mapped risks

2.2 > AI system security vulnerabilities and attacks

Mitigation strategy

- Prioritize rigorous runtime verification and sanitization of all inputs and outputs associated with external tools and APIs, specifically focusing on mitigating code injection vulnerabilities and preventing arbitrary code execution.
- Implement granular Attribute-Based Access Control (ABAC) to restrict the scope of data access and the specific tools an LLM can invoke, thereby limiting the potential for unauthorized information disclosure to adversarial tool providers.
- Deploy Named Entity Recognition (NER) filtering or similar redaction mechanisms on both user prompts and LLM-generated content to minimize the exposure of memorized sensitive information or PII during interactions with external tools.
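The second and third mitigations above can be sketched together: a per-role tool allow-list as a crude stand-in for ABAC, and regex masking as a lightweight stand-in for NER-based PII redaction. All policy entries, tool names, and patterns here are illustrative assumptions, not part of the repository entry:

```python
import re

# Per-role tool allow-list: a simplified stand-in for full ABAC policy.
TOOL_POLICY = {
    "analyst": {"search", "calculator"},
    "admin":   {"search", "calculator", "code_exec"},
}

def tool_allowed(role: str, tool: str) -> bool:
    """Permit a tool invocation only if the caller's role grants it."""
    return tool in TOOL_POLICY.get(role, set())

# Regex masking of common PII shapes; NER models catch far more in practice.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Mask PII before text crosses the boundary to an external tool."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)
```

Applying `redact` to every prompt and completion that leaves the trust boundary, and `tool_allowed` to every tool call the LLM requests, limits both what an adversarial tool can see and what code paths it can trigger.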