4. Malicious Actors & Misuse

Fine-tuning related (Ease of reconfiguring GPAI models)

GPAI models are often easily reconfigured for various use cases or have competencies beyond the intended use [78, 225]. Reconfiguration can be performed either by changing the weights of the model (e.g., fine-tuning) or by modifying only the model inputs (e.g., prompt engineering, jailbreaking, retrieval-augmented generation). Reconfiguration can be intentional (with the help of adversarial inputs) or unintentional (from unanticipated inputs to the model).
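The two reconfiguration pathways above can be illustrated with a toy sketch (all names and the "model" itself are hypothetical stand-ins, not a real GPAI system): the same model's behavior changes either when its weights change or when only its input changes.

```python
# Illustrative sketch only: a stand-in "model" whose behavior depends on
# both its weights and its input, showing the two reconfiguration pathways.

def toy_model(prompt: str, weights: dict) -> str:
    """Hypothetical model: output style is set by weights, but can also be
    overridden purely through the input (prompt-engineering analogue)."""
    style = weights.get("style", "neutral")
    if prompt.startswith("Respond as a pirate:"):  # input-only reconfiguration
        style = "pirate"
    return f"[{style}] answer to: {prompt}"

base_weights = {"style": "neutral"}

# Pathway 1: weight-based reconfiguration (fine-tuning analogue).
tuned_weights = {**base_weights, "style": "legal-advisor"}
out_tuned = toy_model("what is RAG?", tuned_weights)

# Pathway 2: input-based reconfiguration (prompt engineering); weights untouched.
out_prompted = toy_model("Respond as a pirate: what is RAG?", base_weights)
```

The point of the sketch is that pathway 2 requires no access to the weights at all, which is why input-side reconfiguration (jailbreaking, prompt injection) remains available even when fine-tuning access is restricted.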

Source: MIT AI Risk Repository (mit1102)

ENTITY

1 - Human

INTENT

3 - Other

TIMING

2 - Post-deployment

Risk ID

mit1102

Domain lineage

4. Malicious Actors & Misuse

223 mapped risks

4.0 > Malicious use

Mitigation strategy

1. Implement and maintain a state-of-the-art Safety and Security Framework throughout the entire model lifecycle, defining responsibilities for identifying, analyzing, and mitigating systemic risks stemming from ease of reconfiguration, including unauthorized releases and adversarial exploitation.

2. Systematically conduct model evaluations, including adversarial testing, using standardized protocols to proactively identify and document vulnerabilities that enable malicious or unintentional reconfiguration (e.g., jailbreaking, harmful fine-tuning pathways), and use the findings to inform subsequent mitigations.

3. Deploy and continuously refine layered input and output guardrails, such as content filters, to detect and block adversarial inputs (e.g., prompt engineering) and prevent the generation of outputs that violate safety policies or facilitate unintended use cases.
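Mitigation 3 (layered input and output guardrails) can be sketched minimally as below. This is a simplified illustration assuming a keyword-based filter; the pattern lists and function names are hypothetical, and production systems typically use trained safety classifiers rather than string matching.

```python
# Minimal sketch of layered guardrails: one filter before the model (input)
# and one after it (output). Keyword matching stands in for real classifiers.

BLOCKED_INPUT_PATTERNS = ["ignore previous instructions", "disable safety"]
BLOCKED_OUTPUT_PATTERNS = ["synthesize the toxin"]  # hypothetical policy terms

def input_guardrail(prompt: str) -> bool:
    """Return True if the prompt should be blocked before reaching the model."""
    p = prompt.lower()
    return any(pat in p for pat in BLOCKED_INPUT_PATTERNS)

def output_guardrail(response: str) -> bool:
    """Return True if the model's response violates the safety policy."""
    r = response.lower()
    return any(pat in r for pat in BLOCKED_OUTPUT_PATTERNS)

def guarded_generate(prompt: str, model) -> str:
    """Wrap an arbitrary generate function with both guardrail layers."""
    if input_guardrail(prompt):
        return "[blocked: adversarial input detected]"
    response = model(prompt)
    if output_guardrail(response):
        return "[blocked: unsafe output suppressed]"
    return response
```

Layering both sides matters: the input filter catches known adversarial phrasings (prompt engineering, jailbreak attempts), while the output filter catches policy violations produced by inputs the first layer missed.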