Safety & Trustworthiness
A comprehensive assessment of LLM safety is fundamental to the responsible development and deployment of these technologies, especially in sensitive fields like healthcare, legal systems, and finance, where safety and trust are of the utmost importance.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit646
Domain lineage
7. AI System Safety, Failures, & Limitations
7.0 > AI system safety, failures, & limitations
Mitigation strategy
1. Establish a formal, enterprise-wide AI Risk Management Framework (RMF) and governance model. This foundational measure should align with recognized standards such as the NIST AI RMF to systematically Govern, Map, Measure, and Manage risks throughout the LLM lifecycle. The governance structure must mandate clear roles, responsibilities, and accountability for safety, ethical compliance, and data lineage, particularly for systems deployed in high-stakes environments.

2. Implement layered technical safeguards focused on stringent input validation and output sanitization. Input-surface controls (e.g., filtering untrusted data sources, limiting prompt complexity) mitigate prompt injection, while robust output validation (e.g., content filtering, schema checks, non-execution of generated code) prevents the LLM from producing malicious code, exposing sensitive data, or enabling unsafe actions.

3. Mandate continuous model monitoring, human-in-the-loop (HITL) oversight, and proactive adversarial stress testing (red teaming). Continuous monitoring of prompts, outputs, and usage patterns enables early detection of abuse and performance drift. Scheduled red-teaming exercises should also be employed to proactively identify and mitigate vulnerabilities to sophisticated adversarial attacks such as jailbreaking, ensuring sustained model resilience and trustworthiness in deployment.
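The input and output controls described in measure 2 can be sketched in a few lines. This is a minimal illustration, not a complete defense: the injection patterns, length limit, JSON schema (an `answer` field), and secret-key regex are all illustrative assumptions, and the helper names (`validate_input`, `sanitize_output`) are hypothetical.

```python
import json
import re

# Illustrative assumptions -- real deployments would tune these.
MAX_PROMPT_CHARS = 4000
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.IGNORECASE),
    re.compile(r"reveal .*system prompt", re.IGNORECASE),
]
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")  # e.g. leaked API keys

def validate_input(prompt: str) -> bool:
    """Input-surface control: reject over-long prompts and known
    prompt-injection phrasings before they reach the model."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    return not any(p.search(prompt) for p in INJECTION_PATTERNS)

def sanitize_output(raw: str) -> dict:
    """Output validation: schema-check the model's JSON output and
    redact secret-like strings. The output is treated strictly as
    data -- it is never executed."""
    data = json.loads(raw)  # raises on malformed output
    if not isinstance(data.get("answer"), str):
        raise ValueError("output failed schema check: missing 'answer'")
    data["answer"] = SECRET_PATTERN.sub("[REDACTED]", data["answer"])
    return data
```

In this pattern-matching approach, each layer fails closed: a prompt that trips any input check is rejected outright, and an output that cannot be parsed against the expected schema is discarded rather than passed through.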
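The usage-pattern monitoring called for in measure 3 can likewise be sketched as a sliding-window rate check that flags suspicious request bursts for human review. The class name, window size, and threshold below are hypothetical choices for illustration only; a production monitor would also log prompts and outputs for drift analysis.

```python
import collections
import time

# Illustrative thresholds -- real values depend on expected traffic.
WINDOW_SECONDS = 60
MAX_REQUESTS_PER_WINDOW = 30

class UsageMonitor:
    """Track per-user request timestamps in a sliding window and flag
    bursts that may indicate automated abuse (for HITL escalation)."""

    def __init__(self):
        self._events = collections.defaultdict(collections.deque)

    def record(self, user_id, now=None):
        """Record one request; return True if the user now exceeds the
        per-window limit and should be flagged for review."""
        now = time.time() if now is None else now
        q = self._events[user_id]
        q.append(now)
        # Drop events that have aged out of the window.
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) > MAX_REQUESTS_PER_WINDOW
```

A flagged user would not be blocked automatically here; the signal feeds the human-in-the-loop oversight the measure describes, keeping the final decision with a reviewer.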