Security - Robustness
While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems. The most extensively discussed issue in this context are jailbreaking risks, which involve techniques like prompt injection or visual adversarial examples designed to circumvent safety guardrails governing model behavior. Sources delve into various jailbreaking methods, such as role play or reverse exposure. Similarly, implementing backdoors or using model poisoning techniques bypass safety guardrails as well. Other security concerns pertain to model or prompt thefts.
ENTITY
1 - Human
INTENT
1 - Intentional
TIMING
3 - Other
Risk ID
mit76
Domain lineage
2. Privacy & Security
2.2 > AI system security vulnerabilities and attacks
Mitigation strategy
1 - Implement Multilayered Input Validation and Output Sanitization 2- Establish Continuous Behavioral Monitoring and Proactive Adversarial Testing 3- Fortify Model Integrity and Data Provenance via Strict Governance