Generative AI User Data
Many generative AI tools require users to log in for access, and many retain user information, including contact details, IP addresses, and the full inputs and outputs, or "conversations," users have within the app. These practices raise a consent issue: providers often use this data to further train their models, so the "free" product comes at the cost of user data. This dovetails with security, as discussed in the next section. Best practices include not requiring users to sign in to use the tool and not retaining or reusing user-generated content after the user's active session ends.
ENTITY
1 - Human
INTENT
2 - Unintentional
TIMING
2 - Post-deployment
Risk ID
mit522
Domain lineage
2. Privacy & Security
2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Mitigation strategy
1. Implement Robust and Granular Consent and Transparency Frameworks
Require explicit, informed, and granular user consent for all data processing activities, particularly for secondary purposes such as generative model training. Establish public-facing, unambiguous data governance policies that fully detail the collection, processing, storage, sharing, and retention of user-generated inputs and outputs. Provide users with easily accessible mechanisms to opt out of data reuse for model enhancement and to exercise their statutory rights, such as access to and deletion of their data.
2. Enforce Strict Data Minimization and Protection Protocols
Adhere to the data minimization principle, ensuring that only data strictly necessary for the intended function of the generative AI tool is collected and retained. Proactively apply technical data protection methods, such as encryption, data masking, or pseudonymization, to all user inputs and outputs, especially personally identifiable information (PII), before they are used for model training or inference.
3. Establish Secure System Vetting and Access Controls
Conduct thorough vendor risk assessments, vetting all generative AI tools for compliance with established security and data governance standards. Implement stringent Identity and Access Management (IAM) practices based on the principle of least privilege, restricting internal access to training data and model artifacts to authorized personnel only, thereby mitigating the risk of unauthorized use or internal data leakage.
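As an illustration of the pseudonymization step in strategy 2, the following is a minimal sketch of masking PII (here, email addresses and IPv4 addresses) in conversation logs before they are reused for training. The regexes, key handling, and pseudonym format are assumptions for demonstration; a production system would use a vetted PII-detection library and a managed secret.

```python
import hashlib
import hmac
import re

# Hypothetical secret key; in practice, load from a secrets manager,
# never hard-code it.
PSEUDONYM_KEY = b"replace-with-managed-secret"

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def pseudonymize(token: str) -> str:
    """Map an identifier to a stable, keyed pseudonym (HMAC-SHA256)."""
    digest = hmac.new(PSEUDONYM_KEY, token.encode(), hashlib.sha256).hexdigest()
    return f"<pii:{digest[:12]}>"

def scrub(text: str) -> str:
    """Mask emails and IPv4 addresses in a log line before reuse."""
    text = EMAIL_RE.sub(lambda m: pseudonymize(m.group()), text)
    text = IPV4_RE.sub(lambda m: pseudonymize(m.group()), text)
    return text
```

Because the HMAC is keyed and deterministic, the same identifier always maps to the same pseudonym, preserving conversational structure for training while keeping the raw PII out of the dataset.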