AI Data Governance

AI data governance was identified in the third annual State of Unstructured Data Management survey as a top concern for generative AI adoption in the enterprise. The concern spans privacy, security and the lack of data source transparency in vendor solutions. The press release noted:

As the generative AI marketplace expands and executives push for departments to leverage new solutions for competitive advantage, the need for an unstructured data governance agenda is strong; IT leaders cannot forsake data integrity, data protection and risk faulty or dangerous outcomes from generative AI projects.

In the post 5 Unstructured Data Tips for AI, Komprise cofounder and COO Krishna Subramanian reviewed five areas to consider across security, privacy, lineage, ownership and governance of unstructured data for AI.

Data Security for AI

Data confidentiality and security are at risk with third-party generative AI applications because your data can become part of the LLM, and effectively public, once you feed it into a tool. Get clear on the vendor’s legal agreements as they pertain to your data. There are new ways to manage this: ChatGPT now allows users to disable chat history so that chats won’t be used to train its models, although OpenAI retains the data for 30 days. One way to protect your organization is to segregate sensitive and proprietary data into a private, secure domain that restricts sharing with commercial applications. You can also maintain an audit trail of the corporate data that has been fed to AI applications.
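
As a rough illustration of the last two ideas, the sketch below blocks files that sit in a restricted location and records every attempted submission in an audit log. It is a minimal sketch under stated assumptions, not a description of any product: the path prefixes, the function names and the in-memory log are all hypothetical, and a real deployment would write to tamper-resistant storage.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical policy: data under these prefixes stays in the private domain.
RESTRICTED_PREFIXES = ("/secure/", "/finance/")


def is_restricted(path: str) -> bool:
    """Return True if the path sits inside a restricted (private) location."""
    return path.startswith(RESTRICTED_PREFIXES)


def submit_to_ai(path: str, content: bytes, tool: str, user: str, audit_log: list) -> bool:
    """Log the attempted submission and return whether it is allowed."""
    allowed = not is_restricted(path)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "path": path,
        # A content hash identifies what was sent without storing the data itself.
        "sha256": hashlib.sha256(content).hexdigest(),
        "tool": tool,
        "user": user,
        "allowed": allowed,
    }
    audit_log.append(json.dumps(entry))
    return allowed
```

Logging denied attempts as well as allowed ones is a deliberate choice here: the audit trail then shows both what reached an AI tool and what the policy stopped.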

Data Privacy for AI

When you create a prompt for an AI tool to produce an output based on your query, you don’t know if the result will include protected data, such as PII, from another organization. Your company may be liable if you use the tool’s output externally in content or a product and the PII is discoverable. In addition, since non-AI vendors are now incorporating AI tools into their solutions, perhaps even without their customers’ knowledge, the risk compounds. Your commercial backup solution could incorporate a pretrained model to find anomalies in your data, and that model may contain PII; this could indirectly put you at risk of a violation. Data provenance and transparency around the training data used in an AI application are critical to ensure privacy.

Data Lineage for AI

Today there is not much transparency around data sources in generative AI applications. They may draw on biased, libelous or unverified data sources. This makes GenAI tools a questionable choice when you need results that are factually accurate and objective. Consider the problem you are trying to solve with AI to choose the right tool. Machine learning systems are better for tasks that require a deterministic outcome.

Data Ownership for AI

The data ownership piece of generative AI concerns what happens when you derive a work: who owns the IP? As it stands today, copyright law dictates that “works created solely by artificial intelligence — even if produced from a text prompt written by a human — are not protected by copyright,” according to reporting by BuiltIn. The article adds that the use of copyrighted materials to train AI models is permitted under fair use. However, a batch of lawsuits currently under consideration challenges this position. It will be increasingly important for organizations to track who commissioned derivative works and how those works are used internally and externally.

Data Governance for AI

If you work in a regulated industry, you’ll need to show an audit trail of any data used in an AI tool and demonstrate that your organization is compliant. A healthcare organization, for instance, would need to verify that no patient PII has been leaked to an AI solution, per HIPAA rules. This requires a data governance framework for AI that covers privacy, data protection, ethics and more. Unstructured data management solutions help by providing a means to monitor data usage in AI tools and create a foundation for unstructured data governance.
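
One simple control in such a framework is to scan text for obvious identifiers before it is sent to an AI tool. The sketch below is illustrative only: the two regular expressions (email addresses and US Social Security number formats) and the function names are assumptions chosen for the example, and pattern matching of this kind is nowhere near sufficient for HIPAA compliance on its own.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict:
    """Return matches per pattern name, omitting patterns with no hits."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def redact(text: str) -> str:
    """Replace every match with a labeled placeholder."""
    for name, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{name.upper()} REDACTED]", text)
    return text
```

A caller could run `find_pii` first to decide whether to block a submission outright, and fall back to `redact` only where policy allows sanitized text to be shared.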

Other Considerations for AI Data Governance

At a high level, AI data governance is the set of frameworks, policies, and procedures organizations put in place to ensure that data used in artificial intelligence (AI) systems is managed, processed, and utilized in a responsible, ethical, and compliant manner. It involves establishing guidelines for collecting, storing, processing, and using data within AI systems. Key components of AI data governance typically include:

  • Data Quality and Integrity: Ensuring that the data used in AI models is accurate, reliable, and free from biases or errors. This involves data validation, cleaning, and maintaining data integrity throughout its lifecycle.
  • Data Privacy and Security: Implementing measures to protect sensitive data, adhering to relevant data protection regulations (such as GDPR, CCPA), and securing data against unauthorized access or breaches.
  • Compliance and Regulations: Ensuring that AI initiatives comply with legal and regulatory frameworks. This involves understanding and adhering to laws and guidelines governing data usage, such as industry-specific regulations and international standards.
  • Ethical Use of Data: Establishing ethical guidelines for the collection, storage, and usage of data in AI applications. This includes considering fairness, accountability, and transparency in AI decision-making processes.
  • Data Lifecycle Management: Managing data throughout its lifecycle, from collection to processing, analysis, and disposal. This involves tracking the lineage of data, maintaining proper documentation, and ensuring responsible data handling at every stage.
  • Risk Management: Identifying and mitigating potential risks associated with data usage in AI systems, such as bias, security vulnerabilities, or unintended consequences of AI decision-making.
  • Accountability and Transparency: Establishing mechanisms to ensure accountability for AI models and making the decision-making process transparent to relevant stakeholders. This involves explaining AI model behavior and outcomes in an understandable manner.
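
To make the lineage tracking mentioned under Data Lifecycle Management more concrete, here is a minimal sketch of how upstream sources for a dataset might be recorded and traced. The `Dataset` class, its fields and the `lineage` function are hypothetical names for illustration; production lineage tools capture much more, such as transformations, owners and timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    source: str  # free-text description of where the data came from
    parents: list = field(default_factory=list)  # upstream Dataset objects


def lineage(ds: Dataset) -> list:
    """Return the names of all upstream datasets, nearest first, without duplicates."""
    seen, order = set(), []
    queue = list(ds.parents)
    while queue:
        parent = queue.pop(0)  # breadth-first: closest ancestors come first
        if parent.name in seen:
            continue
        seen.add(parent.name)
        order.append(parent.name)
        queue.extend(parent.parents)
    return order
```

For example, if a training set is built from a cleaned dataset that was itself derived from raw scans, `lineage` on the training set returns the cleaned dataset first and the raw scans second, which is the trail an auditor would want to follow.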

Effective AI data governance is critical to building trust in AI systems, ensuring that they operate in a manner that respects data privacy, security, and ethical considerations. It also helps organizations make more informed decisions, reduce risks, and maintain compliance with regulatory requirements.


