This blog was adapted from the original article on AI Business.
With AI comes an overarching priority: to understand and properly leverage an organization’s vast data estate. Today, much of the petabyte-scale enterprise data store is not reused or even understood well enough to take advantage of the expanding array of free and low-cost AI tools available.
This is unfortunate, as many use cases for AI are urgent. Consider the impact an AI tool could have by quickly identifying sensitive data and making sure that it’s being managed in line with data compliance requirements. The Journal of the American Medical Association recently reported on an AI-based model in use at Stanford Hospital that predicts when a patient is declining and alerts the patient’s care team.
Preparing for AI is the top business challenge for unstructured data management (57%), according to the Komprise 2024 State of Unstructured Data Management survey. The leading challenge in this effort is managing governance/security concerns (45%), followed by data classification and tagging (41%).
Storage IT professionals have a prominent role to play in facilitating AI and big data analytics initiatives:
- They must deliver fast, secure and scalable storage infrastructure to support AI data workloads.
- Equally, they need to classify and deliver the right data to these tools to support the work of data scientists and other data stakeholders across the enterprise.
Let’s consider the emerging concept of automated data workflows for AI.
Feeding the right data to AI and enriching metadata classification using AI are prime opportunities for enterprise IT today. These processes require easy-to-configure AI data workflows, which benefit from systematic automation.
To automate AI data workflows, you need to:
- Search and curate the right data: To create an AI data workflow, you first need a way to search across your entire data estate, which can span terabytes to petabytes, to find the data of interest.
- Manage data governance: When executing AI data workflows, it is essential to keep track of what corporate data was fed to which AI process so there is an audit trail. Similarly, it is important to enforce guardrails such as not sharing sensitive data with external processes. In the Komprise survey, AI data governance/security is the top future capability (47%) for unstructured data management, up from 28% in 2023.
- Cut AI costs by persisting results: Most AI solutions have a pay-per-use billing model, so processing the same data repeatedly can lead to unexpectedly high costs. A global index that keeps track of the labels and tags generated by AI is valuable because users can search those results without running the AI process again on the same data.
- Leverage automation: The ability to automatically run the AI workflow on new data ensures that the AI is trained on the latest data without requiring cumbersome manual effort.
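The governance and cost points above can be illustrated with a minimal sketch. It assumes an in-memory index keyed by a content hash; `TagIndex`, `toy_classifier`, and the file paths are hypothetical names invented for illustration, not part of any product API. The idea is that only content the index has not seen reaches the billable AI process, each such call leaves an audit entry, and later searches are answered from the stored tags.

```python
import hashlib
from datetime import datetime, timezone


class TagIndex:
    """In-memory stand-in for a global index of AI-generated tags (hypothetical)."""

    def __init__(self):
        self.tags_by_digest = {}   # content hash -> tags returned by the AI process
        self.paths_by_digest = {}  # content hash -> every path seen with that content
        self.audit_log = []        # one entry per file actually sent to an AI process

    def tag_file(self, path, content, ai_process, process_name):
        digest = hashlib.sha256(content).hexdigest()
        self.paths_by_digest.setdefault(digest, set()).add(path)
        if digest not in self.tags_by_digest:
            # Only unseen content reaches the (billable) AI process.
            self.tags_by_digest[digest] = list(ai_process(content))
            self.audit_log.append({
                "path": path,
                "sha256": digest,
                "process": process_name,
                "sent_at": datetime.now(timezone.utc).isoformat(),
            })
        return self.tags_by_digest[digest]

    def search(self, tag):
        """Return paths carrying a tag, answered from the index alone."""
        hits = set()
        for digest, tags in self.tags_by_digest.items():
            if tag in tags:
                hits |= self.paths_by_digest[digest]
        return sorted(hits)


billable_calls = []  # records each invocation of the toy AI process


def toy_classifier(content):
    """Placeholder for a pay-per-use AI service; not a real API."""
    billable_calls.append(len(content))
    return ["contract"] if b"agreement" in content.lower() else ["other"]


index = TagIndex()
index.tag_file("/share/a.txt", b"Master Service Agreement", toy_classifier, "toy_classifier")
index.tag_file("/share/copy_of_a.txt", b"Master Service Agreement", toy_classifier, "toy_classifier")
print(index.search("contract"), len(billable_calls), len(index.audit_log))
```

Because `tag_file` is safe to call repeatedly, a scheduler could run it over newly arrived files without reprocessing old ones. A production index would need persistent storage and access controls, which this sketch leaves out.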
Sample Use Cases for AI Data Workflows
A workflow from the pharmaceutical industry could entail running a custom query across data silos to find all data for Project X using a data management solution. Next, the process could execute an external function on Project X data to look for a specific DNA sequence for a mutation. The data management software would be configured to tag such data as “Mutation XYZ” and then move only that new data set to a cloud AI service for analysis. Once the mutation data is no longer needed, the workflow finishes by moving it to a low-cost archival storage tier. The workflow could repeat with new data sets as often as needed.
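The steps in this example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `FileRecord`, the `location` values, the motif `GATTACA`, and the in-memory list standing in for a query across silos. A real data management platform would expose its own query, tagging, and data movement functions.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical metadata record for one file in the data estate."""
    path: str
    project: str
    sequence: str
    tags: set = field(default_factory=set)
    location: str = "primary"


def run_mutation_workflow(records, project, motif, tag):
    """Query by project, tag files containing the motif, and move only newly tagged files."""
    moved = []
    for record in records:
        if record.project != project:
            continue  # step 1: the custom query keeps only the project's data
        if tag in record.tags:
            continue  # tagged on an earlier run, so it is not new data
        if motif in record.sequence:  # step 2: the external function (a substring check here)
            record.tags.add(tag)            # step 3: tag the match
            record.location = "cloud-ai"    # step 4: move it to the cloud AI service
            moved.append(record)
    return moved


def archive_tagged(records, tag):
    """Final step: send analyzed data to a low-cost archive tier."""
    for record in records:
        if tag in record.tags and record.location == "cloud-ai":
            record.location = "archive"


records = [
    FileRecord("/lab/x/run1.fasta", "Project X", "AAGATTACACC"),
    FileRecord("/lab/x/run2.fasta", "Project X", "CCCCGGGG"),
    FileRecord("/lab/y/run1.fasta", "Project Y", "GATTACA"),
]
first_run = run_mutation_workflow(records, "Project X", "GATTACA", "Mutation XYZ")
second_run = run_mutation_workflow(records, "Project X", "GATTACA", "Mutation XYZ")
archive_tagged(records, "Mutation XYZ")
```

The second run moves nothing because the matching file already carries the tag, which is what lets the workflow repeat as new data arrives without resending old data.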
Taking this one step further, what if you could apply an AI tool to your data to rapidly segment and enrich the metadata with new tags? A data scientist may not know where all the data from a certain project resides and therefore cannot automate the process of tagging it. The scientist also needs to ensure that any files with PII are segregated so they don’t wind up in an external AI or ML tool for public access. An AI data workflow could integrate with a PII scanner to help in this regard.
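As a rough illustration of that integration, the sketch below splits files into a set that is safe to send to an external tool and a quarantined set. The regular expressions are a deliberately simplistic stand-in for a real PII scanner, and `scan_for_pii`, `partition_for_external_ai`, and the sample files are hypothetical names. Actual scanners detect far more categories, with far fewer misses than two patterns can manage.

```python
import re

# Toy patterns only; a real PII scanner covers many more identifiers.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_for_pii(text):
    """Return the sorted names of the PII patterns found in the text."""
    return sorted(name for name, pattern in PII_PATTERNS.items() if pattern.search(text))


def partition_for_external_ai(files):
    """Split {path: text} into files safe to share and files held back with their findings."""
    safe, quarantined = {}, {}
    for path, text in files.items():
        findings = scan_for_pii(text)
        if findings:
            quarantined[path] = findings  # never leaves the organization
        else:
            safe[path] = text
    return safe, quarantined


files = {
    "notes.txt": "Assay results for batch 7",
    "patients.csv": "jane@example.com,123-45-6789",
}
safe, quarantined = partition_for_external_ai(files)
```

Only the `safe` dictionary would be handed to the external AI or ML step; the findings recorded in `quarantined` can also be stored as metadata tags for later review.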
Or consider the application of Azure Bot Service, which allows developers to build and deploy intelligent chatbots and virtual assistants for customer service. An AI data workflow could analyze data from customer responses and then tag that data based on sentiment or customer issues and move it to a cloud data lake for future analysis.
As the AI industry evolves and matures, we’re seeing a complexity barrier that could slow down the positive developments AI can bring to people, businesses and governments. Rising above these challenges requires extreme coordination between individuals across the organization – think chief experience officers, data scientists, security professionals, storage and data management experts, and IT infrastructure people, along with HR and legal – to avoid bad outcomes and ensure that goals are aligned.
Data storage and data management leaders can contribute to this new age by connecting the dots between the unstructured data gold they manage and the best AI tools for the business. Developing and nurturing secure, intelligent AI data workflows is a sensible first step.
Learn more about Komprise Smart Data Workflows.