Unstructured Data Management for Healthcare AI Pipelines

Last year, Gartner predicted that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data. Komprise cited this prediction in its session at the 2025 Gartner Infrastructure, Operations and Cloud Strategies (IOCS) conference, where it set the tone for a broader discussion in healthcare, a sector that sees AI as a driver of transformation and positive change.

Regardless of the sector, many organizations are discovering that the main obstacle to adoption is not algorithms or models, but the state of their data. Roughly 80 to 90% of healthcare data is unstructured, including images, clinical notes, pathology slides, and diagnostic reports. Because this data lacks structure and context, it is hard for users to search across data stores and find what they need for AI projects.

At the same time, healthcare AI investment continues to rise. Healthcare organizations are implementing commercial AI solutions at more than twice the rate (2.2x) of the broader US economy, according to a recent report from Menlo Ventures.

Yet many AI initiatives stall early. The Komprise session at Gartner IOCS took a different approach, featuring a healthcare organization that addressed its unstructured data challenges and improved AI outcomes at the same time.

Krishna Subramanian, cofounder and president of Komprise, walked through a healthcare customer implementation that illustrated how managing unstructured data more deliberately can improve AI return on investment while also controlling hybrid storage costs.

Addressing the Unstructured Data Barriers in AI

Most enterprise data is unstructured and not immediately usable for AI. Healthcare data is diverse, fragmented, inconsistently labeled, duplicated across systems, and retained for long periods due to stringent regulatory and clinical requirements. Sensitive data must be protected at all costs. Overall, it can be difficult to locate, filter and operationalize unstructured data for analytics or machine learning.

As a result, AI data pipelines can be inefficient, costly, and low in quality. Teams either over-copy data to the cloud, driving up costs, or struggle to give models access to the right data at the right time.

A Digital Pathology AI Data Workflow Use Case

A senior technical architect at one of the nation’s leading healthcare institutions, a Komprise customer, took the stage to discuss a promising program in digital pathology. Digital pathology has become one of the fastest growing applications of AI in healthcare, improving diagnostic accuracy and reducing turnaround time. However, digitizing medical images creates extreme data growth.

At the integrated healthcare organization, digital pathology generates more than 2 PB of new data each year. Individual pathology slides can reach 50 GB in size, and retention requirements for data kept on premises often span decades.
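For a sense of scale, the two figures in this paragraph can be turned into a rough sizing estimate. The sketch below is back-of-the-envelope arithmetic only. It assumes decimal units (1 PB = 10^15 bytes) and data arriving evenly across the year, neither of which was stated in the session.

```python
# Rough sizing from the figures quoted in the session.
# Assumptions (not from the source): decimal units, even arrival over 365 days.
GB = 10**9
TB = 10**12
PB = 10**15

annual_bytes = 2 * PB        # "more than 2 PB of new data each year"
max_slide_bytes = 50 * GB    # "slides can reach 50 GB"

# If every slide were at the maximum size, this is the fewest slides
# that would add up to 2 PB; smaller slides mean a higher real count.
min_slides_per_year = annual_bytes // max_slide_bytes

# Average new data per day that any ingest process has to keep up with.
avg_daily_tb = annual_bytes / 365 / TB

print(f"at least {min_slides_per_year:,} slides per year")
print(f"about {avg_daily_tb:.2f} TB of new data per day")
```

Even under these simplifying assumptions, the daily volume runs to several terabytes, which is why copying everything to the cloud becomes expensive quickly.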

Clinical teams wanted to use cloud-based AI to improve their diagnostic processes. However, the underlying data architecture quickly became a bottleneck. The organization needed to keep primary data on its on-premises NAS while still taking advantage of cloud AI services, yet high costs were becoming a barrier.

Limitations of Traditional Data Movement

The IT team initially attempted to move data into the cloud using a common cloud data sync tool. This approach exposed several challenges. The architecture did not scale well, lacked granular data selection, and required copying entire datasets regardless of whether all the data was needed for AI processing. As a result, cloud data storage costs increased rapidly, cutting into the expected ROI of the AI initiative.

Komprise: A More Targeted Approach to AI Ingestion

To address these issues, the IT team implemented Komprise as part of its digital pathology workflow. New pathology slides are deposited into on-premises storage and ingested on a frequent schedule. Only the relevant data needed for AI analysis is sent to the cloud application for analysis. Results are verified by pathologists, and the cloud data is automatically deleted after a defined period.
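The two selective steps in this workflow, sending only new and relevant files and expiring cloud copies after a set period, can be illustrated with a minimal sketch. This is not Komprise's implementation or API. The function names, the file extensions, and the 30-day retention window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical values for illustration; none of these come from the session.
SLIDE_EXTENSIONS = {".svs", ".ndpi", ".tiff"}
CLOUD_RETENTION = timedelta(days=30)


def select_new_slides(root: Path, last_run: datetime) -> list[Path]:
    """Return slide files under root that were modified after the last ingest run."""
    cutoff = last_run.timestamp()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in SLIDE_EXTENSIONS   # relevant data only
        and p.stat().st_mtime > cutoff             # newly deposited only
    )


@dataclass
class CloudCopy:
    key: str               # identifier of the temporary copy in cloud storage
    uploaded_at: datetime  # when the copy was sent for analysis


def expired_copies(copies: list[CloudCopy], now: datetime) -> list[CloudCopy]:
    """Return cloud copies older than the retention window, i.e. due for deletion."""
    return [c for c in copies if now - c.uploaded_at > CLOUD_RETENTION]
```

A scheduler would call select_new_slides on each run and upload only the returned paths, then call expired_copies to decide which temporary cloud objects to delete. The on-premises originals stay untouched in both steps.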

This workflow changed how data moved between on-premises infrastructure and cloud AI systems. Instead of serving as a long-term repository, the cloud became a temporary processing layer. With a more efficient solution in place, the healthcare organization is now able to expand the digital pathology initiative while also making plans for similar AI projects in other clinical areas. Learn more about Komprise Intelligent AI Ingest.

Measurable Results

Rather than copying everything, the organization sent only newly generated and relevant data for analysis.
The organization reduced its cloud storage footprint from petabytes to terabytes, lowering cloud storage costs by more than 90%.
Clinicians are benefiting from more accurate diagnostic reports and a 3x faster turnaround in results, so treatment plans can proceed without delay.
This combination of cost control and improved access to insights is critical as healthcare organizations look to scale AI to address pressing industry challenges.


Key Takeaways

The session concluded with two themes that resonated beyond healthcare. First, the ability to find and move the right unstructured data across hybrid infrastructure is essential for both operational efficiency and AI success. Second, a thoughtful approach to unstructured data management can reduce costs while improving AI outcomes.

As the Gartner prediction reminds us, AI projects rarely fail because of algorithms alone. More often, they fail because data is not prepared, governed, or delivered in a way that supports real-world use. This example showed that organizations do not have to choose between cost control and AI innovation. With the right data strategy, they can support both.

Watch the full video on YouTube

Learn more about Komprise for Hospitals and Healthcare Systems
