The State of Unstructured Data Management
Komprise Survey Finds IT Leaders Lack Insights for Hybrid Cloud Unstructured Data Management
Download the 2022 State of Unstructured Data Management Report
Persistent data growth is straining IT budgets, causing more organizations to prioritize cloud data migrations, but data visibility, planning and management across hybrid clouds remain a key roadblock.
The 2021 Komprise Unstructured Data Management Report examines the challenges and opportunities with unstructured data in the enterprise—from how much data enterprises are managing, to cloud data priorities and future approaches for data management. This report summarizes responses of 300 global enterprise storage IT decision makers at companies with more than 1,000 employees in the United States and in the UK. All respondents work at the IT manager level or above, across IT/technology operations teams.
Highlights of the survey include:
- 65.5% of organizations spend more than 30% of their IT budgets on data storage and management.
- 44.5% want better visibility for planning.
- Investing in analytics tools is the highest priority (45%), ahead of buying more cloud or on-prem storage or modernizing backups.
Read the latest State of Unstructured Data Management Here
“The survey shows that enterprises want analytics and systematic data management to make the best decisions on cloud migrations and archiving. The end goal is to cut storage costs and create new value from unstructured data over time.”
-Krishna Subramanian, President and COO of Komprise
The 2021 Komprise State of Unstructured Data Management report was the first of its kind — what did it find, and what did it get right about where the market was heading?
The first Komprise State of Unstructured Data Management report, published in August 2021, surveyed 300 enterprise storage IT decision makers at organizations with more than 1,000 employees across the United States and the United Kingdom. It was the first systematic attempt to quantify what IT leaders actually knew, feared, and planned regarding their growing unstructured data estates. Reading it in 2026, two things stand out: how accurately it identified the structural problems, and how dramatically every metric has since worsened:
- The 1PB threshold was the floor in 2021 — the majority of organizations surveyed were already managing more than 1PB of data and spending more than 30% of IT budgets on data storage and protection; five years later, 74% are managing more than 5PB and 40% are managing more than 10PB — the scale has multiplied while the percentage of budget consumed has remained stubbornly above 30%
- The analytics-first instinct was correct — investing in analytics tools was the highest priority at 45% over buying more cloud or on-premises storage or modernizing backups; the organizations that followed that instinct in 2021 built the visibility and metadata foundation that AI initiatives now require; those that kept buying more storage instead are five years behind
- Cold data awareness was emerging — one-third of enterprises acknowledged that over 50% of their data was cold, while 20% did not know, suggesting a widespread need to right-place data through its lifecycle; that 20% who did not know is a governance gap that has since been identified as a primary AI risk factor
- Cloud migration was already a top priority — top priorities for cloud data management included migrating data to the cloud at 56%, cutting storage and data costs at 46%, and governance and security of data in the cloud at 41%; the sequence the 2021 report described — migrate, cut costs, then govern — proved to be the wrong order; organizations that governed first are better positioned for AI than those that migrated first
- The 2026 report is where that story ends up — five annual surveys later, the Komprise 2026 State of Unstructured Data Management captures exactly what happened to the organizations that acted on 2021’s priorities versus those that did not; it is available at komprise.com
In 2021, most IT leaders did not know how much of their data was cold — why did that gap matter then and why is it even more consequential now?
In 2021, one-third of enterprises acknowledged that over 50% of their data was cold, while 20% did not know at all how much of their data was inactive. That visibility gap — knowing storage is expensive without knowing what is actually worth storing at premium cost — was the foundational problem the 2021 report identified. In 2026, the same gap carries a compounded cost:
- Five years of unexamined cold data accumulation — organizations that lacked visibility in 2021 and did not invest in analytics tools have accumulated five more years of cold, unclassified data on expensive primary storage; the backlog is now proportionally harder and more expensive to address
- Flash prices make cold data on hot storage a crisis, not just an inefficiency — IDC describes the current memory shortage as a potentially permanent reallocation of global silicon wafer capacity, with 2026 NAND and DRAM supply growth expected to remain below historical norms; the cost of leaving cold data on all-flash NAS, which was already unreasonable in 2021, is now significantly worse as hardware prices rise while the data volumes consuming that hardware have multiplied
- The 20% who did not know are the most exposed — organizations without visibility into cold data are also without visibility into sensitive data; the 2021 report’s 20% who did not know their cold data ratio are likely the same organizations that in 2026 cannot confirm whether their unstructured data estates contain PHI, PII, or IP that has been accessed by unauthorized AI tools
- The Flash Stretch Assessment addresses this directly — for qualified enterprises managing 500TB or more, the Komprise Flash Stretch Assessment is specifically designed to answer the question the 2021 report found 20% of IT leaders could not: how much cold data is sitting on expensive primary storage, what is it costing, and what would transparent tiering to lower-cost storage save
- Visibility is still the prerequisite for everything else — the 2021 report correctly placed analytics investment as the top priority above buying more storage or modernizing backups; Komprise Analysis, included in both Komprise Elastic Data Migration and Komprise Intelligent Data Management, provides exactly this visibility across all NAS, cloud, and object storage environments without agents or changes to existing infrastructure
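To make the cold data question concrete, the sketch below estimates what share of a file tree has gone untouched for a given period and what tiering that share might save. It is a minimal illustration under stated assumptions, not how Komprise Analysis or the Flash Stretch Assessment works: the function names, the 365-day threshold, and the per-terabyte prices are all hypothetical, and last-access times can be unreliable on file systems mounted with access-time updates disabled.

```python
import os
import time

TB = 10**12  # decimal terabyte, used only for the cost estimate


def cold_data_summary(root, cold_after_days=365, now=None):
    """Walk `root` and report total bytes, cold bytes, and the cold fraction.

    A file counts as cold when neither its last access nor its last
    modification falls within `cold_after_days` (a hypothetical threshold).
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400
    total_bytes = 0
    cold_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # skip files that vanish or cannot be read
            total_bytes += st.st_size
            # atime may be stale on noatime/relatime mounts, so also consider mtime
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                cold_bytes += st.st_size
    fraction = cold_bytes / total_bytes if total_bytes else 0.0
    return {
        "total_bytes": total_bytes,
        "cold_bytes": cold_bytes,
        "cold_fraction": fraction,
    }


def tiering_savings(cold_bytes, hot_cost_per_tb, cold_cost_per_tb):
    """Estimate savings from moving cold bytes to a cheaper tier.

    Both prices are caller-supplied assumptions in the same currency and period.
    """
    return (cold_bytes / TB) * (hot_cost_per_tb - cold_cost_per_tb)
```

Calling cold_data_summary on a mounted share and passing the resulting cold_bytes to tiering_savings with an organization's own storage prices gives a rough first answer to the three questions above. A production assessment would also need to account for snapshots, replication, and egress costs, which this sketch ignores.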
The 2021 report found hybrid cloud was the dominant reality — how has the hybrid cloud storage landscape changed since and what does that mean for data management strategy today?
Half of enterprises in 2021 had data stored in a mix of on-premises and cloud-based storage, and most organizations — 62.5% — planned to spend more on storage in 2021 than in 2020 despite the economic pressures of the pandemic year. Hybrid cloud was already the operating reality, not an aspiration. Five years on, the complexity of that hybrid environment has deepened in ways the 2021 report could only hint at:
- Hybrid has become multi-vendor, multi-protocol, and multi-cloud — in 2021 hybrid meant on-premises NAS plus one or two cloud providers; in 2026 it means NetApp alongside Dell alongside IBM alongside VAST Data alongside Nasuni alongside Everpure, feeding AWS, Azure, and Google Cloud simultaneously, with object storage, NAS, and block storage all in play; the management complexity has scaled faster than the data volumes
- Data silos multiplied with every cloud and storage addition — the majority of organizations were spending more than 30% of their IT budget on data storage in 2021; that data storage cost has not decreased as hybrid environments expanded, because adding new storage tiers without retiring old ones creates additive cost, not substitutive cost; the data management layer that makes hybrid practical rather than chaotic is the gap the 2021 report identified and that Komprise fills
- Cloud-native AI access requires the right architecture — organizations that moved data to cloud in 2021 using storage-vendor tiering tools stored it in proprietary formats that cloud AI services cannot read directly; organizations that used file-level tiering with Komprise stored data as native objects, immediately accessible to cloud AI services without a secondary conversion step; the architectural choice of 2021 determines AI data accessibility in 2026
- Komprise is the metadata and orchestration layer for enterprise unstructured AI data across any hybrid combination; the Global Metadatabase indexes data across every storage vendor and cloud simultaneously — a single, continuously updated index of every file and object in the hybrid estate regardless of vendor, protocol, or location
- Storage agnosticism is not a feature, it is the architecture — the 2021 report raised vendor lock-in as a concern even then; Komprise works across storage platforms and all major cloud providers with no proprietary format dependency, no rehydration requirement, and no single-vendor constraint; this storage-agnostic architecture is what makes the metadata and orchestration layer possible at scale
The 2021 report identified governance and security as a top cloud priority — what were IT leaders worried about then, and how has the threat profile transformed?
Governance and security of data in the cloud ranked as the third-highest priority for cloud data management in 2021 at 41%, behind migrating data and cutting costs. The concern was prescient but the specific risks IT leaders had in mind in 2021 bear little resemblance to the governance challenges that define the category in 2026:
- In 2021, governance meant compliance — GDPR, HIPAA, and CCPA were the primary drivers; the fear was regulatory fines from improperly stored or accessed data; this concern remains valid and has not diminished
- By 2026, governance also means AI containment — the risk that sensitive unstructured data reaches public AI tools, surfaces in model outputs, or gets processed by unauthorized shadow AI applications did not exist as a practical concern in 2021; 90% of IT leaders are now extremely or somewhat worried about shadow AI from a privacy and security standpoint, and 44% report that sensitive data has already been leaked into AI tools
- The data estate that lacked governance in 2021 is now five years more exposed — unclassified, untagged unstructured data that was a compliance liability in 2021 is now also an AI risk liability; every file that went unclassified for five years is a file that could be ingested by an AI tool without any governance control (learn more about unstructured data classification and tagging)
- The governance gap is measurable in business outcomes — 13% of IT leaders report that negative AI outcomes have already resulted in financial, customer, or reputational damage; the governance concern the 2021 report placed third on the priority list has since risen to become the top business challenge for unstructured data management
- Komprise Sensitive Data Management, available in Komprise Intelligent Data Management, closes the gap the 2021 report identified — built-in PII and PHI detection, custom regex and keyword search, and KAPPA-powered extraction from proprietary file formats detect sensitive content across petabyte-scale data estates; every action is logged with complete audit trails for HIPAA, GDPR, and AI governance compliance; this is the capability the 2021 report’s 41% were asking for and that the intervening five years of AI adoption have made non-negotiable
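As a rough illustration of what regex and keyword search for sensitive content involves, the sketch below scans a piece of text for two simple patterns plus caller-supplied keywords. It is an assumption-laden example, not the Komprise Sensitive Data Management implementation: the pattern set, the scan_text function, and the labels are hypothetical, and real PII and PHI detection needs validation logic and far broader coverage than two regular expressions.

```python
import re

# Illustrative patterns only (hypothetical): an email address and a
# US Social Security number written as NNN-NN-NNNN.
DEFAULT_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_text(text, patterns=DEFAULT_PATTERNS, keywords=()):
    """Return a dict mapping each matched label to its number of hits.

    `patterns` maps a label to a compiled regex; `keywords` are matched
    case-insensitively as plain substrings. Labels with no hits are omitted.
    """
    findings = {}
    for label, pattern in patterns.items():
        hits = pattern.findall(text)
        if hits:
            findings[label] = len(hits)
    lowered = text.lower()
    for word in keywords:
        count = lowered.count(word.lower())
        if count:
            findings[f"keyword:{word}"] = count
    return findings
```

In practice the findings for each file would be written back as tags or metadata so that policies can exclude flagged files from AI ingestion, and each scan would be logged for audit purposes.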
The 2021 report found enterprises wanted analytics-driven data management above all else — five years on, has that investment delivered its promised returns?
Investing in analytics tools was the top priority for 45% of organizations in 2021, ranked higher than buying more cloud storage, buying more on-premises storage, or modernizing backup infrastructure. That preference for analytics over procurement was the most strategically important finding in the report. Whether organizations acted on it, and how, has since determined their AI readiness in 2026:
- The organizations that invested in analytics first are ahead on AI — analytics-first investment in 2021 meant building visibility into what data existed, where it lived, who used it, and what it cost; that visibility is exactly the foundation the Global Metadatabase provides and exactly what AI data preparation requires; organizations that built this layer in 2021 have a five-year head start on the classification and curation work that 56% of IT leaders now cite as their top AI challenge
- The organizations that bought storage instead are still buying storage — 85% of IT and storage leaders are projecting an increase in storage spend in 2026; the organizations still buying more storage as their primary response to data growth are repeating the approach the 2021 report already showed was less valued than analytics investment; the analytics-first instinct of 2021 has been validated every year since
- Analytics investment has evolved from reporting to orchestration — in 2021, analytics meant dashboards and cost modeling; in 2026, analytics means the Global Metadatabase continuously indexing billions of files, Deep Analytics querying that index for precise AI datasets, and Smart Data Workflows acting on query results automatically; the capability stack has grown from insight to action — Komprise is the metadata and orchestration layer for enterprise unstructured AI data, not just an analytics reporting tool
- KAPPA represents the frontier of what analytics-driven data management now means — the 2021 report could not have anticipated that metadata enrichment would become a critical AI differentiator; KAPPA Data Services, available in Komprise Intelligent Data Management, extend analytics beyond standard file attributes to domain-specific content extraction from DICOM, BAM, legal, and financial file formats using serverless processing at petabyte scale; this is where analytics investment in 2021 leads in 2026
- The 2026 report closes the loop — the 2021 survey established that enterprise IT wanted analytics-driven data management as its highest investment priority; the 2026 Komprise State of Unstructured Data Management shows what five years of that investment has built in the organizations that followed through, and what the cost of delay has been for those that did not; the latest report is available at komprise.com/report.