State of Unstructured Data Management Report 2022

IT Leaders are Investing in Unstructured Data Analytics

Unstructured data has reached a tipping point for cost and complexity. IT leaders indicate greater urgency to manage data efficiently for cost savings and to help end users find new insights from growing unstructured data volumes. This report summarizes the responses of 300 global enterprise storage IT directors, VPs and C-level executives, all decision makers at companies with more than 1,000 employees in the United States and the UK.

Unstructured data management trends in 2022 include:

  • More than 50% of enterprise IT organizations are managing at least 5 PB of data today.
  • In 2022, 87% of IT leaders rate managing unstructured data growth as a top priority, up from 70% in 2021.
  • A majority (65%) of organizations plan to or are already delivering unstructured data to big data platforms.

Download the latest report today to understand the primary unstructured data management challenges and opportunities to deliver greater cost savings and data value.


What did the Komprise 2022 State of Unstructured Data Management report find, and why does reading it in 2026 feel like looking at the calm before the storm?

The second annual Komprise survey of 300 enterprise storage IT directors, VPs, and C-level executives found that 87% of IT leaders rated managing unstructured data growth as a top priority in 2022, up from 70% the previous year, with more than 50% already managing at least 5PB of data. The 2022 findings were a leading indicator of everything that has since accelerated. Read today, the numbers look modest compared to where the market stands:

  • The 5PB threshold was a milestone in 2022 and is now the norm: 74% of organizations now store more than 5PB, a 57% increase over 2024 alone; what half the market had crossed by 2022 is now the baseline expectation for enterprise IT
  • The urgency signal was already flashing — a 17-point jump in the number of IT leaders rating unstructured data growth as a top priority — from 70% to 87% in a single year — was the clearest early signal that the problem was compounding faster than solutions were being deployed; that compounding has continued every year since
  • The shift from storage management to data services was identified correctly — the 2022 report’s central theme that enterprises needed to evolve from managing storage hardware to delivering data services has proven prescient; the organizations that made that shift early are the ones now positioned to leverage AI; those that did not are still buying more capacity
  • Four years of additional data growth later, the gap is larger: enterprises that read the 2022 report and took no action have accumulated four more years of ungoverned, unclassified unstructured data; the cost of addressing the backlog is now proportionally higher
  • The 2026 report is the current reference — the 2022 findings established the trajectory; the Komprise 2026 State of Unstructured Data Management captures where that trajectory has led and what enterprise IT leaders are doing about it now; it is available at komprise.com

The 2022 report found that moving data without disrupting users was the top obstacle — four years later, has that changed?

The largest obstacle to unstructured data management in 2022, cited by 42% of respondents and more often than the high cost of storage, was moving data without disrupting users and applications. Typically, IT would move files to the cloud or secondary storage and users could no longer find them, creating conflicts, lost productivity, and business risk. Four years on, this challenge has not been solved; it has been elevated:

  • It remained the top technical challenge through 2024 — moving data without disruption to users and applications was still the top technical unstructured data management challenge in 2024 at 54%, twelve percentage points higher than in 2022; the awareness of the problem grew faster than the solutions being deployed
  • The stakes are higher because AI depends on it — in 2022, disruption meant users could not find their files; in 2026, it also means AI pipelines fail when data moves and metadata is lost, file paths break, or access patterns change unexpectedly; the same disruption problem now affects both human users and the automated AI systems consuming the same data
  • Komprise Transparent Move Technology was built specifically for this problem — Komprise Dynamic Links maintain full transparent access from the original file path regardless of where the underlying data has moved; users access files exactly as before, applications do not break, and backup and antivirus tools see the same paths; this is the direct response to what IT leaders identified in 2022 as their single biggest obstacle
  • At 2026 data volumes the cost of disruption is compounded — when 50% of the market had 5PB in 2022, a disruption incident affected a bounded dataset; with 74% now managing more than 5PB and 40% managing more than 10PB, a failed migration or broken file path affects proportionally more users, more applications, and more AI workflows simultaneously
  • The solution requires intelligence, not just movement: the 2022 report correctly identified that adopting new storage and cloud technologies without incurring licensing penalties was a top goal for organizations; Komprise Elastic Data Migration and Transparent Move Technology address both needs together, providing transparent access with no lock-in at the destination
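
The path-preserving idea described above can be illustrated with an ordinary POSIX symbolic link. This is only an analogy on a local filesystem: Komprise Dynamic Links are a proprietary mechanism whose internals this page does not describe, and the function name `tier_with_link` and the directory names below are hypothetical.

```python
import os
import shutil
import tempfile


def tier_with_link(src_path: str, cold_dir: str) -> str:
    """Move a file to cold_dir and leave a symlink at the original path.

    Hypothetical helper for illustration; not a Komprise API.
    """
    os.makedirs(cold_dir, exist_ok=True)
    dest_path = os.path.join(cold_dir, os.path.basename(src_path))
    shutil.move(src_path, dest_path)   # relocate the data to the cheaper tier
    os.symlink(dest_path, src_path)    # keep the original path resolvable
    return dest_path


def demo() -> tuple[str, bool]:
    """Tier one file, then read it back through its original path."""
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "primary")
        cold = os.path.join(root, "cold")
        os.makedirs(primary)
        original = os.path.join(primary, "report.txt")
        with open(original, "w") as f:
            f.write("quarterly results")

        tier_with_link(original, cold)

        # A user or application still opens the same path as before.
        with open(original) as f:
            content = f.read()
        return content, os.path.islink(original)
```

Calling `demo()` reads the file through its original path after the data has moved. A plain symlink does nothing for SMB clients, cloud object targets, or backup tools, which is why production tiering needs more than this sketch shows.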

The 2022 report found 65% of organizations were moving unstructured data to analytics platforms — what did that predict about the AI era and where does the market stand today?

A majority of organizations in 2022 planned to or were already delivering unstructured data to their big data analytics platforms, and the leading new approach to unstructured data management was the ability to initiate and execute data workflows at 43%. This finding, read in 2026, looks like the earliest data point in the AI data preparation story that has since become the defining challenge of enterprise IT:

  • Analytics delivery in 2022 was a preview of AI ingestion in 2026 — the organizations moving unstructured data to analytics platforms in 2022 were building the muscle memory and infrastructure that AI initiatives now require at far greater scale; those that did not are starting from scratch with petabytes more data to organize
  • Data workflows went from emerging approach to essential capability — what 43% of organizations were beginning to explore in 2022 is now the core operating model for AI data preparation; Smart Data Workflows and the ability to efficiently find, tag, and move curated datasets into AI pipelines are the top infrastructure investment priority for 2026
  • The metadata gap widened significantly — in 2022, the challenge of enriching unstructured data with meaningful metadata was acknowledged but solutions were limited; today, KAPPA Data Services addresses this directly through serverless processing that extracts custom, domain-specific metadata from proprietary file formats at petabyte scale — a capability that did not exist when the 2022 report was published
  • Komprise is the metadata and orchestration layer for enterprise unstructured AI data — the 2022 report described what organizations wanted to accomplish; the current Komprise Intelligent Data Management platform delivers the Global Metadatabase, Deep Analytics, Smart Data Workflows, Intelligent AI Ingest, and KAPPA Data Services that make it possible at 2026 scale
  • The 65% who were moving data to analytics platforms in 2022 were right — the enterprises that invested early in treating unstructured data as an analytics asset rather than a storage burden are four years ahead of those that did not; the gap between data-mature and data-immature organizations is now measurable in AI ROI, not just storage bills

The 2022 report flagged sensitive data protection as a rapidly expanding use case — how has that concern transformed by 2026?

Expanding use cases for unstructured data management in 2022 included protecting sensitive data at 63%, followed by allowing users to search and run analytics on their data. In 2022, sensitive data protection in the context of unstructured data management meant primarily compliance with GDPR, HIPAA, and similar regulations. By 2026, the threat model has expanded dramatically:

  • AI has created an entirely new attack vector — in 2022, sensitive data protection meant preventing unauthorized human access; by 2026 it means preventing sensitive unstructured data from being ingested by AI tools, surfaced in model outputs, or processed by shadow AI applications that employees use without IT knowledge; 44% of IT leaders report that sensitive data has been leaked into AI tools — a risk category that did not exist when the 2022 report was written
  • The scale of the problem grew with the data — protecting sensitive data across 5PB is a very different challenge than protecting it across 10PB; the organizations that invested in sensitive data detection and classification in 2022 have a meaningful head start; those that deferred now face a proportionally larger backlog with a more severe risk profile
  • Shadow AI amplified the urgency — 90% of IT leaders are now concerned about shadow AI from a privacy and security standpoint; employees using public generative AI tools and feeding them corporate documents, patient records, and proprietary IP represents the biggest sensitive data risk the 2022 survey could not have fully anticipated
  • Komprise Sensitive Data Management addresses the full 2026 threat model — built-in PII and PHI scanners, custom regex, keyword search, and KAPPA-powered extraction from proprietary file formats detect sensitive content across petabyte-scale unstructured data estates; flagged files are automatically moved to protected storage, excluded from AI pipelines, or confined by policy with complete audit trails for every action
  • Classification is the prerequisite — the 2022 report showed that 63% of organizations recognized sensitive data protection as a growing priority; the Komprise 2026 report shows that classifying and tagging unstructured data is now the top challenge in prepping data for AI at 56%; classification and sensitive data protection are the same motion — you cannot protect what you have not classified
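
To make "custom regex" classification concrete, here is a minimal sketch of tagging text and then deciding whether it may enter an AI pipeline. The patterns, the tag names, and the `classify` and `route` functions are assumptions made for illustration; they are not Komprise's scanners or API, and real detectors need validation logic well beyond two regular expressions.

```python
import re

# Hypothetical patterns for illustration only.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def classify(text: str) -> set[str]:
    """Return the tags whose pattern matches somewhere in the text."""
    return {tag for tag, pattern in PATTERNS.items() if pattern.search(text)}


def route(text: str) -> str:
    """Apply a hypothetical policy: any sensitive tag keeps the file out of AI pipelines."""
    return "exclude_from_ai" if classify(text) else "allow"
```

For example, `route("SSN 123-45-6789")` yields `"exclude_from_ai"`, while text with no matches is allowed. The point of the sketch is the ordering: the routing decision is only possible once classification has produced tags.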

The 2022 report found enterprises wanted cloud flexibility without vendor lock-in — how has that goal aged and what has changed?

The top goal in 2022 was to adopt new storage and cloud technologies without incurring extra licensing penalties and costs such as cloud egress fees at 43%; cloud NAS topped the list for storage investments at 47%, with on-premises-only environments dropping from 20% to 12%. The anti-lock-in instinct that drove those 2022 preferences has proven exactly right, even if the specific threat has evolved:

  • Egress fees became a real and painful problem — the concern about cloud egress costs that ranked as a top goal in 2022 was validated by the experience of organizations that moved to cloud storage without understanding the retrieval cost model; egress fees on large-scale AI data access from cloud archives have materially affected AI project economics for multiple enterprises since 2022
  • Storage-vendor tiering created the lock-in that cloud migration was supposed to avoid — many organizations that moved to cloud in 2022 did so using storage-vendor-native tiering tools that wrote data in proprietary formats; those organizations discovered that switching cloud providers or retiring the source storage required full rehydration before migration; the lock-in concern of 2022 was valid but pointed at the wrong layer of the stack
  • Flash prices have made the cost-avoidance argument even stronger: IDC describes the current memory shortage as a potentially permanent reallocation of global silicon wafer capacity; the organizations that built storage-agnostic data management in 2022 are not paying egress fees, not rehydrating proprietary blocks, and not locked into a vendor whose hardware prices have risen sharply; those that did not are absorbing all three costs simultaneously
  • The Flash Stretch Assessment quantifies the current lock-in cost — for qualified enterprises managing 500TB or more, the Komprise Flash Stretch Assessment models exactly how much cold data is sitting on expensive primary storage, what it is costing annually in storage and backup fees, and what transparent tiering to lower-cost destinations would save; this is the 2022 anti-lock-in goal made concrete and measurable in 2026 terms
  • Storage agnosticism across platforms is now table stakes — the 2022 report was right that vendor independence would matter; Komprise integrates with NetApp, Dell, IBM, VAST Data, Nasuni, Everpure, AWS, Azure, Google Cloud, and any NFS, SMB, or S3-compatible storage from a single platform; the same management layer governs tiering, migration, classification, and AI data workflows regardless of which storage vendor holds the data — which is exactly the flexibility 43% of IT leaders said they wanted in 2022 and that the market has since proven is non-negotiable
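
The arithmetic behind a cold-data tiering estimate can be sketched in a few lines. This is not the Flash Stretch Assessment's actual model, which this page does not specify. The `estimate_tiering` function, the per-TB prices, and the cold-data fraction are all hypothetical inputs, and backup costs are assumed to be folded into the per-TB figures.

```python
from dataclasses import dataclass


@dataclass
class TieringEstimate:
    cold_tb: float
    annual_cost_before: float
    annual_cost_after: float

    @property
    def annual_savings(self) -> float:
        return self.annual_cost_before - self.annual_cost_after


def estimate_tiering(
    total_tb: float,
    cold_fraction: float,
    primary_cost_per_tb: float,
    cold_cost_per_tb: float,
) -> TieringEstimate:
    """Compare annual cost with all data on primary storage vs. cold data tiered off.

    All inputs are assumptions supplied by the caller; costs are per TB per year.
    """
    cold_tb = total_tb * cold_fraction
    hot_tb = total_tb - cold_tb
    before = total_tb * primary_cost_per_tb
    after = hot_tb * primary_cost_per_tb + cold_tb * cold_cost_per_tb
    return TieringEstimate(cold_tb, before, after)


# Illustrative numbers only: 500 TB, half of it cold, $300 vs. $60 per TB per year.
example = estimate_tiering(500, 0.5, 300, 60)
```

With these made-up inputs, 250 TB is cold, the annual cost falls from 150,000 to 90,000, and the estimated saving is 60,000 per year. A real assessment would measure the cold fraction from file access times rather than assume it.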