The State of Unstructured Data Management 2023
The third annual survey finds that IT and business leaders are largely allowing employee use of generative AI but the majority (66%) are most concerned about the data governance risks from AI, including privacy, security and the lack of data source transparency in vendor solutions.
This report summarizes the responses of 300 global enterprise storage IT directors, VPs, and C-level executives, all decision makers at companies with more than 1,000 employees in the United States and the UK.
Unstructured data management highlights:
- Top data storage priority: Preparing for AI
- 66% say data governance is a top GenAI concern
- 32% manage more than 10PB of data
- 85% say non-IT users should help manage their data
- 73% spend 30% or more of their IT budget on data storage
Download the latest report today to understand the primary unstructured data management challenges and opportunities to deliver greater cost savings and data value. Also read the paper: Unstructured Data Management in the Age of AI.
UNSTRUCTURED DATA MANAGEMENT REPORT COVERAGE
- eWeek
- datanami
- Blocks & Files
- BetaNews
- SilverLinings
- …and more
What were the headline findings of the Komprise 2023 State of Unstructured Data Management report, and what did they signal about where the market was heading?
The third annual Komprise survey of 300 global enterprise storage IT directors, VPs, and C-level executives found that IT and business leaders were largely allowing employee use of generative AI, but the majority — 66% — were most concerned about data governance risks from AI, including privacy, security, and the lack of data source transparency in vendor solutions. The 2023 report arrived at the precise moment generative AI went from industry conversation to enterprise reality, capturing the first systematic evidence of what that transition actually meant for IT teams on the ground:
- Generative AI flipped the governance priority order — in previous years, governance ranked below cost optimization and migration in enterprise priorities; in 2023 it became the dominant concern, driven by the sudden availability of consumer generative AI tools that employees were already using with corporate data
- The gap between AI adoption and AI governance was visible for the first time — organizations were allowing AI use without having classified, tagged, or governed the unstructured data those tools were consuming; the 2023 report was the first to quantify this gap across a statistically significant enterprise sample
- Self-service was identified as both an opportunity and a risk — 85% of IT leaders said non-IT users should have a role in managing their own data and 62% had already attained some level of user self-service for unstructured data management; the same capability that empowers researchers and business users to find and use data also creates exposure when sensitive data is not properly tagged and confined before self-service access is granted
- The 2026 report confirms the 2023 warnings were accurate — reducing data risk from AI is now the top business challenge for unstructured data management at 62%, and 90% of IT leaders are concerned about shadow AI; what the 2023 report flagged as an emerging governance concern has become an operational crisis; the 2026 Komprise State of Unstructured Data Management report at komprise.com captures the full trajectory
The 2023 report found 66% of IT leaders were worried about AI governance — why was that concern valid then and how has it been validated by subsequent events?
The 2023 survey found that the majority of IT leaders were concerned about data governance risks from generative AI, with privacy and security leading the list alongside a specific concern about the lack of data source transparency in vendor AI solutions. Three years on, every specific concern the 2023 report documented has materialized in production environments:
- Data source transparency proved to be a real problem — the concern about not knowing what data vendor AI solutions were training on or referencing was not theoretical; enterprises discovered that commercial AI tools could surface information from documents employees had fed into them weeks or months earlier, with no audit trail and no way to confirm what the model had retained
- Privacy violations arrived faster than governance frameworks — organizations that allowed generative AI use in 2023 without first classifying their unstructured data estates found themselves unable to determine retrospectively what sensitive data had been exposed; 44% of IT leaders now report that sensitive data has been leaked into AI tools — a direct consequence of the governance gap the 2023 report identified
- The self-service paradox deepened — the 2023 finding that 85% of IT leaders wanted non-IT users to manage their own data created a tension: self-service access is valuable for researchers and departmental users, but without upstream classification and sensitivity tagging, self-service also means users can inadvertently feed unclassified PHI, PII, or IP into AI tools with no IT visibility
- Komprise addressed this tension directly — Komprise Sensitive Data Management detects and remediates sensitive content before any data reaches self-service access points or AI pipelines; the Global Metadatabase tags every file with sensitivity status, project classification, and access controls; Smart Data Workflows automate what the 2023 report identified as the manual bottleneck — ensuring governance does not require IT intervention on every departmental data request
- Governance without classification is impossible at petabyte scale — the 2023 report’s governance concern implicitly required a classification capability that most organizations lacked; classifying and tagging unstructured data is now the top challenge in prepping data for AI at 56%; the governance and classification problems are the same problem, and both require the metadata and orchestration layer that Komprise provides
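The gating idea running through the points above (nothing reaches self-service or an AI tool until it has been classified and checked for sensitive content) can be illustrated with a minimal sketch. This is not the Komprise API: `FileRecord`, `BLOCKED_TAGS`, `releasable_to_ai`, and `partition_for_ai` are hypothetical names, and the tag values are assumptions chosen to mirror the PHI, PII, and IP categories mentioned above.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels; a real deployment would define its own taxonomy.
BLOCKED_TAGS = frozenset({"PII", "PHI", "IP"})


@dataclass(frozen=True)
class FileRecord:
    """Illustrative metadata record for one file (not a real product schema)."""
    path: str
    classified: bool                      # has the file been scanned and tagged at all?
    sensitivity_tags: frozenset = frozenset()


def releasable_to_ai(record: FileRecord) -> bool:
    """Allow a file only if it has been classified AND carries no blocked tag.

    Unclassified files are held by default, because their sensitivity is unknown.
    """
    if not record.classified:
        return False
    return record.sensitivity_tags.isdisjoint(BLOCKED_TAGS)


def partition_for_ai(records):
    """Split records into (allowed, held) lists, preserving input order."""
    allowed = [r for r in records if releasable_to_ai(r)]
    held = [r for r in records if not releasable_to_ai(r)]
    return allowed, held


if __name__ == "__main__":
    sample = [
        FileRecord("/projects/a/readme.txt", classified=True),
        FileRecord("/hr/payroll.csv", classified=True, sensitivity_tags=frozenset({"PII"})),
        FileRecord("/scratch/unknown.bin", classified=False),
    ]
    allowed, held = partition_for_ai(sample)
    print("allowed:", [r.path for r in allowed])
    print("held:", [r.path for r in held])
```

The design choice worth noting is the default: an unclassified file is treated the same as a sensitive one, which reflects the argument that governance cannot be enforced on data that has not been classified.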
The 2023 report emphasized self-service data management — what did that mean in 2023 and what does empowering departmental users require in 2026?
Self-service was the defining organizational theme of the 2023 report. In the survey, 85% of IT leaders said non-IT users should have a role in managing their own data and 62% had already reached some level of user self-service. The context was practical: IT teams managing petabytes of unstructured data across dozens of storage silos cannot be the bottleneck for every departmental data request. But the requirements of self-service have grown considerably more complex between 2023 and 2026:
- In 2023, self-service meant searching and tagging — the primary use case was giving researchers, legal teams, and business users the ability to find their own data, tag it with project metadata, and request movement to analytics platforms without opening an IT ticket; this was a workflow efficiency improvement
- In 2026, self-service must also prevent AI exposure — every self-service capability that gives a user direct access to unstructured data is also a potential channel through which sensitive data reaches generative AI tools without IT visibility; self-service and governance are no longer separable concerns
- Deep Analytics enables governed self-service — Deep Analytics, available in Komprise Intelligent Data Management, lets authorized departmental users run precise queries across the Global Metadatabase to find, tag, and identify datasets without requiring central IT involvement in every query; IT administrators retain governance by setting access boundaries, sensitivity exclusions, and workflow permissions that apply automatically regardless of which user is running the query
- The chargeback and showback connection — the 2023 report noted that self-service was also tied to cost accountability; when departments can see their own storage consumption and understand what their data costs, they make better data management decisions; the Komprise showback report provides exactly this visibility, driving departmental participation in cold data tiering and deletion without requiring IT mandates
- Self-service in 2026 feeds AI workflows — the trends toward IT-as-a-service coupled with generative AI are causing enterprise storage teams to look for ways to manage data across storage vendors and deliver new and improved data services to business users; a researcher in 2026 does not just want to find their data — they want to identify the right dataset, enrich it with domain-specific metadata using KAPPA data services, and deliver it directly to an AI pipeline; this is the self-service capability the 2023 report was pointing toward
The 2023 report found cloud cost optimization was a top priority — what has changed about the cloud cost problem between 2023 and 2026?
Cloud cost optimization has been a persistent theme across the Komprise annual surveys, and the 2023 report documented it as a top data storage priority. The character of the cloud cost problem has shifted materially since then:
- In 2023, cloud egress fees were the primary concern — organizations that had migrated data to cloud discovered that retrieval costs from block-based or gateway-tiered data were significantly higher than advertised; the focus was on choosing tiering approaches that minimized unnecessary rehydration and egress
- By 2026, hardware price pressure has joined cloud costs as a dual squeeze — IDC describes the current memory shortage as a potentially permanent reallocation of global silicon wafer capacity, with 2026 NAND and DRAM supply growth expected to remain below historical norms; enterprises are now managing both rising cloud costs and rising on-premises flash hardware costs simultaneously, with data volumes growing faster than either budget
- The cost trajectory the 2023 report documented has not reversed — for the fifth year in a row, IT directors said they would spend more on storage than the previous year; 85% of IT and data storage leaders are projecting an increase in storage spend in 2026; the cost optimization priority of 2023 remains unresolved
- Tiering now serves two masters — in 2023, tiering cold data off primary storage was primarily about cost; in 2026, it also positions data for AI access; data tiered to cloud object storage in native format by Komprise is immediately accessible to cloud AI services, making the cost optimization motion and the AI data readiness motion the same infrastructure decision
- The Flash Stretch Assessment turns the 2023 cost priority into a concrete action — for qualified enterprises managing 500TB or more, the Komprise Flash Stretch Assessment models exactly how much cold data is consuming expensive primary storage and what transparent tiering to lower-cost destinations would save; this is the analytical approach to cloud and storage cost optimization the 2023 report identified as a top priority, made practical and measurable before any commitment
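The arithmetic behind a cold data tiering estimate can be sketched in a few lines. This is a simplified illustration, not the method used by the Komprise Flash Stretch Assessment: the function name, the flat per-TB annual costs, and the example figures are all assumptions, and the model ignores retrieval, egress, and migration costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TieringEstimate:
    """Result of a simple before/after storage cost comparison (illustrative only)."""
    cold_tb: float
    annual_cost_before: float
    annual_cost_after: float

    @property
    def annual_savings(self) -> float:
        return self.annual_cost_before - self.annual_cost_after


def estimate_tiering_savings(
    total_tb: float,
    cold_fraction: float,
    primary_cost_per_tb_year: float,
    archive_cost_per_tb_year: float,
) -> TieringEstimate:
    """Estimate annual savings from moving cold data off primary storage.

    Assumes a flat cost per TB per year on each tier and that all cold data
    is tiered; both are simplifications for the sake of the example.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_tb = total_tb * cold_fraction
    hot_tb = total_tb - cold_tb
    before = total_tb * primary_cost_per_tb_year
    after = hot_tb * primary_cost_per_tb_year + cold_tb * archive_cost_per_tb_year
    return TieringEstimate(cold_tb, before, after)


if __name__ == "__main__":
    # Hypothetical inputs: 500 TB total, half of it cold, made-up unit costs.
    est = estimate_tiering_savings(500, 0.5, 300.0, 50.0)
    print(f"cold data: {est.cold_tb} TB")
    print(f"annual savings: {est.annual_savings}")
```

With real inputs, the cold fraction would come from measured access patterns rather than a guess, which is the part an assessment is meant to supply.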
The 2023 report identified monitoring, alerting, and data governance as top future capabilities — how has the platform evolved to deliver them and what remains unsolved?
Monitoring and alerting for capacity issues and anomalies led the list of important future unstructured data management capabilities in 2023 at 44%, and AI data governance was identified as a top future capability at 28%, up from no significant mention in previous surveys. Three years is a long time in enterprise software. Here is an honest accounting of what has been solved and what the market is still working through:
- Monitoring and alerting: largely addressed — Komprise Analysis, available in both Komprise Elastic Data Migration and Komprise Intelligent Data Management, provides continuous monitoring of data growth rates, capacity trends, cold data accumulation, and cost projections across all connected storage environments; the capacity visibility and anomaly detection the 2023 respondents wanted is now a standard capability, not a future one
- AI data governance: massively more complex than anticipated — what 28% of respondents identified as a future capability in 2023 has since become the top business challenge for the majority of IT leaders; the 2023 report captured the early signal but the scale of the governance problem that generative AI created exceeded what any single year’s survey could fully anticipate
- Classification is the governance prerequisite that remains underbuilt — classifying and tagging unstructured data is now the top challenge in prepping data for AI at 56%, compared with 28% who cited AI governance as a future priority in 2023; the gap between identifying governance as a priority and having the tooling to enforce it at petabyte scale is where most enterprises still sit
- Komprise is the metadata and orchestration layer for enterprise unstructured AI data — the 2023 report’s future capabilities list maps directly to the current Komprise Intelligent Data Management platform: the Global Metadatabase for continuous cross-silo indexing, Deep Analytics for precise query and classification, KAPPA data services for domain-specific metadata extraction, Sensitive Data Management for automated governance enforcement, and Smart Data Workflows for orchestrating the full pipeline from discovery through AI ingestion
- The 2026 report closes the loop — the 2023 report asked what capabilities enterprises needed; the 2026 Komprise State of Unstructured Data Management shows what the organizations that built those capabilities have achieved and what the cost of delay has been for those that did not; it is available at komprise.com/report
