The theme for unstructured data management in 2026 is more of everything: more data, more investment, more pains and more AI security and risk concerns.
Forty percent of enterprises are storing more than 10PB, according to our latest industry survey, which was conducted in October 2025 and summarizes responses of IT leaders (director and above) at enterprises with more than 1,000 employees in the United States.
Consider that 10PB is roughly the equivalent of two billion songs or 10 billion books, assuming about 5MB per song and 1MB per book: it’s difficult to get your arms around data volumes of this size. Yet more data, the majority of which is unstructured and scattered across disparate storage silos, also means more opportunity for competitive advantage, strategic AI initiatives and more.
How enterprise IT leaders will manage this sprawling data estate moving forward is a question we strove to answer in the fifth-annual Komprise 2026 State of Unstructured Data Management. Below, we asked Komprise COO and cofounder Krishna Subramanian to distill the finer points of the survey.
What are the top three findings from the 2026 survey?
KS: First, data is growing much faster than it has in the past. Most enterprises (74%) are storing more than 5PB of unstructured data, a 57% increase over 2024. At roughly 1MB per book, that’s equivalent to about 5 billion books being stored!
Second, despite the many cost-efficient classes of storage available today, from on-premises to the cloud, a large majority of organizations (85%) will spend more on data storage and backups in 2026, versus 59% in the 2024 survey. Data, risks and costs are growing so fast that new methods of data management are clearly needed in the AI era to address these challenges while discovering new value from unstructured data.
Third, this unstructured data is key for AI, but it has largely not been filtered or prepared. This is why IT leaders in our survey are embracing unstructured data classification as a top tactic for data security, governance, and AI ROI.
Why do you think data volumes have grown so much in the past year, compared to previous years?
KS: We speak to our customers regularly about cost-effective data management strategies. They tell us that growth is driven by accelerated AI adoption, exploding digital exhaust, and massive increases in rich media and sensor data. For instance:
- AI workloads produce large training sets, embeddings, logs, and model artifacts.
- Application modernization generates more machine logs, telemetry, and event data.
- Rich media (video, imaging, design files) is proliferating across industries.
- Regulatory retention rules increase how long data must be stored.
- Finally, unmanaged cloud sprawl creates duplicate copies, snapshots, and backups. More copies are being generated for AI, and costs can spiral quickly.
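To make the point about duplicate copies concrete, here is a minimal sketch of how redundant files under one directory tree could be found by content hash. It is an illustration only, not a description of any product's method; the function names `file_digest` and `find_duplicates` are hypothetical, and a real petabyte-scale scan would need sampling, parallelism, and cross-platform connectors.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content; keep only groups with 2+ copies."""
    paths_by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            paths_by_digest[file_digest(path)].append(path)
    return {
        digest: sorted(paths)
        for digest, paths in paths_by_digest.items()
        if len(paths) > 1
    }
```

Grouping by digest rather than by file name catches copies that were renamed or moved, which is typically how duplicates accumulate across shares and buckets.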
How do IT teams need to shift their thinking and their practices for unstructured data management in light of rampant data growth?
KS: The explosive growth of unstructured data is now reaching a tipping point due to a perfect storm of factors: massive data growth across silos, spiraling security risks from this data, and the need to organize and prepare this data for AI. Komprise delivers unstructured data management that analyzes and builds a global metadatabase of all your unstructured data. It then mobilizes just the right data for AI and for optimizing infrastructure.
IT organizations that do not take time to assess their data estates, growth rates, business priorities, security posture and IT resources may quickly discover that the pains will outweigh the gains.
Regardless of the storage and backup systems that you use and the needs of your stakeholders, it’s critical to independently and holistically understand your data ecosystem. You also need ways to efficiently classify and move data as needed, without penalty. This requires a flexible, storage-agnostic data architecture.
Unstructured data classification was both a top strategy and a top challenge. Why is this so difficult for organizations?
KS: File and object data grows fast, sprawls across platforms, and lacks consistent metadata. Even with many IT and AI tools, most solutions were never designed to classify petabyte-scale, distributed, heterogeneous data. A few reasons why unstructured data classification is so difficult include:
- No unified visibility: Data lives across NAS, cloud object stores, SaaS apps, backups, and archives. Most tools only see one system at a time.
- Inconsistent or missing metadata: Unstructured files rarely contain rich metadata, which means classification requires deep content inspection at scale.
- Tool fragmentation: Storage tools classify only data in their own platform, which impedes holistic, accurate visibility across the enterprise.
- Massive scale: Petabyte-level and exabyte-level data volumes make manual, siloed approaches impossible.
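As a rough illustration of what unified, metadata-driven classification means in practice, the sketch below walks several storage roots, records basic file metadata, and assigns a coarse class from the file extension and age. Everything here is an assumption made for the example: the `EXTENSION_CLASSES` mapping, the one-year "cold" threshold, and the function names are hypothetical, and local directories stand in for NAS shares or object stores.

```python
import os
import time
from collections import Counter
from dataclasses import dataclass

# Hypothetical mapping from file extension to a coarse data class.
EXTENSION_CLASSES = {
    ".mp4": "rich-media", ".jpg": "rich-media", ".png": "rich-media",
    ".log": "machine-data", ".json": "machine-data",
    ".docx": "document", ".pdf": "document", ".pptx": "document",
}


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    modified_at: float
    data_class: str
    is_cold: bool


def classify_file(path, now, cold_after_days=365):
    """Build one index record from filesystem metadata only."""
    stat = os.stat(path)
    extension = os.path.splitext(path)[1].lower()
    age_days = (now - stat.st_mtime) / 86400
    return FileRecord(
        path=path,
        size_bytes=stat.st_size,
        modified_at=stat.st_mtime,
        data_class=EXTENSION_CLASSES.get(extension, "other"),
        is_cold=age_days > cold_after_days,
    )


def build_index(roots, now=None):
    """Walk every root and return a single flat list of records."""
    now = time.time() if now is None else now
    records = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                records.append(classify_file(os.path.join(dirpath, name), now))
    return records


def summarize(records):
    """Total bytes per data class across all scanned roots."""
    totals = Counter()
    for record in records:
        totals[record.data_class] += record.size_bytes
    return dict(totals)
```

The point of the single index is that `summarize` answers questions across every root at once, which per-platform tools cannot do. Extension and age are weak signals, though; as noted above, richer classification requires inspecting content.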
A top AI challenge for IT infrastructure teams is handling data governance and security concerns. What lies beneath this struggle?
KS: IT struggles are likely due to incomplete policies that are hard to enforce, along with the reality that sensitive and proprietary data is too often copied or moved to unprotected locations where it can be discovered and fed to AI. Our survey found that organizations are largely not restricting AI tools and usage. In this environment, things can get out of control quickly. Even with robust security and access management tools, sensitive data ends up misplaced and hidden in innocuous documents like meeting notes and internal slide decks. It’s easy for employees to load files into AI for analysis, not realizing that IP or customer data is within the file. We need new types of tools, checks and balances to police security and governance when AI is happening everywhere and at the speed of light.
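One simple form such a check could take is a pre-flight scan that looks for sensitive patterns in a document before it is sent to an AI tool. The sketch below is a hedged illustration: the two regular expressions (email addresses and US Social Security number formats) and the names `find_sensitive` and `allow_for_ai` are assumptions chosen for the example, and pattern matching alone will miss much proprietary content.

```python
import re

# Illustrative patterns only; real detection needs far broader coverage.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_sensitive(text):
    """Return {label: [matches]} for every pattern found in the text."""
    hits = {}
    for label, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits


def allow_for_ai(text):
    """Allow a document only if no sensitive pattern was detected."""
    return not find_sensitive(text)
```

Meeting notes containing a line such as "SSN 123-45-6789 on file" would be blocked here, which is exactly the kind of innocuous-looking document described above.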
Given the economy and the fact that AI is now being seen as a job eliminator in certain functions including IT roles, were you surprised that only 25% plan on downsizing IT staff in 2026?
KS: This is great news for the job market. Individuals with AI technology expertise, from developing, implementing and evaluating new technologies to creating the right infrastructure and the right policies and controls, are in high demand. Entry-level employees, such as junior developers and system administrators, may struggle the most in this market, but if they can quickly adapt to using AI on the job, they can evolve. We also know that, just like leaders of every other business function, leaders such as the CIO need to be extremely AI savvy to guide their organizations into this unknown territory. So while AI automation is expected to have some impact on IT, that impact is likely minimal in the long term. We don’t know yet what new careers AI will launch!
Bottom line: IT teams desperately need people who are ready to deploy and manage data, technology and emerging requirements in the age of AI.
AI data management skills are most important right now, according to the survey. What are these skills and how can IT people get them quickly?
KS: AI data management skills are becoming essential as enterprises race to make unstructured data usable for analytics and AI. The most important skills center on understanding how to discover, classify, govern, and mobilize massive volumes of file and object data across silos. IT teams need proficiency in metadata analysis, data lifecycle automation, policy-based tiering and integrating storage systems with AI pipelines. Just as critical are skills in evaluating data quality, enforcing governance (PII, access patterns, retention), and preparing data for AI ingestion. The fastest way for IT professionals to build these capabilities is through hands-on experience with data management platforms, vendor training, cloud-provider courses on data and AI services, and targeted certifications focused on unstructured data management, governance, and modern storage architectures.
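For readers new to policy-based tiering, the idea can be sketched in a few lines: a policy maps how long a file has been idle to a target storage tier, and a planner lists only the files whose current tier differs from the target. The tier names, the 180-day and 730-day thresholds, and the function names below are assumptions chosen for illustration, not survey findings or vendor defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRule:
    min_idle_days: int  # rule applies when a file has been idle at least this long
    tier: str


# Hypothetical policy: thresholds and tier names are illustrative.
DEFAULT_RULES = (
    TierRule(min_idle_days=730, tier="archive"),
    TierRule(min_idle_days=180, tier="cool"),
    TierRule(min_idle_days=0, tier="hot"),
)


def choose_tier(idle_days, rules=DEFAULT_RULES):
    """Pick the tier of the strictest rule the file satisfies."""
    for rule in sorted(rules, key=lambda r: r.min_idle_days, reverse=True):
        if idle_days >= rule.min_idle_days:
            return rule.tier
    raise ValueError(f"no rule matches idle_days={idle_days}")


def plan_moves(files, rules=DEFAULT_RULES):
    """Given (path, current_tier, idle_days) tuples, list the required moves."""
    moves = []
    for path, current_tier, idle_days in files:
        target_tier = choose_tier(idle_days, rules)
        if target_tier != current_tier:
            moves.append((path, current_tier, target_tier))
    return moves
```

Separating the policy (`DEFAULT_RULES`) from the planner keeps the rules easy to review and change, which matters when governance and retention requirements shift.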
Any other takeaways from the survey findings this year? Or advice for IT and data leaders in 2026?
KS: I think the top five takeaways from the survey are worth highlighting:
- Data growth has hit a new peak and IT leaders cannot afford to ignore it.
- Data classification is an essential strategy for bringing structure to unstructured data.
- GenAI data security concerns persist, yet only 14% of organizations are restricting AI in their workforce. Is disaster looming?
- It looks like IT budgets will flex for AI in 2026.
- The role of Chief AI Infrastructure Officer will emerge.
So buckle up and think more broadly about unstructured data management next year.
