Data Management Glossary
Zettabyte
What is a zettabyte?
A zettabyte (ZB) is a unit of digital data storage capacity equal to one sextillion (10^21) bytes: a thousand exabytes, a billion terabytes, or a trillion gigabytes.
One zettabyte is roughly equivalent to 250 billion DVDs' worth of data. In 2020, IDC reported that 59 zettabytes of data would be created, captured, copied, and consumed over the course of the year, the majority of it unstructured.
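The equivalences in the definition can be checked with a few lines of arithmetic. The following is a minimal sketch that assumes decimal (SI) units, where each step up is a factor of 1,000; the `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) storage units, expressed in bytes.
UNITS = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a quantity between the decimal storage units listed in UNITS."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# One zettabyte expressed in smaller units.
zb_in_eb = convert(1, "ZB", "EB")  # a thousand exabytes
zb_in_tb = convert(1, "ZB", "TB")  # a billion terabytes
zb_in_gb = convert(1, "ZB", "GB")  # a trillion gigabytes

print(f"1 ZB = {zb_in_eb:,.0f} EB = {zb_in_tb:,.0f} TB = {zb_in_gb:,.0f} GB")
```

Binary units (zebibytes, based on powers of 1,024) would give slightly different figures; the glossary definition uses the decimal convention.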
According to IDC, the global datasphere reached approximately 149 zettabytes in 2024 and is forecast to reach 394 zettabytes by 2028, driven by AI, machine learning, and cloud infrastructure growth, with the majority of that data unstructured.
Source: IDC Global DataSphere Forecast via Rivery Data Statistics 2026
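The growth rate implied by the two IDC figures above can be derived with a compound annual growth rate (CAGR) calculation. This is a rough sketch: the function name `implied_cagr` is an assumption made for illustration, and the result is only an average rate between the two data points, not a figure published by IDC.

```python
def implied_cagr(start_value, end_value, years):
    """Average compound annual growth rate between two data points."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted above: ~149 ZB in 2024, forecast 394 ZB by 2028.
rate = implied_cagr(149, 394, 2028 - 2024)
print(f"Implied average annual growth: {rate:.1%}")
```

With these inputs the implied average growth works out to roughly 27 to 28 percent per year.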
The 2026 Komprise State of Unstructured Data Management Report found that 40% of enterprise IT organizations are now managing 10 petabytes or more of unstructured data, reflecting the accelerating scale of enterprise data estates.
Source: Komprise State of Unstructured Data Management Report 2026
Zettabyte FAQs
How much data does the world store today and how is it growing?
The global datasphere (the total amount of data created, captured, copied, and consumed worldwide) reached approximately 120 zettabytes in 2023 and, based on IDC research, is projected to approach 200 zettabytes by 2026. IDC’s earlier projection of 175 zettabytes by 2025 has proven conservative, as AI infrastructure buildouts, IoT device proliferation, and genomics sequencing have all accelerated data creation beyond earlier forecasts.
The majority of this data is unstructured. Documents, images, video, audio, medical imaging, research data, and machine-generated logs now represent 80-90% of all enterprise data according to Gartner, and unstructured data is growing at 55-65% annually. For enterprise IT teams, the practical consequence is that storage environments must accommodate not just more data but faster-growing, more complex, and more valuable data that requires active management rather than passive accumulation.
Source: IDC Global DataSphere forecast
Source: Gartner research on unstructured data growth
Source: Komprise State of Unstructured Data Management reports
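To make the 55-65% annual growth range concrete, it can be translated into a doubling time. The sketch below assumes steady compound growth, which is a simplification; `doubling_time_years` is an illustrative helper name.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years for a quantity to double under steady compound growth."""
    return math.log(2) / math.log(1 + annual_growth_rate)


for growth in (0.55, 0.65):
    years = doubling_time_years(growth)
    print(f"At {growth:.0%} annual growth, data doubles in about {years:.1f} years")
```

Under that assumption, an unstructured data estate growing in this range doubles roughly every year and a half.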
What does zettabyte-scale data growth mean for enterprise storage costs in 2026?
Zettabyte-scale data growth is not an abstract future concept. It is the operational reality that enterprise storage budgets are responding to right now, and in 2026, two forces are making it significantly more expensive than in previous years.
The first is what Gartner calls "Memflation": the current memory price surge, with NAND flash prices forecast to increase 234% in 2026 as AI data center demand consumes available semiconductor supply. No meaningful relief is expected until late 2027. For organizations storing unstructured data on flash-based primary NAS environments, this means the cost per terabyte of primary storage is dramatically higher than it was 12 months ago.
The second is that 60-70% of enterprise NAS data has not been accessed in over 90 days according to Komprise customer research, yet it occupies the same expensive primary storage as actively used data. At zettabyte-scale growth rates, this misalignment between data activity and storage tier compounds every quarter. Organizations that do not actively manage data placement are paying Memflation-era prices to store data that could be on cloud object storage at a fraction of the cost.
Source: Gartner Memflation forecast April 2026
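The combined effect of the two forces above can be sketched with simple arithmetic. Everything numeric in this example other than the 234% increase and the 60-70% cold-data range is a hypothetical placeholder: the estate size, the per-terabyte prices, and the assumption that a NAND price increase passes through fully to the cost of primary storage are all made up for illustration and should be replaced with real figures.

```python
def price_after_increase(old_price, pct_increase):
    """Apply a percentage increase: +234% means 3.34 times the old price."""
    return old_price * (1 + pct_increase / 100)


def cold_data_annual_cost(total_tb, cold_fraction, price_per_tb_year):
    """Annual cost of the cold portion of a data estate at a given tier price."""
    return total_tb * cold_fraction * price_per_tb_year


# Hypothetical inputs, for illustration only.
TOTAL_TB = 1_000            # size of the primary NAS estate
COLD_FRACTION = 0.65        # midpoint of the 60-70% range quoted above
OLD_PRIMARY_PRICE = 300.0   # hypothetical $/TB/year before the price surge
OBJECT_PRICE = 60.0         # hypothetical $/TB/year for cloud object storage

# Assumes (simplistically) full pass-through of the forecast NAND increase.
new_primary_price = price_after_increase(OLD_PRIMARY_PRICE, 234)

cost_on_primary = cold_data_annual_cost(TOTAL_TB, COLD_FRACTION, new_primary_price)
cost_on_object = cold_data_annual_cost(TOTAL_TB, COLD_FRACTION, OBJECT_PRICE)

print(f"Cold data on primary storage: ${cost_on_primary:,.0f} per year")
print(f"Cold data on object storage:  ${cost_on_object:,.0f} per year")
```

Note that a 234% increase is a multiplier of 3.34, not 2.34, which is an easy mistake to make when modeling the impact.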
Why is intelligent tiering the most effective response to zettabyte-scale data growth?
As enterprise data estates grow toward and beyond zettabyte scale, the economics of keeping all data on the same tier of storage become increasingly untenable. High-performance flash-based primary NAS is designed for active workloads that need fast, low-latency access. Keeping cold unstructured data on the same infrastructure is the equivalent of storing archived documents in a trading floor office rather than an offsite records facility: the space is far more expensive than the use case justifies.
Intelligent tiering addresses this by automatically identifying cold and inactive data and moving it to lower-cost cloud or object storage based on policy, typically last accessed time. Komprise Intelligent Tiering does this across any NAS environment without agents or changes to infrastructure. Tiered data is moved via Transparent Move Technology in native file format, and users access it transparently from its original path via Dynamic Links with no rehydration required. As enterprise data estates grow, intelligent tiering scales with them, continuously right-placing new cold data rather than requiring periodic manual cleanup projects. Komprise customers consistently reclaim 70% or more of primary NAS capacity through intelligent tiering, and the Komprise Flash Stretch Assessment quantifies the specific savings opportunity in each customer’s environment before any action is taken.
Source: Komprise Flash Stretch Assessment
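The general idea of a policy based on last-accessed time can be illustrated with a generic file scan. This is not how Komprise implements tiering; it is a minimal standalone sketch using only the Python standard library, and the function names `is_cold` and `find_cold_files` and the 90-day default are assumptions chosen for illustration. It only identifies candidates and moves nothing.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def is_cold(last_access_ts, now_ts, threshold_days=90):
    """Return True if the last access is older than the threshold."""
    return (now_ts - last_access_ts) > threshold_days * SECONDS_PER_DAY


def find_cold_files(root, threshold_days=90, now_ts=None):
    """Yield (path, size_bytes) for files not accessed within threshold_days."""
    now_ts = time.time() if now_ts is None else now_ts
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if is_cold(info.st_atime, now_ts, threshold_days):
                yield path, info.st_size


if __name__ == "__main__":
    # Summarize how much capacity under the current directory looks cold.
    cold = list(find_cold_files("."))
    total_bytes = sum(size for _path, size in cold)
    print(f"{len(cold)} cold files, {total_bytes / 10**9:.2f} GB")
```

One caveat: access times are only as reliable as the file system reporting them. Volumes mounted with options such as `noatime` or `relatime` may not update `st_atime` on every read, so a scan like this can overstate how cold the data is.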
How does data lifecycle management help organizations manage zettabyte-scale unstructured data?
Intelligent tiering handles cold data. Data lifecycle management handles the full continuum from data creation to retirement, and at zettabyte scale the distinction matters. Without systematic lifecycle management, data accumulates indefinitely: cold data fills primary storage, redundant, obsolete, and trivial (ROT) data consumes backup and replication budgets, and organizations eventually face storage refresh decisions driven by volume rather than by value.
Komprise Intelligent Data Management automates the full unstructured data lifecycle. Komprise scans continuously across all NAS and cloud environments, building an always-current inventory in the Global Metadatabase. Policy-based tiering moves cold data automatically. Deep Analytics precision queries identify specific datasets for lifecycle actions based on any combination of metadata and custom tag criteria. Smart Data Workflows automate classification, governance, sensitive data detection, and AI data curation on a continuous schedule. The result is that as data volumes approach and exceed zettabyte scale at the enterprise level, the cost and complexity of managing that data do not grow proportionally, because lifecycle management policies run automatically rather than requiring manual intervention at each scale milestone.
According to the 2026 Komprise State of Unstructured Data Management Report, 40% of organizations now manage 10 petabytes or more of unstructured data, a significant increase from prior years. For these organizations, the move from reactive storage management to proactive data lifecycle management is the defining difference between storage costs that scale linearly with data growth and storage costs that remain controlled regardless of how fast data accumulates.
How does zettabyte-scale data growth affect enterprise AI programs?
Enterprise AI programs depend on access to large volumes of high-quality, well-classified unstructured data. As global data volumes approach 200 zettabytes, the availability of raw data is no longer the constraint on AI programs. The constraint is the ability to find the right data within a vast, ungoverned, and largely cold estate, curate it precisely, and deliver it to AI systems in a governed, noise-free form.
At zettabyte scale, the gap between raw data volume and AI-ready data volume is enormous. Most enterprise unstructured data estates contain large volumes of cold archives, duplicate files, ROT data, and ungoverned sensitive content that would degrade AI model accuracy if ingested without filtering. The organizations that win in AI are not those with the most data, but those with the best-governed, most precisely curated data pipelines.
Komprise addresses this by connecting data lifecycle management directly to AI data preparation. Intelligent Tiering keeps high-performance primary storage focused on active and AI-relevant workloads. Deep Analytics queries the Global Metadatabase to find precisely the right datasets for specific AI use cases. Smart Data Workflows automate the curation, governance, sensitive data exclusion, and delivery of AI-ready datasets on a continuous schedule. As data volumes grow, these workflows scale with them, ensuring that AI pipelines always receive current, governed data rather than being overwhelmed by the noise that accumulates at zettabyte scale.