Unstructured Data

What is Unstructured Data?

Unstructured data is data that doesn’t fit neatly in a traditional database and has no identifiable internal structure. It is the opposite of structured data, which is organized into the rows and columns of a database. Unstructured data does not follow a predefined data model or schema.

Up to 80% of business data is considered unstructured, with this number increasing year over year.

Examples of unstructured data are:

  • Documents and presentations
  • Chats and e-mail messages
  • Photos, audio and video files
  • CAD / CAM files
  • Genomics sequencing and medical images (DICOM, BAM, FASTQ, etc.)
  • IoT, machine-generated data and log files

What’s the difference between structured data and unstructured data?

Data can be of two broad types: structured data and unstructured data.

Structured data is data organized in categories by rows and columns in an Excel spreadsheet or a database. For example, accounting records are structured data because you can organize them by customer, by geography, by product, etc.

Structured data is typically stored in a database and can be queried using query languages such as Structured Query Language (SQL). Data was predominantly structured until around 2000, but since then we have seen an explosion of unstructured data. Today, structured data accounts for less than 20% of the world’s data.
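
To make the "organized by rows and columns and queried with SQL" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample records are illustrative assumptions, not taken from any real system.

```python
import sqlite3

def revenue_by_region(rows):
    """Load (customer, region, amount) records into a table and total them by region."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE invoices (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    # Because every record has the same named columns, one query can group them.
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM invoices GROUP BY region ORDER BY region"
    )
    totals = cursor.fetchall()
    conn.close()
    return totals

# Hypothetical accounting records: customer, geography, amount.
sample = [("Acme", "EMEA", 1200.0), ("Globex", "APAC", 800.0), ("Initech", "EMEA", 300.0)]
print(revenue_by_region(sample))
```

The query works only because the schema is known in advance, which is exactly what unstructured files lack.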

Unstructured data usually has no predefined data model and does not map well to relational tables. Text-heavy unstructured data may contain numbers, dates, and facts, but because they are not held in fixed fields, conventional software has difficulty identifying them.
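
As a small illustration of why values buried in free text are harder to work with, the sketch below pulls dates out of a sentence with a regular expression. The sample note and the choice of the ISO date pattern are assumptions made for the example; real text would use many more formats.

```python
import re
from datetime import date

# Matches only the YYYY-MM-DD pattern; other date styles would be missed.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def find_iso_dates(text):
    """Return every valid YYYY-MM-DD date found in free text."""
    found = []
    for year, month, day in ISO_DATE.findall(text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # looks like a date but is not one, e.g. month 13
    return found

# Hypothetical free-text note with two real dates and one look-alike.
note = "Patient seen on 2024-03-18; follow-up scheduled for 2024-04-02. Ref 2024-13-45 is not a date."
print(find_iso_dates(note))
```

In a database the dates would already sit in a typed column; here the structure has to be inferred, and anything the pattern does not anticipate is silently lost.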

Unstructured data is typically stored across file systems (NAS) and object storage. Unlike structured data in databases, it is harder to search, analyze, and govern, which creates challenges for cost control, compliance, and AI initiatives.

What are some Unstructured Data Types by Industry?

Unstructured data is the predominant data type generated by most applications today, from self-driving cars, to Internet of Things (IoT) devices, to genome sequencers, to video and audio files. Most of the data we generate and use today is unstructured. Here are some examples of unstructured data types by industry:

Read: Why Harnessing Unstructured Data is a Top Enterprise Mandate

Read: Getting an Upper Hand on the Unstructured Data Problem

Read: Why Unstructured Data Matters – An Industry View

Why is Unstructured Data Growing so Fast?

The analyst firm IDC predicts that we will generate over 175 zettabytes of data by 2025. Consider that one zettabyte equals the capacity of one billion 1-terabyte drives. IDC also predicts that in the next three years we will generate more data than we created over the past 30 years, and this growth trend will continue.
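
The scale is easier to grasp as arithmetic. This minimal sketch assumes decimal (SI) units, where 1 ZB is 10^21 bytes and 1 TB is 10^12 bytes; the function name is made up for the example.

```python
ZETTABYTE = 10**21  # bytes, decimal (SI) definition
TERABYTE = 10**12   # bytes, decimal (SI) definition

def drives_needed(zettabytes, drive_tb=1):
    """Number of drives of `drive_tb` terabytes needed to hold `zettabytes` of data."""
    return (zettabytes * ZETTABYTE) // (drive_tb * TERABYTE)

print(drives_needed(1))    # one zettabyte on 1 TB drives
print(drives_needed(175))  # the 175 ZB forecast on 1 TB drives
```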

Most of the data we generate today is unstructured because unstructured data has several advantages over structured data:

  • Wider Use Cases for Unstructured Data: Structured data has a rigid, predefined structure and can only be used for its intended purpose. This narrows its use cases: it works well for transactional applications such as revenue tracking or catalogs, but it is a poor fit for applications that generate data that is hard to categorize, such as video or genomics.
  • Various Formats: Unstructured data can be stored in many formats. An MP4 video, a genomics BAM file, a .log diagnostics file, and an X-ray image stored in a picture archiving and communication system (PACS) are all unstructured data. It is therefore more accurate to say that unstructured data has a variety of formats rather than a single one. This means more applications can generate unstructured data and tailor the format to their use.
  • Various Sizes: Unlike a cell in a database, unstructured data does not have to fit a specific size or character limit. For example, you can have small video files for short snippets and large video files for full-length movies. This also increases flexibility in how unstructured data is generated and used.

Since unstructured data is easier to create and use, more applications and users are working with unstructured data.

Unstructured Data Management

Managing the growing volumes of unstructured data generated within an organization is leading to higher expenses.

What are the 3 Vs of unstructured data?

  • Volume: The sheer quantity of data will continue to grow at an incomprehensible rate
  • Velocity: Data is arriving at a continually faster rate
  • Variety: The types of data continue to become more varied

These 3 Vs of unstructured data, originally defined by former Meta Group / Gartner industry analyst Doug Laney, mean that managing unstructured data growth is critical for organizations as they find their budgets and resources stretched to their limits.

Unstructured data management requires an understanding of what data is hot and actively used, and what data is cold and rarely accessed. In most enterprises, over 80% of unstructured data becomes cold within a year of creation yet it continues to be managed on the most expensive storage and it continues to consume expensive backup resources.

Analytics-driven management of unstructured data can change this by identifying hot and cold data across storage, keeping hot data on expensive, actively managed environments and offloading cold data to lower-cost, passively managed storage.
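
The hot/cold split described above can be sketched in a few lines. This is a simplified illustration and not how any particular product works: it assumes last-access time is a usable signal (many file systems are mounted with options such as noatime or relatime that make it unreliable), and the 365-day threshold and function names are assumptions chosen for the example.

```python
import os
import time

SECONDS_PER_DAY = 86400

def iter_file_records(root):
    """Yield (size_bytes, last_access_epoch) for every readable file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            yield info.st_size, info.st_atime

def summarize_hot_cold(records, now, cold_after_days=365):
    """Total the bytes of hot and cold files given (size, last_access) pairs."""
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    totals = {"hot": 0, "cold": 0}
    for size, last_access in records:
        # A file is cold only if it was last accessed before the cutoff.
        totals["cold" if last_access < cutoff else "hot"] += size
    return totals
```

A call such as summarize_hot_cold(iter_file_records("/mnt/share"), time.time()) would report how many bytes under a (hypothetical) share have gone untouched for a year.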

Unstructured data management should be done without restricting access to the cold data, so users and applications continue to see and access the cold data exactly as before. To understand how Komprise enables enterprise IT organizations to analyze, move, and manage unstructured data and save costs on storage and backups, read the white paper: Komprise Intelligent Data Management Architecture Overview.

How does unstructured data relate to Komprise?

Komprise analyzes unstructured data across all storage environments, providing visibility into usage, cost, and value. Komprise enables enterprise IT organizations to identify redundant, obsolete, and trivial (ROT) data and curate high-value datasets for AI and analytics.

What are the challenges of managing unstructured data?

The main challenges are lack of visibility, high storage costs, data sprawl, and difficulty identifying valuable data.

Why is unstructured data so expensive to store and manage for enterprises? Which industries struggle with it the most?

Unstructured data is expensive for three compounding reasons. First, it defaults to primary storage. Unlike structured data that is created in a database with defined storage rules, unstructured data is generated by users and applications and lands on network-attached storage environments by default, where it accumulates without automatic lifecycle management. Most enterprises find that 60-70% of their NAS data has not been accessed in over 90 days, yet it continues to occupy the same expensive flash-based primary storage as actively used files.

Second, everything attached to primary storage gets replicated. Backup, snapshot, replication, and disaster recovery costs scale with primary storage volume. Cold unstructured data that should have been tiered years ago is instead being backed up repeatedly at significant cost, often three to five times its primary storage cost annually in data protection overhead.

Third, storage media prices are rising sharply. Gartner is calling it Memflation. NAND flash prices are forecast to surge 234% in 2026 driven by AI data center demand consuming available supply, with no meaningful relief expected until late 2027. For enterprises storing petabytes of cold unstructured data on flash-based primary storage, the cost of inaction is now significantly higher than it was 12 months ago.
Source: Gartner semiconductor forecast, April 2026

The industries struggling most are those that generate the largest volumes of domain-specific unstructured data with long retention requirements.

  • Healthcare and life sciences generate petabytes of medical imaging, genomics sequencing data, digital pathology slides, and electronic lab notebooks. HIPAA and research compliance requirements mean data cannot simply be deleted, and the files are large, complex, and growing rapidly as AI-assisted diagnostics and genomics become standard practice.
  • Media and entertainment generate massive volumes of raw footage, VFX assets, and post-production files that can reach petabytes per project. Storage refresh cycles are frequent and expensive, and files must remain accessible for licensing, reuse, and compliance purposes long after a project closes.
  • Financial services accumulate decades of scanned documents, call center recordings, compliance records, and trading data. Regulatory retention requirements from FINRA, SEC, and SOX mean deletion is often not an option, and the data must remain discoverable for e-discovery and audit purposes.
  • Engineering, semiconductor, and oil and gas firms generate enormous CAD/CAM, simulation, and seismic datasets that are critical during active projects and largely inactive afterwards but still subject to IP protection and contractual retention obligations.
  • Higher education and research institutions generate genomics, imaging, and research datasets that grow continuously as new projects launch and prior research must be retained for reproducibility and grant compliance.

What strategies are IT teams taking to address the unstructured data problem?

The strategies leading enterprises are taking to address this fall into three categories.

  • The first strategy is intelligent tiering. Organizations are implementing automated lifecycle policies that move cold unstructured data to lower-cost cloud or object storage based on last accessed time and other data attributes, without disrupting user access. Komprise Transparent Move Technology moves cold data to any cloud or object destination in native format, with users accessing tiered data transparently via Dynamic Links. Enterprises consistently reclaim 70% or more of primary storage capacity this way, deferring or eliminating storage refresh purchases.
  • The second strategy is analytics-driven storage planning. Rather than buying storage reactively, leading organizations are using tools like the Komprise Flash Stretch Assessment to quantify exactly how much cold data they have, what it costs to store, and what the savings would be under different tiering scenarios before committing to new hardware. This turns a reactive capital expenditure conversation into a proactive cost optimization strategy.
  • The third strategy is AI data preparation as a lifecycle discipline rather than a one-time project. Organizations that are furthest ahead on AI are treating unstructured data curation, classification, and governance as a continuous operational practice driven by policy-based automation, not a manual exercise before each new AI project. Komprise Smart Data Workflows enable this by continuously identifying, tagging, governing, and delivering the right unstructured datasets to AI platforms automatically, so AI pipelines always operate on current, governed data rather than a one-time snapshot that degrades over time.
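
The "what would the savings be under different tiering scenarios" question from the second strategy comes down to simple arithmetic. The sketch below is a back-of-the-envelope model, not the method used by any assessment tool; the capacity and per-terabyte prices are hypothetical placeholders, and it ignores backup, egress, and migration costs.

```python
def tiering_savings(total_tb, cold_fraction, primary_cost_per_tb, archive_cost_per_tb):
    """Compare annual storage cost before and after moving cold data to a cheaper tier."""
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    cold_tb = total_tb * cold_fraction
    before = total_tb * primary_cost_per_tb
    # After tiering, only the hot remainder stays on primary storage.
    after = (total_tb - cold_tb) * primary_cost_per_tb + cold_tb * archive_cost_per_tb
    return {
        "cold_tb": cold_tb,
        "annual_cost_before": before,
        "annual_cost_after": after,
        "annual_savings": before - after,
    }

# Hypothetical scenario: 1,000 TB on primary storage, three cold-data assumptions.
for fraction in (0.5, 0.6, 0.7):
    result = tiering_savings(1000, fraction, primary_cost_per_tb=300.0, archive_cost_per_tb=50.0)
    print(f"{fraction:.0%} cold: save {result['annual_savings']:,.0f} per year")
```

Running the same function with measured cold-data percentages and real contract prices turns the scenario comparison into a concrete input for a storage purchase decision.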

What are Common Cloud Migration Challenges for Unstructured Data?

Migrating unstructured data to the cloud has grown in popularity as a way to cut data storage costs, consolidate data centers, modernize IT infrastructure and take advantage of cloud-based services such as AI, ML and analytics. But unstructured data migrations to the cloud come with many challenges, including:

  • A global enterprise typically has billions of predominantly small files, which have significant overhead, causing data transfers to be slow.
  • Server Message Block (SMB) and NFS protocol workloads, which can be user data, electronic design automation (EDA) files, multimedia files or corporate shares, are problematic because the protocols require many back-and-forth handshakes, which increase traffic over the network. The SMB protocol in particular is known to have WAN transfer performance challenges.
  • As a result, cloud migrations can take much more time than IT organizations anticipate if not done correctly.
  • File protocols are sensitive to high-latency network connections, which are unavoidable in WAN migrations.
  • Bandwidth is often limited or not always available, causing cloud NAS migration data transfers to become slow, unreliable and difficult to manage.
25 times faster unstructured data migrations with Hypertransfer
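
To see why billions of small files and chatty protocols slow a WAN migration, consider a rough timing model. This is a deliberately simplified sketch under stated assumptions: each file costs a fixed number of protocol round trips, parallel streams divide that latency cost evenly, and bandwidth is shared and fully usable. The round-trip count and all sample numbers are hypothetical, not measured SMB or NFS figures.

```python
def estimate_transfer_seconds(file_count, total_bytes, bandwidth_bytes_per_s,
                              rtt_seconds, round_trips_per_file, parallel_streams=1):
    """Estimate migration time as per-file latency overhead plus raw payload time."""
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    # Every file pays the handshake cost, regardless of how small it is.
    latency_time = file_count * round_trips_per_file * rtt_seconds / parallel_streams
    # The payload itself is limited by the available bandwidth.
    payload_time = total_bytes / bandwidth_bytes_per_s
    return latency_time + payload_time

# Hypothetical case: 1 million files, 100 GB total, 1 Gbit/s link, 50 ms round trip.
single = estimate_transfer_seconds(1_000_000, 100e9, 125_000_000, 0.05, 10)
parallel = estimate_transfer_seconds(1_000_000, 100e9, 125_000_000, 0.05, 10, parallel_streams=16)
print(f"single stream: {single / 3600:.1f} hours, 16 streams: {parallel / 3600:.1f} hours")
```

In this model the payload needs only minutes, while the per-file round trips dominate the total, which is why reducing protocol chattiness and adding parallelism matter more than raw bandwidth for small-file workloads.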

Why is unstructured data important for AI?

AI models depend on high-quality datasets, most of which come from unstructured sources like documents and images. Preparing this data for AI entails new data management strategies which create automated ways to index, segment, curate, tag and move unstructured data continuously to feed AI and ML tools. Learn more about unstructured data management and read the 2026 Komprise State of Unstructured Data Management Report.

What is Unstructured Data?

Unstructured data is information that doesn’t have a predefined data model or is not organized in a pre-defined manner. Unlike structured data, which is typically organized into tables and follows a specific schema, unstructured data lacks a clear and consistent structure.

This type of data is often text-heavy but can also include images, videos, audio, social media posts, emails, and other forms of content. 90% of all data generated in today’s digital age is unstructured. The sheer volume of unstructured data makes it challenging to manage and analyze using traditional methods.

What are Examples of Unstructured Data?

Examples of unstructured data include:

  • Text Documents: Word documents, PDFs, emails, and other textual content.
  • Multimedia Files: Images, videos, and audio files.
  • Social Media Feeds: Posts, comments, and multimedia content from social media platforms.
  • Web Pages: Content from websites, which may include text, images, and multimedia elements.
  • Sensor Data: Data from sensors, such as those in IoT (Internet of Things) devices.

What is Unstructured Data Management?

Unstructured data management comprises the processes and strategies involved in handling, storing, organizing, and extracting value from unstructured data. Effective unstructured data management is crucial for organizations looking to harness the potential insights and value contained within diverse and voluminous datasets.

As technologies and best practices continue to evolve, managing unstructured data becomes an integral part of overall data management strategies. See the definition for Unstructured Data Management and download the State of Unstructured Data Management report.

Why does AI need Unstructured Data?

Success with AI depends upon harnessing unstructured data and feeding the right data at the right time to AI platforms. This is difficult and costly not only because of the data's tremendous volume, but also because unstructured data is dispersed across storage silos in the enterprise.

Komprise delivers a Global File Index for granular search and tagging of data across silos. In addition, with Komprise Smart Data Workflows, you can create custom workflows to easily search, find, and tag the exact files you want across all your hybrid cloud storage and create a plan to move the right unstructured data to a data lake or AI tool. Komprise delivers a storage-agnostic, analytics-based unstructured data management platform that automates data workflows for AI.
