Data Management Glossary
Unstructured Data
What is Unstructured Data?
Unstructured data is data that doesn’t fit neatly in a traditional database and has no identifiable internal structure. This is the opposite of structured data, which is organized according to a fixed schema and typically stored in a database. Unstructured data does not follow a predefined data model or schema.
Up to 80% of business data is considered unstructured, with this number increasing year over year.
Examples of unstructured data are:
- Documents, presentations and other user files
- Chats and e-mail messages
- Photos, audio and video files
- CAD / CAM files
- Genomic sequencing data and medical images
- IoT and other machine-generated data, including log files

What’s the difference between structured data and unstructured data?
Data can be of two broad types: structured data and unstructured data.
Structured data is data organized into categories by rows and columns, as in an Excel spreadsheet or a relational database. For example, accounting records are structured data because you can organize them by customer, by geography, by product, etc.
Structured data is typically stored in a database and can be queried using query languages such as Structured Query Language (SQL). Until around 2000, most data was structured, but since then we have seen an explosion of unstructured data. Today, structured data accounts for less than 20% of the world’s data.
Unstructured data usually has no predefined data model and does not map well to relational tables. Text-heavy unstructured data may contain numbers, dates and facts, but because these values are not stored in labeled fields, conventional software programs have difficulty identifying and interpreting them.
Unstructured data is typically stored in file systems, such as network-attached storage (NAS), and in object storage. Unlike structured data in databases, it is harder to search, analyze, and govern, which creates challenges for cost control, compliance, and AI initiatives.
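To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 and re modules. The invoice table, its rows and the email text are hypothetical sample data invented for illustration. When facts live in named columns, one SQL query answers the question; when the same facts are buried in free text, a program has to guess at patterns, and the guess is brittle.

```python
import re
import sqlite3

# Structured: hypothetical accounting records stored under a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (customer TEXT, region TEXT, amount REAL, issued TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?)",
    [("Acme", "EMEA", 1200.0, "2024-03-01"),
     ("Globex", "APAC", 800.0, "2024-03-15"),
     ("Acme", "EMEA", 450.0, "2024-04-02")],
)
# Every value sits in a named column, so a single query answers the question.
total = conn.execute(
    "SELECT SUM(amount) FROM invoices WHERE customer = ?", ("Acme",)
).fetchone()[0]
print(total)  # 1650.0

# Unstructured: similar facts written in a hypothetical email body.
email = (
    "Hi team, Acme paid 1,200 EUR on 2024-03-01 and another 450 EUR "
    "on 2024-04-02. Globex still owes us 800."
)
# With no columns we must guess patterns. These regexes are brittle: the
# Globex amount is missed because it is not followed by "EUR".
dates = re.findall(r"\d{4}-\d{2}-\d{2}", email)
amounts = re.findall(r"(\d[\d,]*) EUR", email)
print(dates)    # ['2024-03-01', '2024-04-02']
print(amounts)  # ['1,200', '450']
```

The missed Globex amount is the point of the example: extracting facts from unstructured text depends on heuristics that break as soon as the wording changes.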
What are Unstructured Data Types by Industry?
Unstructured data is the predominant type of data generated by most applications today, from self-driving cars to Internet of Things (IoT) devices, genome sequencers, and video and audio production. Here are some examples of unstructured data types by industry:
- Life Sciences: Imaging, genome sequencing, research
- Healthcare: Imaging, PACS, digital pathology
- Media & Entertainment: Post-production, animation, VFX, content delivery
- Government: CAD/CAM, GIS, bodycam surveillance
- Oil and Gas: Seismic data, compliance
- Transportation: Autonomous vehicles
- Financial Services: Claims data, call center recordings
Read: Why Harnessing Unstructured Data is a Top Enterprise Mandate
Read: Getting an Upper Hand on the Unstructured Data Problem
Read: Why Unstructured Data Matters – An Industry View
Why is Unstructured Data Growing so Fast?
The analyst firm IDC predicted that the world would generate over 175 zettabytes of data by 2025. For scale, one zettabyte is one billion terabytes, the capacity of one billion 1 TB drives. IDC also predicted that over the following three years we would generate more data than we had created over the previous 30 years, and that this growth trend would continue.
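The zettabyte arithmetic can be checked in a few lines. This sketch assumes decimal (SI) units, which is how drive capacities are marketed: 1 TB = 10^12 bytes and 1 ZB = 10^21 bytes. Integer arithmetic keeps the result exact.

```python
# Decimal (SI) units, as used for drive capacities.
TB = 10**12  # bytes in one terabyte
ZB = 10**21  # bytes in one zettabyte

# How many 1 TB drives hold one zettabyte?
drives_per_zb = ZB // TB
print(f"{drives_per_zb:,}")  # 1,000,000,000

# IDC's 175 ZB projection expressed as 1 TB drives.
drives_for_175_zb = 175 * ZB // TB
print(f"{drives_for_175_zb:,}")  # 175,000,000,000
```

With binary units (1 ZiB = 2^70 bytes, 1 TiB = 2^40 bytes) the ratio is 2^30, about 1.07 billion, so the order of magnitude is the same either way.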
Most of the data we generate today is unstructured because unstructured data has several advantages over structured data:
- Wider Use Cases for Unstructured Data: Structured data has a rigid, predefined structure and can only be used for its intended purpose. This narrows the number of use cases for structured data: while it is useful for transactional applications like revenue tracking or catalogs, it is not a good fit for applications that generate data that is hard to categorize, such as video or genomics.
- Various Formats: Unstructured data can be stored in a variety of formats, from an MP4 video to a genomics BAM file, a .log diagnostics file, or an X-ray image stored in a digital PACS archive. All of these are types of unstructured data, so an accurate way to describe unstructured data is that it spans many formats rather than just one. This means more applications can generate unstructured data and tailor the format to their use.
- Various Sizes: Unlike a cell in a database, an unstructured data file is not limited to a specific size or character count. For example, you can have small video files for short snippets and large video files for full-length movies. This also increases flexibility in how unstructured data is generated and used.
Since unstructured data is easier to create and use, more applications and users are working with unstructured data.
Unstructured Data Management
Managing the growing volumes of unstructured data generated within an organization is leading to higher expenses.
What are the 3 Vs of unstructured data?
- Volume: The sheer quantity of data continues to grow at an incomprehensible rate
- Velocity: Data is arriving at a continually faster rate
- Variety: The types of data continue to be more varied
These 3 Vs of unstructured data, originally defined by former Meta Group / Gartner industry analyst Doug Laney, mean that managing unstructured data growth is critical for organizations, whose budgets and resources are being stretched to their limits.
Unstructured data management requires an understanding of what data is hot and actively used, and what data is cold and rarely accessed. In most enterprises, over 80% of unstructured data becomes cold within a year of creation, yet it continues to be managed on the most expensive storage and to consume expensive backup resources.
Analytics-driven management of unstructured data can change this by identifying hot and cold data across storage systems, keeping hot data on high-performance storage, and offloading cold data to lower-cost, passively managed storage.
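The hot/cold split described above can be sketched with nothing more than file metadata. This is a minimal illustration, not Komprise's analytics: it assumes a file is "cold" if its last-access time (st_atime) is older than a hypothetical 365-day threshold, and the function name classify is illustrative. Note that many file systems are mounted with noatime or relatime, in which case access times are only approximate.

```python
import time
from pathlib import Path

# Assumption for illustration: "cold" means not accessed for a year.
COLD_AFTER_DAYS = 365


def classify(root, cold_after_days=COLD_AFTER_DAYS, now=None):
    """Split files under root into (hot, cold) lists of (path, size_bytes).

    A file is cold if its last access time is older than the cutoff.
    Access times may be approximate on noatime/relatime mounts.
    """
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # seconds per day
    hot, cold = [], []
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            bucket = cold if st.st_atime < cutoff else hot
            bucket.append((str(path), st.st_size))
    return hot, cold
```

Summing the sizes in the cold list, for example sum(size for _, size in cold), gives a rough estimate of how much capacity could move to a cheaper tier.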
Unstructured data management should be done without restricting access to the cold data, so that users and applications continue to see and access it exactly as before. To understand how Komprise enables enterprise IT organizations to analyze, move, and manage unstructured data and save costs on storage and backups, read the white paper: Komprise Intelligent Data Management Architecture Overview.
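One generic way to offload a file without breaking its original path is to move it and leave a symbolic link behind. The sketch below shows that idea on a POSIX file system; it is not a description of how Komprise implements transparent access, and the function name tier_out and its parameters are hypothetical.

```python
import os
import shutil
from pathlib import Path


def tier_out(src, archive_root, source_root):
    """Move src under archive_root (keeping its relative path) and leave a
    symbolic link at the original location so existing paths still work."""
    rel = os.path.relpath(src, source_root)
    dest = Path(archive_root) / rel
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(src, str(dest))
    # Use an absolute target so the link resolves regardless of where it lives.
    os.symlink(os.path.abspath(dest), src)
    return str(dest)
```

Reads through the old path follow the link to the archived copy. Production tools also have to handle permissions, Windows/SMB clients (where symlinks behave differently), backups that follow links, and recall of data that becomes hot again.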

How does unstructured data relate to Komprise?
Komprise analyzes unstructured data across all storage environments, providing visibility into usage, cost, and value. It enables organizations to identify redundant, obsolete, and trivial (ROT) data and curate high-value datasets for AI and analytics.
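As a small illustration of the "redundant" part of ROT, the sketch below finds exact duplicate files by content hash. It is a generic technique, not Komprise's method, and the function names are hypothetical. Obsolete and trivial data need other rules, such as age or file type. Files are grouped by size first, because files of different sizes cannot be identical, so only same-size candidates are hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[str]]:
    """Return groups of paths under root whose contents are byte-identical."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    groups = []
    for candidates in by_size.values():
        if len(candidates) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in candidates:
            by_hash[sha256_of(p)].append(str(p))
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```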
What are the challenges of managing unstructured data?
The main challenges are lack of visibility, high storage costs, data sprawl, and difficulty identifying valuable data.
What are Common Cloud Migration Challenges for Unstructured Data?
Migrating unstructured data to the cloud has grown in popularity to save data storage costs, consolidate data centers, modernize IT infrastructure and take advantage of cloud-based services such as AI, ML and analytics. But there are many challenges when it comes to unstructured data migrations to the cloud, including:
- A global enterprise typically has billions of predominantly small files. Each file carries fixed per-file overhead, such as metadata lookups and open and close operations, so moving many small files is much slower than moving the same volume of data in a few large files.
- Server Message Block (SMB) and Network File System (NFS) protocol workloads, which can include user data, electronic design automation (EDA) files, multimedia files or corporate shares, are problematic because the protocols require many back-and-forth exchanges that increase traffic over the network. The SMB protocol in particular is known to have WAN transfer performance challenges.
- File protocols are sensitive to high-latency network connections, which are unavoidable in WAN migrations.
- Bandwidth is often limited or not always available, causing cloud NAS migration data transfers to become slow, unreliable and difficult to manage.
- As a result, cloud migrations can take much more time than IT organizations anticipate if not done correctly.
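A rough back-of-the-envelope model shows why many small files over a high-latency link are slow. The sketch assumes a single, serial transfer stream. Total time is then split into protocol chatter (files × round trips per file × round-trip time) and raw data time (bytes × 8 / bandwidth). The workload numbers, and especially the figure of 5 round trips per file, are hypothetical inputs chosen for illustration, not measured SMB or NFS behavior.

```python
def estimate_transfer_hours(num_files, avg_file_bytes, round_trips_per_file,
                            rtt_seconds, bandwidth_bits_per_sec):
    """Single-stream estimate: per-file protocol round trips plus raw data time.

    Returns (latency_hours, data_hours).
    """
    latency_s = num_files * round_trips_per_file * rtt_seconds
    data_s = num_files * avg_file_bytes * 8 / bandwidth_bits_per_sec
    return latency_s / 3600, data_s / 3600


# Hypothetical workload: 1 million 100 KB files over a 1 Gbps WAN link with a
# 50 ms round-trip time and an assumed 5 protocol round trips per file.
latency_h, data_h = estimate_transfer_hours(1_000_000, 100_000, 5, 0.05, 1e9)
print(f"latency: {latency_h:.1f} h, data: {data_h:.2f} h")
# latency: 69.4 h, data: 0.22 h
```

Under these assumptions, waiting on round trips dominates the moving of bytes by more than two orders of magnitude. This is why migration tools parallelize transfers and try to reduce per-file protocol exchanges rather than just adding bandwidth.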

Why is unstructured data important for AI?
AI models depend on high-quality datasets, most of which come from unstructured sources like documents and images. Preparing this data for AI requires new data management strategies that automate how unstructured data is indexed, segmented, curated, tagged and moved continuously to feed AI and ML tools. Learn more about unstructured data management and read the 2026 Komprise State of Unstructured Data Management Report.
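As one example of the "segment" step, text extracted from documents is commonly split into overlapping chunks before it is embedded or passed to a model. The sketch below uses fixed-size character windows. The default sizes are arbitrary assumptions, and real pipelines often split on tokens, sentences or document structure instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop before any start position whose chunk would contain only text
    # already covered by the previous chunk's overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]
```

For example, chunk_text("abcdefghij", 4, 1) yields "abcd", "defg" and "ghij". Each chunk repeats one character from its predecessor, so context that straddles a boundary is not lost.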
What is Unstructured Data?
Unstructured data is information that doesn’t have a predefined data model or is not organized in a pre-defined manner. Unlike structured data, which is typically organized into tables and follows a specific schema, unstructured data lacks a clear and consistent structure.
This type of data is often text-heavy but can also include images, videos, audio, social media posts, emails, and other forms of content. By some estimates, as much as 90% of all data generated today is unstructured. The sheer volume of unstructured data makes it challenging to manage and analyze using traditional methods.
What are Examples of Unstructured Data?
Examples of unstructured data include:
- Text Documents: Word documents, PDFs, emails, and other textual content.
- Multimedia Files: Images, videos, and audio files.
- Social Media Feeds: Posts, comments, and multimedia content from social media platforms.
- Web Pages: Content from websites, which may include text, images, and multimedia elements.
- Sensor Data: Data from sensors, such as those in IoT (Internet of Things) devices.
What is Unstructured Data Management?
Unstructured data management refers to the processes and strategies involved in handling, storing, organizing, and extracting value from unstructured data. Effective unstructured data management is crucial for organizations looking to harness the insights and value contained within diverse and voluminous datasets.
As technologies and best practices continue to evolve, managing unstructured data is becoming an integral part of overall data management strategies. See the definition for Unstructured Data Management and download the State of Unstructured Data Management report.
Why does AI need Unstructured Data?
Success with AI depends on harnessing unstructured data and feeding the right data to AI platforms at the right time. This is difficult and costly not only because of the data's tremendous volume, but also because unstructured data is dispersed across storage silos in the enterprise.
Komprise delivers a Global File Index for granular search and tagging of data across silos. In addition, with Komprise Smart Data Workflows, you can create custom workflows to easily search, find, and tag the exact files you want across all your hybrid cloud storage and create a plan to move the right unstructured data to a data lake or AI tool. Komprise delivers a storage-agnostic, analytics-based unstructured data management platform that automates data workflows for AI.
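The idea of indexing files across silos and then tagging and searching them can be sketched in memory. Everything here is hypothetical and illustrative: the names FileRecord, build_index, tag_where and search are not the Komprise Global File Index or Smart Data Workflows API. A real index would also be persisted, updated incrementally and scaled to billions of entries.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List, Set


@dataclass
class FileRecord:
    silo: str    # which storage system the file was found in
    path: str
    size: int
    suffix: str  # lower-cased file extension, e.g. ".pdf"
    tags: Set[str] = field(default_factory=set)


def build_index(silos: Dict[str, str]) -> List[FileRecord]:
    """Walk each silo's root directory and record basic metadata per file."""
    index = []
    for silo, root in silos.items():
        for p in Path(root).rglob("*"):
            if p.is_file():
                index.append(FileRecord(silo, str(p), p.stat().st_size, p.suffix.lower()))
    return index


def tag_where(index: List[FileRecord], predicate: Callable[[FileRecord], bool],
              tag: str) -> int:
    """Add tag to every record matching predicate; return the number tagged."""
    hits = [r for r in index if predicate(r)]
    for r in hits:
        r.tags.add(tag)
    return len(hits)


def search(index: List[FileRecord], tag: str) -> List[str]:
    """Return the sorted paths of all records carrying tag."""
    return sorted(r.path for r in index if tag in r.tags)
```

The resulting tagged list of paths is the kind of curated selection that a workflow could then copy to a data lake or AI tool.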
