Why Unstructured Data Management Matters: An Industry View

The world of data is ever-changing in terms of its types, volumes, uses and risks. Understanding these differences is critical so that IT leaders, data analysts, data scientists and other data stakeholders can manage and use it effectively for new initiatives. The first distinction is unstructured data versus structured data. Structured data is presented in rows and columns, and typically stored in a database.

Structured and semi-structured data can include transaction data, customer relationship data, back-office data, point of sale details, financial and claims data, or click-stream data from a website that is typically fed into a data warehouse and accessed, analyzed and shared via reporting and business intelligence tools. Unstructured data, which comprises at least 80% of all data in the world, does not follow a standard, identifiable structure. IT teams can’t easily store it in a relational database. And it is growing exponentially.

There is an estimated 120 ZB of data in the world today, according to Statista. IDC expects data to grow to 175 ZB by 2025. To consider what that means, this Cisco blog gives a few analogies:

“If each Terabyte in a Zettabyte were a kilometer, it would be equivalent to 1,300 round trips to the moon and back.”
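
The analogy is easy to sanity-check with a few lines of arithmetic. The sketch below assumes decimal units (1 ZB = 10^21 bytes, 1 TB = 10^12 bytes) and an average Earth-Moon distance of about 384,400 km; neither figure comes from the quoted blog, and the function name is purely illustrative.

```python
# Rough check of the "round trips to the moon" analogy.
# Assumption: decimal units, so 1 ZB = 10**21 bytes and 1 TB = 10**12 bytes.
TB_PER_ZB = 10**21 // 10**12

# Assumption: average Earth-Moon distance in kilometers.
EARTH_MOON_KM = 384_400


def round_trips_to_moon(kilometers: float) -> float:
    """Return how many Earth-Moon round trips a distance in km covers."""
    return kilometers / (2 * EARTH_MOON_KM)


# One kilometer per terabyte in a single zettabyte.
trips = round_trips_to_moon(TB_PER_ZB)
print(f"{TB_PER_ZB:,} TB per ZB is about {trips:,.0f} round trips")
```

Under these assumptions, a single zettabyte works out to roughly 1,300 round trips, in line with the quote.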

Unstructured data can include emails, documents, web files, audio and video files, genomics files, CAD files, images and instrument and research data. While unstructured data is harder to manage and costly to store, it is fueling the next generation of AI and ML technologies which are reshaping society as we know it.

Unstructured Versus Structured Data

Unstructured data is growing particularly fast in certain industries. Here are some examples of common unstructured data types:

In this series, we look at several industry examples of unstructured data types, growth and data management challenges as well as the potential value this data can bring to its sector.

The first post focuses on the life sciences sector, an $8-10 billion global industry with leaders including Eli Lilly, Pfizer, Johnson & Johnson, Merck and AbbVie.

Here’s a teaser:
Pharma and biotechs have been at the center of global innovation, with rapid revenue growth and demand fueled by the Covid-19 pandemic. Investments in cloud, AI and digital technologies have intensified, delivering groundbreaking changes in how companies develop, test and deliver products to market.

Common file types in life sciences include clinical images, genome sequencing and other instrument data, and research documents. These data types don’t work well with traditional data analytics tools, so life sciences companies are increasingly moving research data to the cloud to take advantage of affordable, scalable processing and analytics services offered by the large cloud service providers (CSPs).

Here is a Pfizer case study on cold data tiering to AWS.


In this unstructured data management by industry series we’ll cover:

Common data management challenges, including data silos, poor visibility into data, cost optimization needs, continual change in regulations, and too much time spent on data preparation and deployment.
The AI opportunity for life sciences, including critical data risks to manage.
How unstructured data management helps.
