Why Unstructured Data Management Matters

This blog has been adapted from the original version on FastCompany.

A survey conducted by Komprise in 2023 found that 32% of organizations are managing 10PB of data or more. That equates to 110,000 ultra-high-definition (UHD) movies, or half of the data stored by the U.S. Library of Congress. In addition, 73% of organizations are spending more than 30% of their IT budget on data storage.

Now with AI, big data analytics, and digital processes dominating business strategies, we need to better leverage all this data. Unstructured data is the fuel needed for AI, yet most organizations aren’t using it well. One reason for this is that unstructured data is difficult to find, search across, and move due to its size and distribution across hybrid cloud environments.

Enterprises have two main objectives in managing unstructured data: (1) quickly finding, sorting, and leveraging it for AI projects and (2) controlling rapidly growing storage and backup costs.

The benefits of unstructured data management for cost control and AI

Unstructured data management solutions and strategies can help IT gain holistic visibility and a detailed understanding of enterprise data:

  • How much data is stored and where,
  • What types and sizes of files are most prominent,
  • What are the costs to store and back it up,
  • Who are the top owners,
  • And other identifying characteristics such as metadata describing file contents.

With this information, organizations can choose the optimal, most cost-effective storage for different data sets while also developing workflows to help departmental users find their data and move it to AI platforms as needed.
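As a rough illustration of the kind of visibility described above, here is a minimal sketch that walks a single directory tree and totals file counts and bytes by file extension and by owner. It is not how any particular data management product works; the function name `summarize_tree` and the shape of the returned dictionary are assumptions made for this example, and a real deployment would scan many storage systems rather than one local path.

```python
import os
from collections import Counter
from pathlib import Path


def summarize_tree(root):
    """Walk `root` and total file counts and bytes per extension and per owner UID.

    Hypothetical helper for illustration only.
    """
    bytes_by_ext = Counter()
    files_by_ext = Counter()
    bytes_by_owner = Counter()
    total_bytes = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanished or cannot be read

            # Group by lowercased extension; files without one share a bucket.
            ext = path.suffix.lower() or "(none)"
            bytes_by_ext[ext] += info.st_size
            files_by_ext[ext] += 1
            # st_uid is the numeric owner ID (always 0 on Windows).
            bytes_by_owner[info.st_uid] += info.st_size
            total_bytes += info.st_size

    return {
        "total_bytes": total_bytes,
        "bytes_by_ext": bytes_by_ext,
        "files_by_ext": files_by_ext,
        "bytes_by_owner": bytes_by_owner,
    }
```

Calling `summarize_tree("/some/share")["bytes_by_ext"].most_common(5)` would list the five extensions that consume the most space under that (hypothetical) path, which is one simple input for deciding which data sets belong on cheaper storage.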

Industry examples

Let’s start with healthcare. Roughly 30% of the world’s data volume is generated by the healthcare industry, and this will grow to 36% by 2025, according to research compiled by RBC Capital Markets. Clinical notes and records, medical images, digital pathology, and research studies are valuable sources of information to better inform personalized medicine and improve patient outcomes.

AI is starting to enable more accurate, faster analysis of common scans, such as mammograms and colonoscopies, and can help clinicians create holistic care plans through intelligent analysis of demographic and social data from patients with a particular condition. Generative AI solutions have been reported to reduce the paperwork burden of clinicians and improve communications between physicians and their patients.

Healthcare organizations need to analyze and manage the complexity of data and file types while ensuring tight adherence to regulations governing their use and protection. A key strategy is instituting the right policies and tools to analyze, discover, protect, and safely move data to locations where it can be anonymized and cleansed prior to analysis.

The auto industry is another sector navigating technology disruption. It’s hard to drive down the road for more than a few minutes without seeing an electric vehicle, whereas two years ago they were still a rare sight. Electric and autonomous vehicles collect large quantities of sensor data, which helps the car adjust and take actions on the fly or issue alerts to the driver. The collection and analysis of this data is white gold for manufacturers to troubleshoot issues and improve their designs.

Using an unstructured data management system, a car manufacturer could create a workflow like this:

  • Find crash test data related to the abrupt stopping of a specific vehicle model;
  • Use an AI tool to identify and tag data with “Reason = Abrupt Stop”;
  • Move only the related data to a cloud data lakehouse to reduce the time and cost associated with moving and analyzing unrelated data;
  • Move the unrelated data to an archival storage tier for cost savings (or delete it) once the analysis is complete.

Imagine the implications for any manufacturer wishing to leverage the right machine data to avoid bad outcomes for its customers and to improve products faster than its competitors.
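The workflow steps listed above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a real product API: `FileRecord`, `looks_like_abrupt_stop`, and `run_workflow` are hypothetical names, the "AI tool" is replaced by a trivial keyword check on the file path, and "moving" data is modeled by changing a `tier` field rather than copying anything.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical catalog entry for one file of test data."""
    path: str
    model: str                 # vehicle model the data belongs to
    test_type: str             # e.g. "crash"
    tags: dict = field(default_factory=dict)
    tier: str = "primary"      # where the file currently lives


def looks_like_abrupt_stop(record):
    """Stand-in for the AI tool: a simple keyword check on the path."""
    return "abrupt_stop" in record.path


def run_workflow(records, model):
    """Find, tag, and place crash test data for one vehicle model."""
    moved = {"lakehouse": [], "archive": []}
    for record in records:
        # Step 1: find crash test data for the requested model.
        if record.model != model or record.test_type != "crash":
            continue
        # Step 2: tag the data the (stand-in) AI tool flags.
        if looks_like_abrupt_stop(record):
            record.tags["Reason"] = "Abrupt Stop"
            # Step 3: only tagged data goes to the data lakehouse.
            record.tier = "lakehouse"
        else:
            # Step 4: unrelated data goes to the archival tier.
            record.tier = "archive"
        moved[record.tier].append(record.path)
    return moved
```

Records for other models or other test types are left on their current tier, so the cost of moving and analyzing data is paid only for the files that match the query.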

Compliance & Regulations

From industry regulations governing sensitive data and geolocation requirements to e-discovery requests, ransomware prevention, and data management during a merger, acquisition, or divestiture, the list of data compliance needs keeps growing. Holistic data governance is becoming ever harder to achieve given the volume of data, the prevalence of shadow IT, and the wide distribution of data.

Consider data management solutions that support automated workflows for compliance:

  • For example, a user could create a query to find all data related to a divestiture project and then, through an API, use an external application like Amazon Macie to identify PII data and tag it.
  • Next, the system could automatically move the PII data to an object-locked cloud storage service where it cannot be modified or deleted.
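To make the two steps above concrete, here is a minimal sketch of the query, tag, and route logic. It does not call Amazon Macie or any cloud storage API: the two regular expressions are a deliberately crude stand-in for a real PII detection service, and `find_pii`, `plan_moves`, the document dictionary layout, and the `"object-locked"` destination label are all assumptions made for this example.

```python
import re

# Crude stand-in patterns; a real PII service covers far more cases.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return the sorted names of the PII patterns found in `text`."""
    return sorted(
        name for name, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


def plan_moves(documents, project):
    """Build a move plan for PII-bearing documents in one project.

    `documents` is a list of dicts with hypothetical keys
    "path", "project", and "text".
    """
    plan = []
    for doc in documents:
        # Query step: keep only documents tied to the project.
        if doc["project"] != project:
            continue
        # Tag step: record which kinds of PII were detected.
        kinds = find_pii(doc["text"])
        if kinds:
            # Route step: PII-bearing files go to object-locked storage.
            plan.append(
                {"path": doc["path"], "pii": kinds, "destination": "object-locked"}
            )
    return plan
```

The function only produces a plan; keeping the decision separate from the actual data movement makes it easy to review what would be locked before anything is moved.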

Growing assets of unstructured data can be both a gift and a curse. Enterprises of all sizes are dealing with the strain on budget and time to store, manage, and govern it all. Yet with intelligent automation, sound policies, and collaboration among top data stakeholders across the organization, IT teams can properly manage the data and leverage it for game-changing AI and analytics initiatives.

