Data Management Glossary
Unstructured Data Management
What is Unstructured Data Management?
Unstructured data management is a category of software that has emerged to address the explosive growth of unstructured data in the enterprise and the modern reality of hybrid cloud storage. In the Komprise 2023 State of Unstructured Data Management report, 32% of organizations report that they are managing 10PB of data or more. That equates to 110,000 ultra-high-definition (UHD) movies, or half of the data stored by the U.S. Library of Congress. Most organizations (73%) are spending more than 30% of their IT budget on data storage.
Data storage and data backup technology vendors are now recognizing the importance of unstructured data management as data outlives infrastructure and as data mobility is needed to leverage cloud data storage.
There are 5 requirements for unstructured data management solutions:
- Goes Beyond Storage Efficiency
- Must be Multi-Directional
- Doesn’t Disrupt Users and Workflows
- Should Create New Uses for Your Data
- Puts Your Data First and Avoids Vendor Lock-In
An analytics-based unstructured data management solution brings value by analyzing all data in storage across on-premises and cloud environments to deliver deep insights. This knowledge helps IT managers make informed decisions with users in mind, optimize costs and reduce security and regulatory compliance risks. These insights go beyond traditional storage metrics such as latency, IOPS and network throughput.
Here are some of the new metrics made possible with data management software:
- Top data owners/users: See trends in usage and possible compliance issues, such as individual users storing excessive video files or PII files being stored in an insecure location.
- Common file types: The ability to see data by file extension eases the process of finding all files related to a project and can inform future research initiatives. This could be as simple as finding all the log files, trace files or extracts from a given application or instrument and moving them to a data lake for analysis.
- Storage costs for chargeback or showback: Whether for chargeback requirements or not, stakeholders should understand costs in their department and be able to view metrics. This will help identify areas where low-cost storage or data tiering to archival storage is a viable cost-reduction opportunity.
- Data growth rates: High-level metrics on data growth keep IT and business heads on the same page so they can collaborate on data management decisions. Understand which groups and projects are growing data the fastest and ensure that data creation/storage is appropriate according to its overall business priority.
- Age of data and access patterns: In most enterprises, 60-80% of data is “cold” and hasn’t been accessed in a year or more. Metrics showing percentage of cold versus warm versus hot data are critical to ensure that data is living in the right place at the right time according to its business value and to optimize costs.
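The metrics listed above can be derived from ordinary file metadata. The following is a minimal sketch, not any vendor's implementation: the `FileRecord` type, the `temperature` and `summarize` functions, and the 30-day "warm" threshold are all assumptions made for illustration. Only the one-year "cold" threshold comes from the text above.

```python
from collections import Counter
from dataclasses import dataclass
from pathlib import PurePath

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical metadata record for one file."""
    path: str
    owner: str
    size_bytes: int
    last_access: float  # Unix timestamp in seconds


def temperature(record, now, warm_days=30, cold_days=365):
    """Classify a file as hot, warm or cold by days since last access.

    cold_days=365 follows the "a year or more" definition in the text;
    warm_days=30 is an assumed cut-off.
    """
    age_days = (now - record.last_access) / SECONDS_PER_DAY
    if age_days >= cold_days:
        return "cold"
    if age_days >= warm_days:
        return "warm"
    return "hot"


def summarize(records, now):
    """Aggregate stored bytes by owner, file extension and temperature."""
    by_owner, by_ext, by_temp = Counter(), Counter(), Counter()
    for r in records:
        by_owner[r.owner] += r.size_bytes
        by_ext[PurePath(r.path).suffix.lower() or "(none)"] += r.size_bytes
        by_temp[temperature(r, now)] += r.size_bytes
    total = sum(by_temp.values())
    cold_percent = 100.0 * by_temp["cold"] / total if total else 0.0
    return {
        "by_owner": by_owner,
        "by_extension": by_ext,
        "by_temperature": by_temp,
        "cold_percent": cold_percent,
    }


# Example usage with a fixed reference time so the result is repeatable.
now = 1_700_000_000.0
sample = [
    FileRecord("/proj/a/run1.log", "alice", 400, now - 400 * SECONDS_PER_DAY),
    FileRecord("/proj/a/scan.DCM", "bob", 500, now - 60 * SECONDS_PER_DAY),
    FileRecord("/proj/b/notes", "alice", 100, now - 1 * SECONDS_PER_DAY),
]
report = summarize(sample, now)
print(report["by_owner"].most_common(1), report["cold_percent"])
```

In practice the records would come from a file system or object store scan. Note that last-access timestamps are not always reliable, because some storage systems are mounted with access-time updates disabled.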
Beyond cost optimization, unstructured data management tools and practices can help deliver new value from data.
Unstructured data is the fuel needed for AI, yet it’s difficult to leverage because unstructured data is hard to find, search across, and move due to its size and distribution across hybrid cloud environments. Tagging and automation can help prepare unstructured data for AI and big data analytics programs. Tactics include:
- Preprocess data at the edge so it can be analyzed and tagged with new metadata before moving it into a cloud data lake. This can drastically reduce the wasted cost and effort of moving and storing useless data and can minimize the occurrence of data swamps.
- Apply automation to facilitate data segmentation, cleansing, search and enrichment. You can do this with data tagging, deletion or tiering of cold data by policy and moving data into the optimal storage where it can be ingested by big data and ML tools. A leading new approach is the ability to initiate and execute data workflows.
- Use a solution that persists metadata tags as data moves from one location to another. For instance, files tagged as containing key project keywords by a third-party AI service should retain those tags indefinitely so that a new research team doesn’t have to run the same analysis over again — at high cost. Komprise Intelligent Data Management has these capabilities.
- Plan appropriately for large-scale data migration efforts with thorough diligence and testing. This can prevent common networking and security issues that delay data migrations and introduce errors or data loss.
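Two of the tactics above, tiering or deleting cold data by policy and keeping metadata tags attached as data moves, can be sketched in a few lines. This is an illustrative model only: `DataObject`, `Policy`, `decide` and `move_with_tags` are hypothetical names, the thresholds are assumed values, and no real storage API is called.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical in-memory stand-in for a file or object."""
    path: str
    days_since_access: int
    tags: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Policy:
    """Assumed age thresholds; real policies would be set per organization."""
    tier_after_days: int = 365
    delete_after_days: int = 2555  # roughly seven years, illustrative only


def decide(obj, policy):
    """Return 'keep', 'tier' or 'delete' for one object based on its age."""
    if obj.days_since_access >= policy.delete_after_days:
        return "delete"
    if obj.days_since_access >= policy.tier_after_days:
        return "tier"
    return "keep"


def move_with_tags(obj, destination_prefix):
    """Return a copy of the object at a new location, preserving its tags."""
    name = obj.path.rsplit("/", 1)[-1]
    return DataObject(
        path=f"{destination_prefix.rstrip('/')}/{name}",
        days_since_access=obj.days_since_access,
        tags=dict(obj.tags),  # copy so source and destination stay independent
    )


# Example usage: a cold, tagged file is tiered and keeps its tags.
scan = DataObject("/nas/projects/scan01.tif", 400, {"project": "alpha"})
if decide(scan, Policy()) == "tier":
    archived = move_with_tags(scan, "s3://archive/projects/")
    print(archived.path, archived.tags)
```

Copying the tag dictionary during the move is the point of the sketch: enrichment that was expensive to produce travels with the data instead of having to be recomputed at the destination.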
The State of Unstructured Data Management
In August 2021, Komprise published the first State of Unstructured Data Management Report:
Highlights of the 2021 Unstructured Data Management Report
Unstructured Data is Growing, as are its Costs
- 65.5% of organizations spend more than 30% of their IT budgets on data storage and data management.
- Most (62.5%) will spend more on storage in 2021 versus 2020.
Getting More Unstructured Data to the Cloud is a Key Priority
- 50% of enterprises have data stored in a mix of on-premises and cloud-based storage.
- Top priorities for cloud data management include: migrating data to the cloud (56%), cutting storage and data costs (46%), and governance and security of data in the cloud (41%).
IT Leaders Want Visibility First Before Investing in More Data Storage
- Investing in analytics tools was the highest priority (45%) over buying more cloud or on-premises storage or modernizing backups.
- One-third of enterprises acknowledge that over 50% of data is cold while 20% don’t know, suggesting a need to right-place data through its lifecycle.
Unstructured Data Management Goals & Challenges: Visibility, Cost Management and Data Lakes
- 44.9% wish to avoid rising costs.
- 44.5% want better visibility for planning.
- 42% are interested in tagging data for future use and enabling data lakes.
2022 State of Unstructured Data Management Report
In August 2022, Komprise published the 2nd annual State of Unstructured Data Management Report: Komprise Survey Finds 65% of Enterprise IT Leaders are Investing in Unstructured Data Analytics. The Top 5 trends from the report are summarized here. They are:
- User Self-Service: In data management, self-service typically refers to the ability for authorized users outside of storage disciplines to search, tag, enrich and act on data through automation, such as a research scientist wanting to continuously export project files to a cloud analytics service.
- Moving Data to Analytics Platforms: A majority (65%) of organizations plan to or are already delivering unstructured data to their big data analytics platforms.
- Cloud File Storage Gains Favor: Cloud NAS topped the list for storage investments in the next year (47%).
- User Expectations Beg Attention: Organizations want to move data without disrupting users and applications (42%).
- IT and Storage Directors want Flexibility: A top goal for unstructured data management (42%) is to adopt new storage and cloud technologies without incurring extra licensing penalties and costs, such as cloud egress fees.
State of Unstructured Data Management 2023
In September 2023, Komprise published the 3rd annual State of Unstructured Data Management report. Key themes include:
- Data governance is top enterprise priority when introducing AI
- AI: Intelligent data needs intelligent solutions
- Nearly a third of enterprises are already prepping for AI
- Getting Data Governance Right Top AI Priority in 2023
In a 2022 interview, Komprise co-founder and COO Krishna Subramanian defined unstructured data this way:
Unstructured data is any data that doesn’t fit neatly into a database, and isn’t really structured in rows and columns. So every photo on your phone, every X-ray, every MRI scan, every genome sequence, all the data generated by self-driving cars – all of that is unstructured data. And perhaps more relevant to more businesses, artificial intelligence (AI) and machine learning (ML) – they depend on, and usually output, unstructured data too.
Unstructured data is growing every day at a truly astonishing rate. Today, 85% of the world’s data is unstructured data.
And it’s more than doubling, every two years.
In part two of the interview, Krishna Subramanian noted:
Unstructured data doesn’t have a common structure. But it does have something called metadata. So every time you take a picture on your phone, there’s certain information that the phone captures, like the time of day, the location where the picture was taken, and if you tag it as a favorite, it’ll have that metadata tag on it too. It might know who’s in the photo, there are certain metadata that are kept.
All filing systems store some metadata about the data. A product like Komprise Intelligent Data Management has a distributed way to search across all the different environments where you’ve stored data, and create a global index of all that metadata around the data. And that in itself is a difficult problem, because again, unstructured data is so huge. A petabyte of data might be a few billion files, and a lot of these customers are dealing with tens to hundreds of petabytes.
So you need a system that can create an efficient index of hundreds of billions of files that could be distributed in different places. You can’t use a database, you have to have a distributed index, and that’s the technology we use under the hood, but we optimize it for this use case. So you create a global index.
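To make the idea of a distributed metadata index more concrete, here is a toy sketch. It is not how Komprise implements its index, which the interview does not describe in detail. The `ShardedTagIndex` class, the hash-based shard assignment and the scatter-gather search are all assumptions chosen to illustrate one common way of spreading an index across several partitions.

```python
import hashlib
from collections import defaultdict


class ShardedTagIndex:
    """Toy metadata index: entries are spread across shards by path hash."""

    def __init__(self, num_shards=4):
        # Each shard maps (key, value) -> set of paths. In a real system
        # each shard would live on a different node; here they are in memory.
        self.shards = [defaultdict(set) for _ in range(num_shards)]

    def _shard_for(self, path):
        """Pick a shard deterministically from a hash of the path."""
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def add(self, path, metadata):
        """Index every key/value pair of a file's metadata."""
        shard = self._shard_for(path)
        for key, value in metadata.items():
            shard[(key, value)].add(path)

    def search(self, key, value):
        """Query every shard and merge the results (scatter-gather)."""
        results = set()
        for shard in self.shards:
            results |= shard.get((key, value), set())
        return sorted(results)


# Example usage: files in different locations share one searchable index.
index = ShardedTagIndex(num_shards=3)
index.add("nas1:/genomics/s1.bam", {"project": "alpha", "type": "bam"})
index.add("s3://bucket/genomics/s2.bam", {"project": "alpha", "type": "bam"})
index.add("nas2:/img/x.dcm", {"project": "beta", "type": "dcm"})
print(index.search("project", "alpha"))
```

Because each path always hashes to the same shard, writes touch a single partition, while a search fans out to all shards and merges the matches. That trade-off is typical of partitioned indexes, though production systems add persistence, replication and rebalancing that this sketch omits.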
The Future of Unstructured Data Management
In an end-of-year blog post, Komprise executives review unstructured data management and data storage predictions for 2023 and the implications of adopting data services, processing data at the edge, multi-cloud challenges, the importance of smart data migration strategies, and more.