This is the second blog in a two-part series on metadata management. Find the first blog here, which explains the role of metadata in unstructured data management. This metadata optimization content was adapted from its original version on Dataversity.
To review, here are the top benefits of metadata (which is data about your data):
- Metadata brings structure to unstructured data, which is critical for search, data mobility, management, and analytics;
- Metadata supplies better insights into your data, such as top data owners, top file types and sizes, and usage information such as last access date;
- It improves cost savings and decision-making for data storage;
- It supports compliance by tagging regulated or audited data sets;
- Users can find key data sets faster and move them to the right location for AI projects.
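To make the insights above concrete, here is a minimal Python sketch that summarizes the standard metadata a file system already keeps: total bytes per file type and files not accessed within a given window. The function name `summarize_metadata` and the `stale_days` parameter are hypothetical, chosen for illustration; this is not how any particular product gathers metadata.

```python
import os
import time
from collections import Counter
from pathlib import Path


def summarize_metadata(root, stale_days=365, now=None):
    """Walk `root` and summarize standard file system metadata.

    Hypothetical helper for illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - stale_days * 86400  # seconds in a day
    bytes_by_type = Counter()
    stale_files = []

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip broken links and unreadable files
            ext = path.suffix.lower() or "(none)"
            bytes_by_type[ext] += info.st_size
            # st_atime is the last access time reported by the file system
            if info.st_atime < cutoff:
                stale_files.append(str(path))

    return {
        # (extension, total bytes) pairs, largest first
        "bytes_by_type": bytes_by_type.most_common(),
        "stale_files": sorted(stale_files),
    }
```

Note that last access times are approximate on systems mounted with options such as `noatime` or `relatime`, so treat the stale list as a starting point rather than a definitive answer.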
Challenges with metadata management
Metadata is massive because the volume and variety of unstructured data – files and objects – are massive and difficult to wrangle. Data is spread across on-premises and edge data centers and clouds and stored in potentially many different systems. To leverage metadata, you first need a process and tools for managing data.
Managing metadata requires both strategy and automation. Choosing the best path forward can be difficult when business needs are constantly changing and when data types are shifting as organizations collect new kinds of data, such as IoT, surveillance, geospatial, and instrument data.
Managing metadata as it grows can also be problematic. Can you have too much? One risk is a decrease in file storage performance. Organizations must consider how to mitigate this; one large enterprise we know switched from tagging metadata at the file level to the directory level. Read more about data tagging with Komprise.
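One way to picture the directory-level approach is tag inheritance: tags are stored once per directory, and each file's effective tags are computed from its ancestors. The Python sketch below assumes a simple in-memory mapping; the function name, paths, and tag values are all hypothetical.

```python
from pathlib import PurePosixPath


def effective_tags(file_path, directory_tags):
    """Return the tags a file inherits from every tagged ancestor directory.

    `directory_tags` maps a directory path to a set of tags (hypothetical
    structure for illustration).
    """
    tags = set()
    for parent in PurePosixPath(file_path).parents:
        tags |= directory_tags.get(str(parent), set())
    return tags


# Two tag records cover every file under /projects, however many there are.
directory_tags = {
    "/projects": {"active"},
    "/projects/campaign-a": {"marketing", "campaign-a"},
}

logo_tags = effective_tags("/projects/campaign-a/assets/logo.png", directory_tags)
other_tags = effective_tags("/projects/other/readme.txt", directory_tags)
```

Here `logo_tags` contains all three tags and `other_tags` contains only `active`. The trade-off is that any file placed in a tagged directory inherits its tags, whether or not it belongs there.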
How to optimize metadata for storage insights and savings
While you can benefit from the metadata that your storage systems automatically create, an optimal plan will include curated or refined metadata that adds information to your files. Here are some metadata optimization considerations:
- Develop a holistic metadata strategy, which includes rules and guidelines for using, searching for, and customizing metadata. This can ensure that metadata does not get out of control and that it is used appropriately. A strategy may include policies for security and privacy, such as separation of duties. For instance, in a highly regulated business, users can tag the files they have access to, but only certain IT users should be authorized to act on the data once it is tagged. Your strategy should spell out goals and desired outcomes for metadata management. Create a tagging taxonomy and/or metadata catalog so users know which tags to use and when.
- Decide on directory-/folder-level tagging versus file-level tagging. The former is easier to manage, as it reduces the number of tags you must create, track, store, and manage. For instance, you can collect all files related to one program within an integrated marketing campaign into a directory and use an unstructured data management system to automatically tag it as such. However, review directory contents regularly to ensure that no errant files have landed in the directory and are now being inappropriately tagged.
- Enrich metadata with custom tagging: There are many use cases, from legal to research to marketing to product development, where it’s useful to add metadata tags to files. For example, a biotech company running an experiment in Munich and one in Palo Alto could create tags for each of those experiments so that later, a researcher wanting to run additional analysis could select the specific files she needs from the specific location. Metadata enrichment is easiest using unstructured data management software like Komprise. Otherwise, you will need a database to store and track metadata tags and policies, and all tagging will be manual. This requires many staff hours, so consider whether you have the staff to do it.
- Collaborate with data stakeholders: IT and storage managers typically don’t have insight into the data itself; their focus is managing storage and file access. IT must rely on data scientists and data owners to tag data accurately. You will need a process for collaborative metadata tag management.
- Metadata management automation: It’s highly advisable to use automation where you can, given the volume and variety of metadata today. You can do this with your existing storage solutions, with data governance software such as master data management or data catalog software, and/or with unstructured data management solutions. There are caveats: Storage solutions have some metadata features, but these are limited to the files in that system, so you’ll need to maintain and integrate multiple metadata processes and tools across all storage. Further, file storage systems generally do not let you add or edit metadata on files. Depending upon your goals, consider a unified solution that looks across all data and metadata to centralize your efforts.
- Use tools that combine queries and tagging: Metadata management tools should not overuse tags or make users generate tags for information already available in metadata. This is cumbersome for users and leads to tag proliferation, tag conflicts, and scaling issues. Solutions should also provide the ability to build and save queries that combine both standard and extended metadata. This query-plus-tag approach delivers efficient automation, scales well, and minimizes manual effort for users.
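The query-plus-tag idea from the last consideration can be sketched in Python as a saved query that filters on standard metadata (file type, days since last access) and on custom tags together. The `FileRecord` and `SavedQuery` classes, field names, paths, and tag values below are hypothetical; a real tool would run such a query against its own index rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int           # standard metadata
    days_since_access: int    # standard metadata
    tags: frozenset = frozenset()  # extended (custom) metadata


@dataclass(frozen=True)
class SavedQuery:
    name: str
    extension: Optional[str] = None
    min_days_since_access: int = 0
    required_tags: frozenset = frozenset()

    def matches(self, record):
        """True when the record satisfies every condition in the query."""
        if self.extension and not record.path.lower().endswith(self.extension):
            return False
        if record.days_since_access < self.min_days_since_access:
            return False
        # All required tags must be present on the record
        return self.required_tags <= record.tags


records = [
    FileRecord("/lab/munich/run1.fastq", 100, 400, frozenset({"site:munich"})),
    FileRecord("/lab/paloalto/run2.fastq", 100, 500, frozenset({"site:palo-alto"})),
    FileRecord("/lab/munich/notes.txt", 10, 900, frozenset({"site:munich"})),
    FileRecord("/lab/munich/run3.fastq", 100, 30, frozenset({"site:munich"})),
]

query = SavedQuery(
    name="cold Munich sequencing data",
    extension=".fastq",
    min_days_since_access=365,
    required_tags=frozenset({"site:munich"}),
)

matching_paths = [r.path for r in records if query.matches(r)]
```

Only `/lab/munich/run1.fastq` matches here. File type and access age come from metadata the system already has, so users only need to supply the one tag that cannot be derived automatically.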
Final thoughts on metadata optimization
As unstructured data volumes grow, IT and storage managers need to control the chaos and the costs – and that encompasses the metadata. The optimal metadata management and metadata optimization strategy includes close collaboration with business and security teams on data governance and analytics needs, tagging tools to enrich the metadata, and automation to analyze and track it. With some effort and the right investment, you can reap the benefits of greater data storage cost savings and long-term value from your mountains of unstructured data and metadata.