
Metadata

What is metadata?

Metadata means “data about data”: data that describes other data. In technology circles, the prefix “meta” typically signals “an underlying definition or description.” Standard metadata consists of storage system attributes such as when a file was created, who created it, what type of file it is, its size, when it was last accessed, and when it was last modified.
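
As a minimal sketch of the standard attributes listed above, the Python snippet below reads them with the standard library's os.stat. The function names system_metadata and demo are illustrative and not part of any product. Creation time is reported only where the platform exposes st_birthtime (for example macOS). Elsewhere, the sketch returns None rather than guessing.

```python
import os
import stat
import tempfile
from datetime import datetime, timezone


def _iso(seconds: float) -> str:
    # POSIX timestamps are seconds since the epoch; render them as UTC ISO 8601.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()


def system_metadata(path: str) -> dict:
    """Collect the standard attributes the file system already tracks for a file."""
    st = os.stat(path)
    birth = getattr(st, "st_birthtime", None)  # creation time, only where exposed
    return {
        "name": os.path.basename(path),
        "extension": os.path.splitext(path)[1].lower(),  # a rough file-type hint
        "size_bytes": st.st_size,
        "owner_uid": st.st_uid,  # numeric owner ID
        "is_regular_file": stat.S_ISREG(st.st_mode),
        "created": _iso(birth) if birth is not None else None,
        "last_accessed": _iso(st.st_atime),
        "last_modified": _iso(st.st_mtime),
    }


def demo() -> dict:
    # Write a small throwaway file, read its system metadata, then clean up.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
        f.write(b"hello")
    try:
        return system_metadata(f.name)
    finally:
        os.remove(f.name)


if __name__ == "__main__":
    print(demo())
```

Access times can be stale. Many Linux systems mount file systems with relatime or noatime, which limit how often the last-access time is updated.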

What are some uses of metadata?


  • Metadata makes data easier to find and use, so users can quickly locate and categorize specific documents. Some examples of basic metadata are author, date created, date modified, and file size. Metadata is also used to describe unstructured data such as images, video, web pages, and spreadsheets.
  • Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe content within a web page. Search engines can use this data to help understand the content within a page.
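
The following sketch shows how description and keywords meta tags can be read programmatically, using Python's built-in html.parser. It is a simplification. It collects only <meta name=... content=...> pairs and ignores other forms such as property or http-equiv attributes. MetaTagParser and extract_meta are hypothetical names, and the code assumes Python 3.9 or later.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            # Meta names are case-insensitive, so normalize them.
            self.meta[name.lower()] = content


def extract_meta(html: str) -> dict[str, str]:
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```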

How do you create and manage metadata?

Metadata can be created manually or through automation. System-generated metadata is elementary, usually covering only basic information such as file size, file extension, and creation date. Users can tag their own data sets manually, with labels that identify the data based on its contents. AI tools can also enrich metadata, for example by scanning file contents for keywords and creating curated data sets that are tagged automatically by an unstructured data management system. Komprise delivers sensitive data tagging to prevent PII, IP, or other protected data from being stored in noncompliant locations. Learn more here.
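
The automated enrichment described above can be sketched with simple content rules. This is not how Komprise or any particular AI tool works. Regular expressions and a keyword list stand in for AI-based classification, and the tag names, project keyword, and patterns are hypothetical. Real PII detection needs far more robust methods.

```python
import re

# Hypothetical rules for illustration only; real PII detection needs far more
# robust patterns, validation, or ML-based classification.
SENSITIVE_PATTERNS = {
    "pii:email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "pii:us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Hypothetical keyword-to-tag mapping a team might maintain.
KEYWORD_TAGS = {
    "project:apollo": ["apollo"],
    "topic:genomics": ["genome", "sequencing", "fastq"],
}


def enrich(text: str) -> set:
    """Return the tags a simple content scan would attach to a file."""
    tags = set()
    for tag, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(text):
            tags.add(tag)
    lowered = text.lower()
    for tag, keywords in KEYWORD_TAGS.items():
        if any(word in lowered for word in keywords):
            tags.add(tag)
    return tags
```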

Metadata can be stored and managed in a database; however, without context, it may be impossible to recognize or interpret metadata just by looking at it. Metadata is useful in managing unstructured data because it provides a common framework to identify and classify a wide variety of data, including videos, audio, genomics data, seismic data, user data, documents, and logs.
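
A bare value such as P-117 or 5000000 says little on its own. A database schema with named columns supplies the missing context. The sketch below stores metadata for mixed unstructured data and queries it by tag. It assumes an in-memory SQLite database and a hypothetical two-table layout (files and tags).

```python
import sqlite3


def build_catalog(records):
    """Load metadata records into an in-memory SQLite catalog."""
    conn = sqlite3.connect(":memory:")
    # Named columns give each stored value its meaning.
    conn.execute(
        """CREATE TABLE files (
               path TEXT PRIMARY KEY,
               data_type TEXT,
               size_bytes INTEGER,
               owner TEXT
           )"""
    )
    conn.execute("CREATE TABLE tags (path TEXT REFERENCES files(path), tag TEXT)")
    for rec in records:
        conn.execute(
            "INSERT INTO files VALUES (?, ?, ?, ?)",
            (rec["path"], rec["data_type"], rec["size_bytes"], rec["owner"]),
        )
        conn.executemany(
            "INSERT INTO tags VALUES (?, ?)",
            [(rec["path"], t) for t in rec.get("tags", [])],
        )
    conn.commit()
    return conn


def paths_with_tag(conn, tag):
    """Return the paths of all files carrying the given tag, sorted by path."""
    rows = conn.execute(
        "SELECT f.path FROM files f JOIN tags t ON f.path = t.path "
        "WHERE t.tag = ? ORDER BY f.path",
        (tag,),
    )
    return [r[0] for r in rows]
```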

Metadata and vector embeddings

Vector embeddings provide a machine-readable representation of a file’s contents, that is, what the file is about. Metadata, on the other hand, offers contextual information that often goes beyond the file’s content, explaining why the file exists, how it’s used, and by whom. While embeddings are powerful for understanding content, metadata is typically more concise and efficient for categorization and management. Embedding full file contents into metadata is not only inefficient but can also introduce data governance challenges, especially when applying AI models across all your data. Using both strategically ensures better context, performance, and compliance.
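
One way to use both strategically is to filter on metadata first and then rank only the remaining files by embedding similarity. That way, for example, files tagged as sensitive never reach the AI step. The sketch below assumes each file already has a small embedding vector and a metadata dict. In practice the vectors come from an embedding model; here they are hand-written toy values. The names search and cosine are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(files, query_vec, **filters):
    """Filter on metadata first, then rank the survivors by embedding similarity."""
    candidates = [
        f for f in files
        if all(f["meta"].get(k) == v for k, v in filters.items())
    ]
    return sorted(candidates, key=lambda f: cosine(f["embedding"], query_vec), reverse=True)
```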

What are the top benefits of metadata management for unstructured data?

The right metadata strategy for unstructured data management brings many benefits, including:

  • Metadata brings structure to unstructured data, which is valuable for search, data mobility, management, and analytics;
  • Metadata delivers deeper insights on your data, such as: top data owners, top file types and sizes, and usage information such as last access date;
  • It supports better data storage decisions and cost savings;
  • It supports compliance and AI data governance by tagging regulated or audited data sets;
  • Users can find key data sets faster and move them to the right location for AI and research projects.
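
The sketch below gives a rough illustration of the insights listed above. It aggregates a hypothetical metadata index (a list of dicts with owner, ext, size and last_access fields) into three figures: capacity by owner, capacity by file type, and the volume of data not accessed within a configurable window. The 365-day default is an arbitrary choice. Figures like these can inform storage tiering and cost decisions.

```python
from collections import Counter
from datetime import datetime, timedelta


def summarize(index, now, cold_after_days=365):
    """Aggregate metadata records into top owners, top file types and cold data volume."""
    by_owner = Counter()
    by_type = Counter()
    cold_bytes = 0
    cutoff = now - timedelta(days=cold_after_days)
    for rec in index:
        by_owner[rec["owner"]] += rec["size"]
        by_type[rec["ext"]] += rec["size"]
        # Files not accessed since the cutoff are candidates for cheaper storage.
        if rec["last_access"] < cutoff:
            cold_bytes += rec["size"]
    return {
        "top_owners": by_owner.most_common(3),
        "top_types": by_type.most_common(3),
        "cold_bytes": cold_bytes,
    }
```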

How does Komprise manage metadata?

Komprise indexes metadata across different storage and cloud environments and acts on it at scale. Komprise extracts both system metadata and extended metadata, such as PII or project codes, into a global file index. This index retains that knowledge no matter where your data lives, and it does so without changing the original files. Komprise Deep Analytics helps you query and filter data based on this index, and Komprise Smart Data Workflows allows you to search for and feed the right data to the right AI process and retain its outputs as additional metadata.
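
A global index can span several storage systems, hold extended metadata, and leave the underlying files untouched. The sketch below models that idea as a toy in-memory structure. It is not the Komprise Global File Index or its API, and GlobalIndex, IndexEntry, and the source names are hypothetical. Entries are keyed by (source, path). Re-ingesting a source updates system metadata while keeping previously added tags.

```python
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    source: str  # storage system name, e.g. "nas01" (hypothetical)
    path: str
    size: int
    tags: dict = field(default_factory=dict)  # extended metadata, e.g. {"project": "P-117"}


class GlobalIndex:
    """Toy cross-storage index: it keeps copies of metadata only, never the files."""

    def __init__(self):
        self._entries = {}

    def ingest(self, source, listing):
        # listing: iterable of (path, size) pairs scanned from one storage system.
        for path, size in listing:
            key = (source, path)
            existing = self._entries.get(key)
            # Preserve extended metadata across rescans.
            tags = existing.tags if existing else {}
            self._entries[key] = IndexEntry(source, path, size, tags)

    def tag(self, source, path, key, value):
        self._entries[(source, path)].tags[key] = value

    def query(self, **tag_filters):
        matches = (
            e for e in self._entries.values()
            if all(e.tags.get(k) == v for k, v in tag_filters.items())
        )
        return sorted(matches, key=lambda e: (e.source, e.path))
```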

Metadata management is different from ETL when it comes to preparing data for AI. It delivers an ongoing workflow: find the right data, get it to the right compute, run the compute either locally or in the cloud, and then repeat the process. A great example of this is our customer Duquesne University.
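
The find, compute, and retain loop can be outlined as a generic function. This is a hypothetical sketch, not Komprise Smart Data Workflows or the Duquesne University setup. The compute argument stands in for any local or cloud job, and its output is written back as metadata. Files that already carry the result key are skipped, so repeated runs only process new data.

```python
def run_workflow(index, select, compute, result_key):
    """One pass of a find -> compute -> retain loop over a metadata index.

    index: dict mapping file path -> metadata dict (hypothetical structure)
    select: predicate that chooses which files to send to compute
    compute: stand-in for an AI or analytics job; returns a value to store
    result_key: metadata key under which the output is retained
    """
    processed = []
    for path, meta in sorted(index.items()):
        # Skip files already enriched, and files the selection excludes.
        if result_key in meta or not select(meta):
            continue
        meta[result_key] = compute(path, meta)
        processed.append(path)
    return processed
```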

Learn more about the Komprise Global File Index and Deep Analytics.

Learn more about the Komprise Intelligent Data Management architecture.

What is Metadata?

Metadata is “data about data.” It is structured data that references and identifies other data, adding an essential layer of shorthand information. A metadata schema can be simple or complex, but it provides an important underlying definition or description.

Types of Metadata

Today’s metadata ecosystem encompasses seven distinct types, each serving different purposes and requiring different approaches to capture and manage: 

System metadata: Storage systems automatically generate attributes like creation date, file size, ownership and permissions. While essential, this represents just the starting point. 

Header metadata: Technical format specifications embedded within files, such as camera settings in photos, document templates in Word files, or compression algorithms in media files. Applications and storage systems generate this automatically. 

Application-based metadata: Workflow states, approval chains and process information from business applications. For example, lab notebooks automatically generate metadata about experiment phases, approval status and system integration. 

Contextual metadata: Project identifiers, geographical tags, departmental associations and business context that give meaning beyond technical properties. Capturing it requires sophisticated tools, and this enrichment is where organizations add significant business value. 

Sensitivity metadata: PII, intellectual property, regulated data types and security classifications. These require specialized tools to uncover and classify, because doing so involves analyzing file contents rather than just file properties. 

User-based metadata: Manual tags, collaborative annotations and crowd-sourced insights that add human intelligence to data classification. While powerful, this approach faces scalability challenges as data volumes explode. 

AI-generated metadata: The newest and most transformative category. AI analyzes file contents and automatically generates contextual tags and classification insights at scale. 
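
Header metadata can often be read from the first few bytes of a file without decoding its contents. PNG is a simple, well-documented example. Every PNG image starts with an 8-byte signature followed by an IHDR chunk holding the width, height, bit depth and color type. The sketch below parses those fields. Formats such as EXIF camera settings in JPEG files are considerably more involved and are not covered here, and the function name is illustrative.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_header_metadata(data: bytes) -> dict:
    """Read header metadata from the IHDR chunk that begins every PNG file."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # After the 8-byte signature: a 4-byte big-endian chunk length, then the chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing IHDR chunk")
    # IHDR data starts with width (4 bytes), height (4), bit depth (1), color type (1).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {"width": width, "height": height, "bit_depth": bit_depth, "color_type": color_type}
```

To use it on a file, open the file in binary mode and pass the first 26 bytes read from it.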

Metadata Management

Metadata management is the administration of data that describes other data. It covers both the standard metadata that most storage systems create and track and custom metadata that gives more context about a file’s contents, and it can include metadata enrichment via tagging. AI tools can help enrich metadata by inspecting file contents and identifying new tags that indicate demographics, project keywords, sensitive data, or individuals and objects discussed or included in the file. Metadata management is important for understanding, aggregating, grouping and sorting data for use. Over the last decade, rapid data growth has created the need for metadata management to provide clear insight into what data to produce and what data to consume. This helps turn data into a valuable enterprise asset.

Advanced metadata is handled differently by file storage and object storage systems:

  • File storage organizes data in directory hierarchies, making it hard to add custom metadata attributes.
  • Object storage lacks the hierarchical directory structure of file storage, but it lets you attach custom metadata to each object.

For instance, a clinical image stored as a file would carry only basic metadata such as creation date, owner, location, and size. If it is stored as an object, a user can enrich its metadata with demographics such as the patient’s name, age, and diagnosis.
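
Object stores expose custom metadata as key-value pairs attached to each object. In the Amazon S3 API, user-defined metadata travels as HTTP headers prefixed with x-amz-meta-, and the user-defined metadata in a PUT request is limited to 2 KB. The sketch below turns a dict of enrichment fields into such headers. The field names are hypothetical, and non-ASCII values are rejected rather than encoded. The byte accounting is also only an approximation of S3's rules, so check the S3 documentation before relying on it.

```python
# Approximation of S3's documented 2 KB limit on user-defined metadata (assumption).
MAX_USER_METADATA_BYTES = 2048


def to_s3_metadata_headers(metadata: dict) -> dict:
    """Turn custom metadata into x-amz-meta-* headers for an S3-style PUT request."""
    headers = {}
    total = 0
    for key, value in metadata.items():
        name = key.strip().lower()  # S3 returns user metadata keys in lowercase
        if not name.isascii() or not value.isascii():
            raise ValueError(f"non-ASCII metadata not handled in this sketch: {key!r}")
        # Count key and value bytes toward the size limit.
        total += len(name.encode("utf-8")) + len(value.encode("utf-8"))
        headers["x-amz-meta-" + name] = value
    if total > MAX_USER_METADATA_BYTES:
        raise ValueError(f"user metadata is {total} bytes, over the 2 KB limit")
    return headers
```

With the AWS SDK for Python (boto3), you would normally pass the plain dict as the Metadata argument of put_object, and the SDK adds the x-amz-meta- prefix for you.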

Managing metadata requires strategy and automation. Choosing the best path forward can be difficult when business needs are constantly changing, data is growing explosively, and new data types such as IoT data, surveillance data, geospatial data and instrument data keep entering the mix.

Read more about metadata and its role in unstructured data management in this two-part blog series.

Learn more about Komprise Smart Data Workflows

Learn more about Komprise Deep Analytics and the metadata-driven Komprise Global File Index
