Preparing Unstructured Data for AI? Forget ETL

by Krishna Subramanian | July 10, 2025
AI-Ready Data

This article has been adapted from its original publication on ITProToday.

As AI transforms business operations, organizations need to focus on their data and, specifically, on how to build efficient pipelines to feed AI. The problem is that traditional data pipelines built on Extract, Transform, Load (ETL) were designed for structured data and are fundamentally misaligned with AI's needs.

ETL, which was designed for structured data from databases, no longer works in a world where 90% of data is unstructured and lives in files of many different formats and types: documents, images, video and audio files, and instrument and sensor data.

This shift, from yesterday's analytics on structured data to today's AI, which consumes large amounts of unstructured data, demands a complete rethinking of how organizations prepare data for AI consumption.

The Unstructured Data Challenge

The core problem with unstructured data is its inherent lack of a common schema. You can’t take a video file, an audio file, or even three video files from three different applications and place them in a tabular format because they all have different contexts and different semantics.

An MRI medical image and a marketing photograph may share the same file extension, but they require different metadata structures and processing approaches. Likewise, the same document format might need entirely different preprocessing depending on whether it's being analyzed for legal compliance, customer sentiment, or research insights.

To make unstructured data usable, safe, and searchable for AI pipelines, organizations need to enrich metadata accurately without tedious, Sisyphean manual work. The metadata that storage systems generate automatically is limited: file type, creation date, author, modification date, size, last access date, and user ID.
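To make the limitation concrete, here is a minimal sketch that gathers what a file system already records about each file, using only Python's standard `os` and `stat` modules. The function names `system_metadata` and `scan` are hypothetical; creation date and author are omitted because POSIX `stat` does not expose them portably.

```python
import os
import stat
from datetime import datetime, timezone


def system_metadata(path: str) -> dict:
    """Collect the metadata the file system already tracks for one file."""
    st = os.stat(path)
    return {
        "path": path,
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "last_accessed": datetime.fromtimestamp(st.st_atime, tz=timezone.utc).isoformat(),
        "owner_uid": st.st_uid,  # numeric user ID; reported as 0 on Windows
    }


def scan(root: str) -> list:
    """Walk a directory tree and return one metadata record per regular file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # lstat so that symbolic links are skipped rather than followed
            if stat.S_ISREG(os.lstat(full_path).st_mode):
                records.append(system_metadata(full_path))
    return records
```

Nothing in these records says whether a `.jpg` is an MRI slice or a marketing photo, which is why content-level enrichment is needed.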

  • To enrich metadata, you first need a way to create a global file index of your unstructured data regardless of which storage or cloud houses the data.
  • Once you have visibility, you can add tags manually with help from departmental users who know their data, automatically with AI and other tools, or both.
  • These new technologies — which can be standalone or exist within an unstructured data management platform — rapidly scan data sets and apply relevant tags describing their contents.
  • These tools can identify sensitive data, such as personally identifiable information (PII), that must be excluded from AI workflows, and add tags such as project codes or research keywords that identify data for specific use cases.
  • As you catalog unstructured data, it is important to ensure that metadata can follow the data wherever it moves, avoiding the need to re-create metadata.
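One way to keep metadata attached to data as it moves is to key the index on the data's content rather than its location. The sketch below assumes a file's identity is the SHA-256 hash of its bytes; the `GlobalIndex` class and location strings such as `nas01:/...` and `s3://...` are hypothetical, and this is not a description of how Komprise builds its index.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    content_id: str                              # SHA-256 of the file's bytes
    locations: set = field(default_factory=set)  # where copies currently live
    tags: dict = field(default_factory=dict)     # enrichment, independent of location


class GlobalIndex:
    """Toy in-memory index spanning several storage systems (hypothetical)."""

    def __init__(self) -> None:
        self._entries = {}

    def add(self, location: str, data: bytes) -> str:
        content_id = hashlib.sha256(data).hexdigest()
        entry = self._entries.setdefault(content_id, IndexEntry(content_id))
        entry.locations.add(location)
        return content_id

    def tag(self, content_id: str, key: str, value: str) -> None:
        self._entries[content_id].tags[key] = value

    def move(self, content_id: str, old: str, new: str) -> None:
        # Only the location changes; tags stay on the entry and are not re-created.
        entry = self._entries[content_id]
        entry.locations.discard(old)
        entry.locations.add(new)

    def get(self, content_id: str) -> IndexEntry:
        return self._entries[content_id]
```

Hashing requires reading every byte, so a production index would likely use a cheaper identifier; the design point is only that tags hang off identity, not path.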

Copying and moving unstructured data to the locations where AI analysis runs is also time-consuming and expensive; given the size of the data, it can take weeks to months. You therefore want to move only the precise data sets you need, which makes metadata enrichment and classification all the more important.
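A minimal sketch of curating before moving: filter index records on their tags, exclude anything flagged as sensitive, and measure how much data actually needs to move. `FileRecord`, `curate`, `total_bytes`, and the tag strings such as `project:X` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    size_bytes: int
    tags: frozenset


def curate(records, required, excluded=frozenset({"pii"})):
    """Keep files that carry every required tag and none of the excluded ones."""
    return [r for r in records if required <= r.tags and not (r.tags & excluded)]


def total_bytes(records) -> int:
    """Sum the sizes of the given records."""
    return sum(r.size_bytes for r in records)
```

For example, `curate(records, frozenset({"project:X"}))` returns only untagged-for-PII files from project X, and `total_bytes` on that result is the volume to copy instead of the whole share.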

Why AI Workflows Break the ETL Model

Beyond format challenges, AI processing itself fundamentally differs from traditional analytics. With AI, the workflows become iterative and non-linear.

For example, say you want Amazon Rekognition to look at images and tag them, a PII detector to find and exclude sensitive data, and then a large language model (LLM) such as Azure OpenAI to use the data for chat augmentation. You now have three different AI processes working on the same data at different points. This creates an AI-feeding-AI scenario where outputs from one process become inputs for another. Traditional ETL wasn't designed for this cyclical enrichment process.
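The AI-feeding-AI pattern can be sketched as stages that each read the metadata written by earlier stages and add their own. The three stage functions below are hypothetical stand-ins with toy logic (for instance, "an invoice contains PII"); they do not call Amazon Rekognition, a real PII detector, or Azure OpenAI.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, object]
Stage = Callable[[str, Metadata], Metadata]


def label_images(path: str, meta: Metadata) -> Metadata:
    """Stand-in for an image-labeling service; a real one would inspect pixels."""
    return {"labels": ["invoice"] if "scan" in path else ["photo"]}


def detect_pii(path: str, meta: Metadata) -> Metadata:
    """Stand-in PII check that consumes the labels written by the previous stage."""
    return {"contains_pii": "invoice" in meta.get("labels", [])}


def summarize_with_llm(path: str, meta: Metadata) -> Metadata:
    """Stand-in for an external LLM call; skips files flagged as containing PII."""
    if meta.get("contains_pii"):
        return {"llm_summary": None, "llm_skipped": "pii"}
    return {"llm_summary": f"{path}: {', '.join(meta['labels'])}"}


def run_pipeline(paths: List[str], stages: List[Stage]) -> Dict[str, Metadata]:
    """Run each stage over every file, merging its output into shared metadata."""
    catalog: Dict[str, Metadata] = {p: {} for p in paths}
    for stage in stages:
        for path, meta in catalog.items():
            meta.update(stage(path, meta))
    return catalog


catalog = run_pipeline(
    ["scan_001.jpg", "beach.jpg"],
    [label_images, detect_pii, summarize_with_llm],
)
```

The point is the data flow: each stage's output becomes another stage's input, a dependency that a single extract-transform-load pass does not model.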

Additionally, AI introduces critical data governance challenges that differ from those of traditional analytics and that ETL does not address, such as avoiding the exposure of sensitive data to commercial (external) AI services and maintaining clear audit trails of corporate data. Finally, organizations need to record which metadata was AI-enriched versus AI-enriched and human-verified.
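A minimal sketch of these governance requirements: a gate that screens text before it is sent to an external AI service, an append-only audit log, and tags that record their source and whether a person verified them. The two regular expressions are deliberately simplistic placeholders, and every name here (`release_to_external_ai`, `Tag`, `AUDIT_LOG`) is hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Simplistic placeholder patterns; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


@dataclass
class Tag:
    key: str
    value: str
    source: str                        # e.g. "ai:<model>", "scanner:regex", "human:<user>"
    verified_by: Optional[str] = None  # set when a person confirms an automated tag


AUDIT_LOG: List[dict] = []


def audit(action: str, path: str, detail: str) -> None:
    AUDIT_LOG.append({"time": datetime.now(timezone.utc).isoformat(),
                      "action": action, "path": path, "detail": detail})


def find_pii(text: str) -> List[str]:
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)]


def release_to_external_ai(path: str, text: str, tags: List[Tag]) -> bool:
    """Block and tag files containing PII; log every decision."""
    kinds = find_pii(text)
    if kinds:
        tags.append(Tag("pii", ",".join(kinds), source="scanner:regex"))
        audit("blocked", path, ",".join(kinds))
        return False
    audit("released", path, "external-ai-service")
    return True


def verify(tag: Tag, user: str, path: str) -> None:
    """Mark an automated tag as human-verified and record who did it."""
    tag.verified_by = user
    audit("verified", path, f"{tag.key}={tag.value} by {user}")
```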

Smart Data Workflows for AI

A modern approach to preparing unstructured data for AI requires rethinking the entire data pipeline. Rather than moving data immediately, start by building a comprehensive metadata index that spans all storage environments. The index makes it possible to curate the exact subset of data for AI processing based on content, context, and business requirements. It should retain metadata and tags no matter where the data lives, so that it is independent of your storage.

This approach delivers significant advantages. In one real-world example, Duquesne University used Komprise and Amazon Rekognition to first index and curate its data, identifying 10,000 relevant images out of three million files and cutting processing costs by 97%.


Komprise Smart Data Workflows delivers an automated process for unstructured data preparation and mobility:

  • Global metadata indexing and curation: Discover and select relevant data before moving it, integrating with AI processors as needed for rapid content analysis and tagging.
  • User tagging: Allow end users to tag their own data since they know it best.
  • Iterative enrichment: Store results as reusable metadata to avoid redundant processing.
  • Built-in AI data governance: Automatically detect sensitive information and maintain comprehensive audit trails.
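Iterative enrichment can be sketched as a cache that stores each processor's output as metadata, keyed by the data's content hash plus the processor's name and version, so rerunning a workflow skips work already done and a new processor version triggers fresh processing. `EnrichmentCache` and its parameters are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple


class EnrichmentCache:
    """Reusable enrichment results keyed by (content hash, processor, version)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, str], dict] = {}
        self.calls = 0  # how many times a processor actually ran

    def enrich(self, data: bytes, processor: str, version: str,
               fn: Callable[[bytes], dict]) -> dict:
        key = (hashlib.sha256(data).hexdigest(), processor, version)
        if key not in self._store:
            self.calls += 1
            self._store[key] = fn(data)  # run the (possibly expensive) processor once
        return self._store[key]
```

Including the version in the key is the design choice that keeps stale results from being reused after a model is upgraded.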

Steps on the path toward modern AI data preparation include:

  • Getting full visibility and analytics on unstructured data across storage silos.
  • Addressing data governance from the start.
  • Tracking AI data pipeline effectiveness with diverse use cases.
  • Delivering departmental self-service capabilities for unstructured data classification.

As AI becomes central to business strategy, the organizations that implement smart data workflows will gain significant advantages in agility, cost efficiency, and risk management. The question isn’t whether your organization needs a new approach to unstructured data preparation for AI — it’s how quickly you can implement one.
