Smart Data Workflows: The Evolution of Unstructured Data Management


“Moving data to the cloud can help you optimize your infrastructure, but the bigger value is in leveraging the compute power and data services in the cloud,” remarked Komprise co-founder and COO Krishna Subramanian in our second Cloud Field Day 14 presentation.

Subramanian defines unstructured data as any data we can access as a file or as an object that does not fit neatly into database rows and columns. These data sets are piling up in the data center, at the edge and in the cloud. A smart strategy for modernizing your data storage practice is an unstructured data management solution that looks across all your sources of unstructured data, provides an analytical view of this data, mobilizes data and lets users index, search and deliver only what is needed to data consumers.

In this session, Subramanian briefly introduced the Komprise SaaS platform and our latest product update: Smart Data Workflows. Here’s an overview of what her session covered.

With Smart Data Workflows, IT users can create automated workflows for all the steps required to find the right unstructured data across storage assets, tag and enrich the data and send it to external tools for analysis. This eliminates manual effort in unstructured data management and helps organizations speed time to value from new cloud-native tools.

With Smart Data Workflows, you can deliver only the right file and object data into a data lake, preventing the dreaded data swamp.

Does Komprise alter the data? No. Data remains in native format. When a file is moved to an object store, Komprise does not “munge it up.” We call it file-object duality. Read about it in this post: Why Cloud Native Data Access Matters.

The Power of Global Unstructured Data Visibility

Krishna introduced the Global File Index, which is a unified view of your data without moving the data. Today enterprise IT organizations are flying blind. They don’t know what data is sitting where, who is using the data, how the data is growing and what the data is costing them. End users can’t find the data they need when they need it. With today’s data volumes, organizations must have full visibility to make good decisions. Once you have this data visibility, Komprise makes it actionable. This is where the magic happens.

With billions of files and objects, analytics plus continuous mobilization is essential because data has a lifecycle and data management is not a one-time thing.

Smart Data Workflow Use Cases

Before the demonstration, Krishna reviewed a series of Smart Data Workflow use cases, including:

 Legal Hold


  • Search & Curate: Define and execute a custom query to find all data related to a divestiture project with Komprise Deep Analytics and the Komprise Global File Index.
  • Execute & Enrich: Execute an external function to identify PII data and tag it.
  • Cull & Extract: Move sensitive data to an object-locked cloud storage bucket and move the rest to a writable cloud bucket using Komprise Deep Analytics Actions.
  • Manage Lifecycle: Move the data to a lower storage tier for cost savings once the analysis is complete.
 Genomics Sequencing


  • Search & Curate: Define and execute a custom query to find all data for Project X with Komprise Deep Analytics and the Komprise Global File Index.
  • Execute & Enrich: Execute an external function on Project X data to look for a specific DNA sequence for a mutation and tag such data as “Mutation XYZ”.
  • Cull & Extract: Move only Project X data tagged with “Mutation XYZ” to the cloud using Komprise Deep Analytics Actions.
  • Manage Lifecycle: Move the data to a lower storage tier for cost savings once the analysis is complete.
 Autonomous Vehicles


  • Search & Curate: Find crash test data related to abrupt stopping of a specific vehicle model with Komprise Deep Analytics and the Komprise Global File Index.
  • Execute & Enrich: Execute an external function to identify and tag data with “Reason = Abrupt Stop”.
  • Cull & Extract: Use Komprise Deep Analytics Actions to move only the related data to the cloud data lakehouse, reducing the time and cost of moving and analyzing unrelated data.
  • Manage Lifecycle: Move the unrelated data to a lower storage tier for cost savings (or delete it) once the analysis is complete.
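
To make the four stages concrete, here is a minimal Python sketch of the Legal Hold use case over an in-memory list of metadata records. It is an illustration only, not the Komprise API: `FileRecord`, the stage functions and the `looks_like_pii` detector are all hypothetical names, and the PII check is a stand-in for whatever external function a customer would plug in.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical metadata record; not a Komprise data structure."""
    path: str
    project: str
    tier: str = "primary"
    tags: set = field(default_factory=set)


def search_and_curate(index, project):
    """Search & Curate: select records with a metadata query."""
    return [r for r in index if r.project == project]


def execute_and_enrich(records, detector, tag):
    """Execute & Enrich: run an external function and tag the matches."""
    for r in records:
        if detector(r):
            r.tags.add(tag)
    return records


def cull_and_extract(records, tag):
    """Cull & Extract: split records into tagged and untagged destinations."""
    locked = [r for r in records if tag in r.tags]
    writable = [r for r in records if tag not in r.tags]
    return locked, writable


def manage_lifecycle(records, tier="archive"):
    """Manage Lifecycle: mark records as moved to a lower-cost tier."""
    for r in records:
        r.tier = tier


def looks_like_pii(record):
    """Stand-in for an external PII detection service (assumption)."""
    return "hr" in record.path


index = [
    FileRecord("/divest/hr/salaries.csv", "divestiture"),
    FileRecord("/divest/eng/design.pdf", "divestiture"),
    FileRecord("/other/notes.txt", "other"),
]

curated = search_and_curate(index, "divestiture")
execute_and_enrich(curated, looks_like_pii, "PII")
locked, writable = cull_and_extract(curated, "PII")
manage_lifecycle(curated)
```

In this sketch, `locked` would go to the object-locked bucket and `writable` to the writable bucket. Only the curated records change tier, so data outside the project is left where it is.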

Smart Data Workflow Demonstration

Komprise CTO Mike Peercy delivered a demo on autonomous vehicle data because, let’s face it, none of us will be driving in 10 years. The topic was also discussed in this recent webinar with AWS: A Modern Data Strategy for the Automotive Industry.

Here is the flow:

  1. ENABLE SHARES with autonomous vehicles data for processing
  2. Show Deep Analytics (which is the UI for the GFI – Global File Index)
  3. Show query for crash reports in 2019
  4. Show query with TAG: Stopped in traffic – there will be NONE
  5. Create Plan to analyze contents of 2019 files using LOCAL FUNCTION to tag matching files
  6. Activate the Plan to analyze and tag files
  7. Show query with TAG: Stopped in traffic – there will be MANY
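
The before-and-after effect of steps 3 through 7 can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Komprise code: the `share` dictionary stands in for the enabled shares, `index` for a metadata-and-tags view like the Global File Index, and `query`, `local_function` and `activate_plan` are hypothetical names. The file paths and contents are made up for the illustration.

```python
# Simulated share: path -> file content. Only the local function reads this.
share = {
    "/av/2019/crash_001.log": "speed=42 event=stopped in traffic",
    "/av/2019/crash_002.log": "speed=65 event=lane departure",
    "/av/2019/crash_003.log": "speed=12 event=stopped in traffic",
    "/av/2020/crash_004.log": "speed=30 event=stopped in traffic",
}

# Simulated index: metadata and tags only, no file content.
index = {
    path: {"year": int(path.split("/")[2]), "tags": set()}
    for path in share
}


def query(year=None, tag=None):
    """Return sorted paths whose index entry matches the given filters."""
    return sorted(
        path
        for path, meta in index.items()
        if (year is None or meta["year"] == year)
        and (tag is None or tag in meta["tags"])
    )


def local_function(path):
    """Hypothetical content check that runs outside the index."""
    return "stopped in traffic" in share[path]


def activate_plan(paths, function, tag):
    """Apply the function to each path and record the tag in the index."""
    for path in paths:
        if function(path):
            index[path]["tags"].add(tag)


TAG = "Stopped in traffic"
crash_2019 = query(year=2019)                   # step 3: 2019 crash reports
before = query(tag=TAG)                         # step 4: nothing tagged yet
activate_plan(crash_2019, local_function, TAG)  # steps 5 and 6
after = query(tag=TAG)                          # step 7: tagged 2019 files
```

Here `before` is empty and `after` holds the two matching 2019 files. The 2020 file also mentions stopping in traffic but stays untagged because the plan only covers the 2019 query results, and the tags live in `index` while the contents of `share` are never modified.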

Questions from Cloud Field Day Delegates

What options do customers have to scale up/out performance?
Mike walked through the scale-out architecture of Komprise Observers in his Chalk Talk session. Observers are like virtual machines that reside next to the storage, whether in the cloud or on-premises, and they scale out into a fault-tolerant grid. Learn more here.

Can you add locations as you grow and easily manage that in an automated way?
Yes. Mike explained the Komprise multisite capabilities and discussed a central hub approach for Observers for multi-edge site deployments.

How do you deal with encrypted content?
The content of files is invisible to Komprise. So how do you classify a file if you don’t know what it is? Komprise looks only at the metadata that the storage systems expose. It does not look inside files, but it can trigger a mechanism via the API for the customer to look inside the file and then tag and mobilize the data as needed.

The tag that is being applied is now in the Komprise Global File Index. The actual file itself doesn’t change, correct?
Correct. The tags are within Komprise and Komprise is not in the hot data path. The interaction between Komprise tags and cloud tags is on the roadmap. Pretty cool stuff. Learn more about automated unstructured data tagging.

Metadata in Focus

This led to an interesting discussion about metadata, initiated by @datachick Karen Lopez.

Subramanian clarified that Komprise does metadata-level find, search, curate, enrich, mobilize and lifecycle management. Komprise enables the workflow that calls an external function: via the API, it calls an application or cognitive service (chosen by the customer) that does the processing.

Read about the first Cloud Field Day presentation here in the blog.

