Komprise Smart Data Workflows: Automate Unstructured Data Discovery


Artificial intelligence, machine learning and all their variants, including deep learning and natural language processing, are now tools of the trade in our data-driven society. Yet the one thing machine learning requires to uncover new insights and patterns is unstructured data, and a lot of it.

This is an issue because unstructured data is difficult to manage, not to mention mobilize. This data does not fit nicely into rows and columns. It comes in a variety of formats with no underlying semantic structure and is extremely difficult to ingest into traditional analytics platforms. Plus, since unstructured data makes up at least 80% of all data and the creation and replication of data is growing at a faster rate than storage capacity, it is too large to process and analyze in-house for most enterprises.

But lo and behold, cloud data lakes and cloud machine learning tools are making the previously impossible possible by creating more affordable, scalable ways to do machine learning without massive investments in skills and infrastructure. Yet you still must get the right unstructured data into them, and avoid creating data swamps by sending too much irrelevant data there.

Most of the work of finding and categorizing unstructured data to feed machine learning pipelines has been manual. Manual work doesn’t scale as data volumes grow, and it delays time-sensitive research projects.

[Figure: Komprise Smart Data Workflow leveraging external functions to cull and extract data into data lakes]

This Leads Us to Our Latest Announcement: Komprise Smart Data Workflows

We’re excited to share that with Komprise Intelligent Data Management, IT users can now create automated workflows for all the steps required to find the right data across their storage assets, tag and enrich that data, and send it to external tools for analysis.

The Komprise Global File Index and Smart Data Workflows together reduce the time it takes to find, enrich and move the right unstructured data by up to 80%.

We know that data scientists spend most of their time finding and preparing data for analysis, rather than doing the actual analysis and refining tests and results. We think this next phase of unstructured data management will be a major force for organizations looking to not only migrate and tier data to cloud storage for cost savings but also to monetize unstructured data.

“Komprise has delivered a rapid way to visualize our petabytes of instrument data and then automate processes such as tiering and deletion for optimal savings,” says Jay Smestad, Senior Director of Information Technology at PacBio. “Now, the ability to automate workflows so we can further define this data at a more granular level and then feed it into analytics tools to help meet our scientists’ needs is a game changer.”

Komprise Smart Data Workflows Are Relevant Across Many Sectors

Here’s an example from the pharmaceutical industry:
  1. Search: Define and execute a custom query across on-prem, edge and cloud data silos to find all data for Project X with Komprise Deep Analytics and the Komprise Global File Index.
  2. Execute & Enrich: Execute an external function on Project X data to look for a specific DNA sequence for a mutation and tag such data as “Mutation XYZ”.
  3. Cull & Mobilize: Move only Project X data tagged with “Mutation XYZ” to the cloud using Komprise Deep Analytics Actions for central processing.
  4. Manage Data Lifecycle: Move the data to a lower storage tier for cost savings once the analysis is complete.
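
The four steps above can be sketched in a few lines of Python. This is only an illustrative model, not the Komprise API: the `FileRecord` class, the function names, the in-memory `index` list and the sample DNA strings are all hypothetical, and the "external function" is reduced to a simple substring match.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical stand-in for one entry in a file index."""
    path: str
    project: str
    location: str              # e.g. "on-prem", "edge" or "cloud"
    content: str               # stand-in for the file's contents
    tier: str = "hot"
    tags: set = field(default_factory=set)


def search(index, project):
    """Step 1 (Search): find every file that belongs to a project."""
    return [f for f in index if f.project == project]


def enrich(files, sequence, tag):
    """Step 2 (Execute & Enrich): run an 'external function' and tag the hits."""
    for f in files:
        if sequence in f.content:   # assumed check; a real function would be richer
            f.tags.add(tag)
    return files


def cull_and_mobilize(files, tag, destination):
    """Step 3 (Cull & Mobilize): move only the tagged files."""
    selected = [f for f in files if tag in f.tags]
    for f in selected:
        f.location = destination
    return selected


def lower_tier(files, tier="archive"):
    """Step 4 (Manage Data Lifecycle): drop analyzed data to a cheaper tier."""
    for f in files:
        f.tier = tier


# Made-up sample data spanning several silos.
index = [
    FileRecord("/lab1/run1.fastq", "Project X", "on-prem", "ACGTTTGACA"),
    FileRecord("/edge/run2.fastq", "Project X", "edge", "ACGTACGTAC"),
    FileRecord("/lab2/run3.fastq", "Project Y", "on-prem", "TTGACAAAAA"),
]

project_files = search(index, "Project X")
enrich(project_files, sequence="TTGACA", tag="Mutation XYZ")
moved = cull_and_mobilize(project_files, "Mutation XYZ", "cloud")
lower_tier(moved)   # would run once the analysis is complete
```

Only the Project X file containing the sequence ends up in the cloud; the other Project X file stays where it is, and the Project Y file is never examined.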

Here’s an example of an edge-to-cloud workflow:
  • Lab instruments often generate terabytes of data which are stored in a NAS file system.
  • This file system can be used as a daily cache and Komprise can tag and automatically tier instrument data to low-cost cloud storage as it is created.
  • Cloud AI and ML tools can ingest the data for analysis.
  • This approach ensures lab data is tagged and available in the cloud. Users can natively access the data as objects, so they can import the right data for analysis at significantly lower costs.
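
As a rough sketch of the edge-to-cloud flow above, the snippet below models the NAS cache as a list and cloud object storage as a dictionary keyed by object name. Everything here is an assumption made for illustration (the `InstrumentFile` class, the `lab-data` prefix, the tag layout); it does not reflect how Komprise actually stores or tags data.

```python
from dataclasses import dataclass


@dataclass
class InstrumentFile:
    """Hypothetical file written by a lab instrument to the NAS cache."""
    name: str
    instrument: str
    size_bytes: int


def tier_to_cloud(nas_cache, object_store, prefix="lab-data"):
    """Tag each cached file with its instrument and move it to the object store."""
    for f in list(nas_cache):          # iterate over a copy while removing
        key = f"{prefix}/{f.instrument}/{f.name}"
        object_store[key] = {
            "size_bytes": f.size_bytes,
            "tags": {"instrument": f.instrument},
        }
        nas_cache.remove(f)            # the NAS acts only as a daily cache
    return object_store


def select_for_analysis(object_store, instrument):
    """Pick only the objects whose tag matches, so less data is imported."""
    return sorted(
        key for key, meta in object_store.items()
        if meta["tags"]["instrument"] == instrument
    )


nas_cache = [
    InstrumentFile("run-001.raw", "sequencer-a", 4_000_000),
    InstrumentFile("run-002.raw", "microscope-b", 9_000_000),
]
object_store = {}

tier_to_cloud(nas_cache, object_store)
to_import = select_for_analysis(object_store, "sequencer-a")
```

Because the tags travel with the objects, an analysis job can select by tag and import just the matching objects rather than the whole bucket.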

Powered by APIs and the Global File Index

Updates to the Komprise Intelligent Data Management Platform, which make Smart Data Workflows possible, include:
  • API to Execute External Functions: Komprise can enrich data by executing external functions or cloud services at the edge, in the datacenter or in the cloud, and then tagging the data with the resulting metadata. Examples include Snowflake, Amazon Macie and Azure Machine Learning.
  • Global File Index and Tags: The tags set by external functions are managed by the Komprise Global File Index and searchable no matter where the data moves.
  • Expanded Deep Analytics Actions leveraging the Global File Index: Komprise expands the range of data mobility actions. It can use Deep Analytics query results not only to tier the data specified by a query but also to copy and confine such data.
  • Deep Analytics User Role: For better data governance and separation of duties, this role limits a user to specifying and saving queries. An IT administrator with full privileges can then use the saved queries to manage the data using Komprise Deep Analytics Actions.
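
The separation of duties described in the last bullet can be illustrated with a small permission check. This is a minimal sketch under assumed names (`Role`, `QueryCatalog`, `save_query`, `run_action`), not the product's actual role model: any role may save a query, but only an administrator may run an action on its results.

```python
from enum import Enum


class Role(Enum):
    DEEP_ANALYTICS_USER = "deep_analytics_user"
    ADMINISTRATOR = "administrator"


class QueryCatalog:
    """Hypothetical store of saved queries with a role check on actions."""

    def __init__(self):
        self._saved = {}

    def save_query(self, role, name, predicate):
        # Both roles may specify and save queries, so no check is needed here.
        self._saved[name] = predicate

    def run_action(self, role, name, action, files):
        # Only a fully privileged administrator may act on query results.
        if role is not Role.ADMINISTRATOR:
            raise PermissionError("only an administrator may act on query results")
        return [action(f) for f in files if self._saved[name](f)]


# Made-up file metadata, including a tag set by an external function.
files = [
    {"path": "/a.bam", "tags": {"Mutation XYZ"}},
    {"path": "/b.bam", "tags": set()},
]

catalog = QueryCatalog()
catalog.save_query(
    Role.DEEP_ANALYTICS_USER,
    "mutation-xyz",
    lambda f: "Mutation XYZ" in f["tags"],
)

# The administrator reuses the saved query; the "action" here just returns paths.
copied = catalog.run_action(
    Role.ADMINISTRATOR, "mutation-xyz", lambda f: f["path"], files
)
```

If the Deep Analytics user called `run_action` directly, the sketch would raise `PermissionError`, which mirrors the idea that the query author and the person who moves data are different people.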

Learn more about Smart Data Workflows.