This post covers how we use Elasticsearch to power the Komprise Deep Analytics Service and how we are building a massive, secure Global Metadatabase Service to help our customers manage their data.
Unstructured data is having a bit of a moment. We know data is used to track, model, and make decisions for practically every facet of life so it makes sense that data is critical to managing… wait for it: data. What do we mean? We’re talking metadata: data about data. This is where Komprise comes in.
How to move from managing storage to managing data
We help customers by providing an intuitive UI (and also APIs) to create custom queries based on file metadata:

- Where the data is located (e.g., what file server, share, or cloud)
- Who owns it
- When it was created, modified, and accessed
- File name, extension, type, size
- Custom tag – add your own data, like project ID
Using this data, you can find the needle in the haystack or, more likely, millions of needles across many haystacks. Once you find those needles, you can access, protect, replicate, or take other action on a very specific set of files. This is the breakthrough that lets IT make granular decisions about data according to business requirements, rather than just making sure there is enough physical disk space to house it.
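To make the idea of a metadata query concrete, here is a minimal sketch in Python of how criteria like the ones listed above could be combined into an Elasticsearch query body. It only builds the JSON structure and does not contact a cluster. The field names (`share`, `owner`, `extension`, `accessed`, `size_bytes`, `tags.project_id`) and the function name are assumptions made for illustration; they are not the actual Komprise index schema or API.

```python
import json
from typing import Any, Dict, List, Optional


def build_file_query(
    share: Optional[str] = None,
    owner: Optional[str] = None,
    extension: Optional[str] = None,
    not_accessed_for: Optional[str] = None,  # Elasticsearch date-math unit, e.g. "2y"
    min_size_bytes: Optional[int] = None,
    project_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an Elasticsearch bool/filter query body from file metadata criteria.

    All field names are hypothetical and used for illustration only.
    """
    filters: List[Dict[str, Any]] = []
    if share is not None:
        filters.append({"term": {"share": share}})
    if owner is not None:
        filters.append({"term": {"owner": owner}})
    if extension is not None:
        filters.append({"term": {"extension": extension}})
    if not_accessed_for is not None:
        # "now-2y" is Elasticsearch date math: two years before the current time.
        filters.append({"range": {"accessed": {"lt": f"now-{not_accessed_for}"}}})
    if min_size_bytes is not None:
        filters.append({"range": {"size_bytes": {"gte": min_size_bytes}}})
    if project_id is not None:
        filters.append({"term": {"tags.project_id": project_id}})
    # Filter clauses must all match and do not contribute to relevance scoring.
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    body = build_file_query(owner="jdoe", extension="dcm", not_accessed_for="2y")
    print(json.dumps(body, indent=2))
```

Each criterion becomes one filter clause, so narrowing the search to a more specific set of files is a matter of adding clauses rather than writing a new query.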
The Unstructured Metadata Mandate
Komprise was founded in 2014 and since then has helped customers index and store a staggering amount of metadata: hundreds of billions of records or observations to date.

How does Komprise manage this metadata? We chose the open-source indexing engine Elasticsearch. Elasticsearch indexes, stores, and protects the metadata, and it is the engine behind Komprise’s global metadata index, which lets you zero in on specific data sets across multiple data centers and hybrid cloud environments.
This approach keeps the Komprise architecture simple. Using a dedicated metadata solution means you can store as much metadata as needed (creating additional tags, for example) without concern that Komprise’s performance will be impacted. Other solutions try to do everything in a single central database, which restricts scale and suffers performance penalties as the metadata load grows.
Here are a few examples of how our customers use metadata to manage their unstructured data:
- Collect specific data sets from multiple sites and clouds and copy to another location for analysis by AI/ML to get value out of data;
- Hunt down data from former employees and confine it for deletion to free resources and comply with regulations;
- Archive clinical data while enabling availability for use in future studies and speeding the development of new therapies.
How Does Komprise Run and Secure Elasticsearch?
Komprise is by default a SaaS offering, with both the Komprise Director and Elasticsearch infrastructure running in the cloud on behalf of our customers. Komprise manages the security, configuration, patching, and protection of Elasticsearch.
The Observers are deployed on premises as a grid, adjacent to the data, and stream the metadata to the Deep Analytics index service, where it is indexed in the Elasticsearch cluster. Like other tasks handled by the Observers, indexing is distributed over the scale-out grid.
Get a closer look at the Komprise elastic software architecture. Read about Elastic Grid here. The diagram below illustrates how the Komprise Grid analyzes data over multiple data centers or clouds and streams the metadata to Elasticsearch while the Director queries and caches the results.

- Customers use the Komprise console to create and execute queries using Deep Analytics hosted by their dedicated Director running in the cloud.
- The Director then executes these queries against the Elasticsearch cluster.
- To secure communications between the Director and Elasticsearch, a secure ID is used to map the Director to its dedicated indexes hosted by Elasticsearch.
- For customers that need to retain all data and metadata behind their firewall, Komprise can also run Elasticsearch in their data center.
- Even with an on-prem deployment, we provide a fully managed experience. The nodes running Elasticsearch are deployed from the on-prem Director as VM appliances and managed as an integrated component of the Komprise solution.
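
One way to picture the mapping between a Director and its dedicated indexes is a lookup that every query must pass through before it reaches the cluster. The Python sketch below is purely illustrative: the identifiers, index names, and function are hypothetical, and this post does not describe how the secure ID is actually issued or verified.

```python
from typing import Dict


class UnknownDirectorError(Exception):
    """Raised when a Director ID has no dedicated index registered."""


def index_for_director(director_id: str, index_by_director: Dict[str, str]) -> str:
    """Return the dedicated index name for a Director ID (hypothetical scheme).

    Restricting each query to the returned index keeps one customer's
    queries from touching another customer's metadata.
    """
    try:
        return index_by_director[director_id]
    except KeyError:
        raise UnknownDirectorError(f"no index registered for {director_id!r}") from None


if __name__ == "__main__":
    # Hypothetical registry: one dedicated index per Director.
    registry = {"director-a": "metadata-customer-a", "director-b": "metadata-customer-b"}
    print(index_for_director("director-a", registry))
```

The point of the sketch is the design choice rather than the mechanism: queries are always resolved to a dedicated index first, and an unrecognized ID is rejected instead of falling back to a shared index.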

Emerging Deep Analytics Use Cases
Today we use the metadata to help customers make decisions about how to “right place” their data across storage resources. Tagging is the next step. Customers can tag data with a project ID for chargeback, or tag X-ray images with demographic information to support clinical studies.
We see an evolution of unstructured data management beyond just the storage infrastructure team. Analytics will enable the owners or creators of the data to help decide how their data is stored and leveraged for future value. The ability to collect, store, index and enable search of this metadata in an intuitive manner is the critical component that will move us to data-centric management. Saving money on data management is the first step for most customers. Being able to do more with that data will help them drive real innovation.
