Komprise Deep Analytics
Finding just the right data across billions of files can be challenging. Komprise Deep Analytics enables you to search and find data that fits your specific criteria across storage. Use the search results as a dynamic data lake to both plan your data management and to enable new uses like Big Data Analytics.
What is Komprise Deep Analytics?
Enterprise IT organizations want to leverage data for new uses such as Big Data Analytics or to run applications in the cloud, but to get to data science, you first need the right data. Studies show that 80% of the time in Big Data is spent on finding the right data and getting it out of data centers.
Komprise Deep Analytics addresses this problem by enabling customers to find the right data that fits specific criteria across all their storage and export this dynamic data lake to any analytics application or destination of their choice, such as Hadoop or AWS Lambda. Komprise Deep Analytics creates a highly efficient, searchable distributed index of files with support for both standard metadata and custom metadata (tags). Customers can find data that fits criteria they set, regardless of where the data actually lives, and the resulting data set can be operated on as a discrete entity.
How Komprise Deep Analytics Fits
Komprise and industry research have shown that over 75% of data is infrequently accessed (cold) within months of creation. 80% of the cost of data is in its management, so efficiently identifying and managing cold data yields significant savings. But in most organizations, both hot and cold data are stored, replicated, and backed up (usually multiple times) on expensive Tier 1 storage. Yet organizations need to be as efficient as possible with their storage spend, especially given flat or shrinking IT budgets.
Komprise Intelligent Data Management software quickly identifies cold data across a customer’s NAS storage. Users can then move it to more cost-effective storage options, without any impact to users or applications, using the patented Komprise Transparent Move Technology (TMT™). This approach ensures that only active data is kept on expensive Tier 1 primary storage.
Deep Analytics further extends the analytics that Komprise already provides with a fully searchable index of all the data. This enables users to find specific data sets across storage and across billions of files, with more granularity than before. Users can create custom queries to find the data, then apply custom tags to it to assemble virtual data lakes. For example, users can create complex queries, find (and tag) specific projects, identify types of cold data, find orphan data, search for data owned by specific users, and more. It also enables new uses of the data, such as big data analytics and AI/ML applications.
Use Cases for Deep Analytics
When you set up Komprise, within minutes, even on petabytes of data, you get a quick view of how data is growing, how data is being used, how much cold data you have, and your estimated ROI based on different scenarios that help you plan your data management strategies. Analytics drives Komprise data management, and analysis is available throughout the product, including under the Plan → Data and Plan → Usage tabs.
Deep Analytics is a query engine that enables users to perform deeper analysis at the file level, using custom queries and filters to analyze specific data sets within or across shares. This enables storage IT administrators to gain deeper insights into, and therefore greater control over, their enterprise’s data.
Deep Analytics creates a searchable index of all the standard metadata as well as extended metadata (or tags) of data across storage. Currently, all data sources analyzed by Komprise are also indexed by Deep Analytics when it is enabled.
With Deep Analytics, users can create custom queries using any combination of file metadata to narrow down to specific data sets, and can download reports including a summary as well as more detailed results. In addition, users can also summarize results by file metadata parameters.
Here are a few examples of such queries:
- Find top users in the engineering department who have the largest amount of data on file server “NAS92”
- Find out which departments are creating large video and archive files across all shares
- Find out which users in R&D have not accessed most of their data in the last two years
- Find data of users who are no longer employed in the company
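The first and third example queries can be sketched in plain Python over in-memory records. This only illustrates the filtering and grouping logic, not the Komprise query engine or its API. The `FileRecord` fields, the use of a `department` tag, the "more than half of the bytes" reading of "most of their data", and the sample data are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Hypothetical metadata record; field names are illustrative only."""
    server: str
    owner: str
    size: int                 # bytes
    last_accessed: datetime
    tags: dict = field(default_factory=dict)


def top_owners(records, server, department, limit=3):
    """Rank owners in one department by total bytes stored on one server."""
    totals = defaultdict(int)
    for r in records:
        if r.server == server and r.tags.get("department") == department:
            totals[r.owner] += r.size
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def mostly_cold_owners(records, now, cold_after=timedelta(days=730)):
    """Owners for whom more than half of their bytes were last accessed before the cutoff."""
    cold, total = defaultdict(int), defaultdict(int)
    for r in records:
        total[r.owner] += r.size
        if now - r.last_accessed > cold_after:
            cold[r.owner] += r.size
    return sorted(o for o in total if cold[o] * 2 > total[o])


now = datetime(2024, 1, 1)
records = [
    FileRecord("NAS92", "alice", 500, datetime(2023, 12, 1), {"department": "engineering"}),
    FileRecord("NAS92", "bob", 900, datetime(2020, 5, 1), {"department": "engineering"}),
    FileRecord("NAS92", "bob", 100, datetime(2023, 11, 1), {"department": "engineering"}),
    FileRecord("NAS92", "carol", 700, datetime(2023, 10, 1), {"department": "finance"}),
    FileRecord("NAS07", "alice", 9000, datetime(2019, 1, 1), {"department": "engineering"}),
]

print(top_owners(records, "NAS92", "engineering"))
print(mostly_cold_owners(records, now))
```

Both functions make a single pass over the records and aggregate per owner, which is the same shape of work as a filtered, grouped query against an index.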
How Deep Analytics Works
Deep Analytics is a licensed feature, so it must first be included in your license. Once it is licensed, a setting is available to enable or disable Deep Analytics.
When shares are added to Komprise and enabled, Komprise starts to rapidly aggregate analytics information across these shares, and the results are available on the Plan page. If Deep Analytics is enabled, Komprise also builds the Deep Analytics index in the background. Deep Analytics runs take longer than the regular fast analysis, since every file’s metadata is examined and indexed.
After a Deep Analytics run has begun on a share, queries can be made against data (files) on that share. Query results, however, will be partial until the Deep Analytics run has completed on the share. Subsequent Deep Analytics runs will occur on each enabled share after a default delay interval of 30 days.
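The run lifecycle described above (results are partial until a run completes, then a rescan after the default 30-day delay) can be modeled with a small state object. This is a minimal sketch, not Komprise code: the `ShareIndexRun` class and its method names are hypothetical, and it assumes the 30-day delay is counted from the completion of the previous run, which the text does not specify.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

RESCAN_DELAY = timedelta(days=30)  # default interval stated in the text


@dataclass
class ShareIndexRun:
    """Hypothetical tracking record for one share's indexing run."""
    started: Optional[datetime] = None
    completed: Optional[datetime] = None

    def result_state(self) -> str:
        """Describe how complete query results for this share would be."""
        if self.started is None:
            return "unavailable"
        if self.completed is None:
            return "partial"
        return "complete"

    def next_run_due(self) -> Optional[datetime]:
        # Assumption: the delay is counted from completion of the previous run.
        if self.completed is None:
            return None
        return self.completed + RESCAN_DELAY


run = ShareIndexRun(started=datetime(2024, 1, 1))
print(run.result_state())
run.completed = datetime(2024, 1, 2)
print(run.result_state(), run.next_run_due())
```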
Deep Analytics keeps an index of all the standard and extended metadata. No file contents are stored. Indexed metadata includes:
- File name
- File parent directory
- File size
- File extension
- File type (directory, file, symbolic link, Komprise file link)
- File creation date
- File last modified date
- File last accessed date
- Owner id (uid/sid)
- Owner name
- Group id (gid/sid)
- Group name
- Tags (custom Komprise metadata)
NOTE: No file content is ever read or stored by Komprise.
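To illustrate that this kind of metadata can be gathered without opening a file, here is a minimal sketch that builds an index entry from `os.lstat` alone. The `IndexEntry` class and `index_entry` function are hypothetical names, not part of Komprise. Owner and group names and the Komprise file link type are left out because resolving them is platform-specific, and the creation time is only filled in where the platform exposes `st_birthtime`.

```python
import os
import stat
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class IndexEntry:
    """Hypothetical index record mirroring the metadata list above."""
    name: str
    parent: str
    size: int
    extension: str
    file_type: str            # "directory", "file", or "symbolic link"
    created: Optional[float]  # seconds since epoch; None if the platform does not expose it
    modified: float
    accessed: float
    owner_id: int
    group_id: int
    tags: dict = field(default_factory=dict)


def index_entry(path: Path) -> IndexEntry:
    st = os.lstat(path)  # metadata only; the file is never opened
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISLNK(st.st_mode):
        kind = "symbolic link"
    else:
        kind = "file"
    return IndexEntry(
        name=path.name,
        parent=str(path.parent),
        size=st.st_size,
        extension=path.suffix.lstrip("."),
        file_type=kind,
        created=getattr(st, "st_birthtime", None),
        modified=st.st_mtime,
        accessed=st.st_atime,
        owner_id=st.st_uid,
        group_id=st.st_gid,
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.csv"
    sample.write_bytes(b"a,b\n1,2\n")
    entry = index_entry(sample)

print(entry.name, entry.extension, entry.size, entry.file_type)
```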
Deep Analytics Tagging
Komprise Deep Analytics lets you tag data and use those tags in queries.
Tagging makes it easy to organize and find data based on extended metadata attributes beyond the standard file metadata. This can be useful in many ways:
- Grouping together data that satisfies multiple criteria easily
- Creating tags outside of Komprise (e.g. tagging data at the source when you know more about the data), and leveraging the tags to search and find relevant data within Komprise
- Managing and finding data by these tags rather than relying on just standard file metadata.
Example: Let’s say we want to run an operation on data belonging to either Project X or Project Y. We can first run a search in Komprise for any files that belong to Project X and tag them with Project X. Then similarly find and tag files related to Project Y. Then run another search in Deep Analytics for files with either Project X or Project Y tags and operate on that data set.
Tags can also be set via API outside of Komprise.
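The Project X / Project Y workflow can be sketched as a search, tag, search sequence over a toy in-memory index. The `IndexedFile` class, the `search` and `apply_tag` helpers, the `project` tag key, and the sample paths are all hypothetical; they stand in for whatever queries and tagging operations you would run in Deep Analytics.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedFile:
    """Hypothetical index entry: a path plus key/value tags."""
    path: str
    tags: dict = field(default_factory=dict)


def search(files, predicate):
    """Return the files that match a query predicate."""
    return [f for f in files if predicate(f)]


def apply_tag(files, key, value):
    """Attach a key/value tag to every file in a result set."""
    for f in files:
        f.tags[key] = value


index = [
    IndexedFile("/eng/projx/design.doc"),
    IndexedFile("/eng/projy/results.csv"),
    IndexedFile("/hr/policies.pdf"),
    IndexedFile("/archive/projx_backup.tar"),
]

# Step 1 and 2: find each project's files and tag them.
apply_tag(search(index, lambda f: "projx" in f.path), "project", "X")
apply_tag(search(index, lambda f: "projy" in f.path), "project", "Y")

# Step 3: query on the tags alone to assemble the combined data set.
data_set = search(index, lambda f: f.tags.get("project") in {"X", "Y"})
print([f.path for f in data_set])
```

Once the tags exist, the final query no longer depends on path conventions, which is what lets tagged files from different locations be treated as one data set.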
Deep Analytics API
All Deep Analytics functionality is accessible through an API. The API enables capabilities including:
- Creating, saving, renaming, deleting, and running queries
- Setting query filters on:
  - File servers and shares
  - Directory path
  - File name
  - Last modified time
  - Last access time
  - File type
  - File extension
  - File size
- Creating tags (keys and values)
- Applying and removing tags from files and query results
- Monitoring tagging tasks
- Retrieving the set of all tags (keys and values) created
- Retrieving a summary of query results
- Retrieving the top 5000 files of a query result
- Retrieving a summary of a query result by:
  - Top shares
  - Top file servers
  - Top owners
  - Top groups
  - File types
  - File extensions
  - File sizes
- Retrieving the top 5000 files for any of these summaries
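The summary capabilities listed above amount to grouping a query result by one metadata field and capping the file listing. The sketch below shows that logic locally on a list of dictionaries. It does not call the Deep Analytics API, whose endpoints and field names are not given in this document. The `summarize` and `top_files` functions are hypothetical, and ranking the "top" files by size is an assumption.

```python
from collections import defaultdict

TOP_FILES_LIMIT = 5000  # cap on returned files, as stated in the list above


def summarize(results, key):
    """Group results by one metadata field.

    Returns (value, file_count, total_bytes) rows, largest total first.
    """
    counts, sizes = defaultdict(int), defaultdict(int)
    for r in results:
        counts[r[key]] += 1
        sizes[r[key]] += r["size"]
    rows = ((k, counts[k], sizes[k]) for k in counts)
    return sorted(rows, key=lambda row: row[2], reverse=True)


def top_files(results, limit=TOP_FILES_LIMIT):
    """Return at most `limit` files, largest first (ranking by size is an assumption)."""
    return sorted(results, key=lambda r: r["size"], reverse=True)[:limit]


results = [
    {"name": "a.mp4", "owner": "alice", "extension": "mp4", "size": 4000},
    {"name": "b.zip", "owner": "bob", "extension": "zip", "size": 2500},
    {"name": "c.mp4", "owner": "bob", "extension": "mp4", "size": 1000},
    {"name": "d.txt", "owner": "alice", "extension": "txt", "size": 10},
]

print(summarize(results, "owner"))
print(summarize(results, "extension"))
print([f["name"] for f in top_files(results, limit=2)])
```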
Deep Analytics Deployment Architecture
Komprise Deep Analytics can be run in the cloud or on-premises, with initial releases supporting only the former. Deep Analytics uses secure, cloud-based services, including metadata indexes and a powerful, open-source analytics and search engine. In a cloud deployment, no Deep Analytics components need to be deployed on-premises; the Komprise Observers already used in the Komprise Intelligent Data Management solution also send file metadata to the secure cloud indexes.
When deployed in the cloud, Komprise manages all the analytics components, and the customer needs to deploy only the on-premises Observers. This deployment is shown in Figure 4 below.
Advantages of a cloud-based deployment include:
- Fast, easy deployment, SaaS model: only standard Komprise Observers need to be deployed on-premises
- Enables use of thin Observer resources: simpler and cost-effective provisioning, maintenance, and growth accommodation
- Accommodates elastic growth and shrinking of data sizes: you can add more data to analyze or remove data and Komprise automatically adjusts so you don’t have to worry
- Enables transparent upgrades: only cloud-based components need to upgrade
When Deep Analytics is deployed on-premises, the customer is responsible for deploying the appropriate hardware for all components, including the Director, Observers, Analytics Services, and Search Cluster.
Komprise provides guidelines for requirements of server sizing and software. In both cases, Komprise ensures a secure, scalable, high performance system deployment.
To see Deep Analytics in your environment, contact firstname.lastname@example.org.