Application Programming Interface (API)

What is an API?

An Application Programming Interface (API) is a set of protocols, routines, and tools for building software applications. APIs define how software components should interact with each other, providing a standard way for developers to create programs that can access services or data provided by other software components or systems.

APIs allow developers to access services or data without needing to understand how those services or data are implemented. Instead, they can use the API’s predefined set of functions and methods to interact with the service or data. This makes it easier and faster for developers to create new applications that can leverage existing services and data sources.
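
As a minimal sketch of this idea, the Python below defines a small interface and one interchangeable implementation. The names (FileCatalog, InMemoryCatalog, total_size) are hypothetical and chosen only for illustration; the point is that the calling code depends on the interface, not on how the data is stored.

```python
from typing import Protocol


class FileCatalog(Protocol):
    """Hypothetical API: the only thing callers need to know about."""

    def list_sizes(self, folder: str) -> list[int]:
        """Return the size in bytes of each file in a folder."""
        ...


class InMemoryCatalog:
    """One possible implementation, backed by a plain dictionary."""

    def __init__(self, data: dict[str, list[int]]) -> None:
        self._data = data

    def list_sizes(self, folder: str) -> list[int]:
        return list(self._data.get(folder, []))


def total_size(catalog: FileCatalog, folder: str) -> int:
    """Caller code: uses the API without knowing how it is implemented."""
    return sum(catalog.list_sizes(folder))


catalog = InMemoryCatalog({"/projects": [1_024, 2_048]})
print(total_size(catalog, "/projects"))  # 3072
```

A different implementation (for example, one that calls a remote service) could be swapped in without changing total_size, as long as it offers the same list_sizes method.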

APIs are often used to connect different software components or systems, for example connecting web applications or mobile apps to backend servers or databases. They can also be used to integrate different software tools, enabling them to work together seamlessly.

APIs can be public or private, depending on whether they are available for external developers to use or are restricted to use within a specific organization or system. Many public APIs are available from companies such as Google, Amazon, and Twitter, which provide access to their services and data for developers to build applications on top of.

APIs are an essential tool for modern software development, enabling developers to build complex and powerful applications quickly and efficiently by leveraging existing services and data sources.

API-Driven Data Management and Data Migration

Komprise Smart Data Workflows can enrich data by executing external functions or cloud services at the edge, in the datacenter or in the cloud, and then tagging data with metadata. Examples include Snowflake, Amazon Macie and Azure Machine Learning.


Read the AWS blog: Using Amazon Macie with Komprise for Detecting Sensitive Content in On-Premises Data


Komprise Elastic Data Migration is both UI and API driven. Here is an example of a hospital group that used the Komprise API to migrate petabytes of SMB files from EMC Isilon access zones to Qumulo. Komprise set up more than 400 migration jobs by scripting against the APIs and migrated 278 million SMB files spanning nearly 1,500 shares. Because of the number of shares and folders in the environment, it was unrealistic to set up migrations one at a time via the UI, which led Komprise to recommend the API approach.
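
The scripted approach described above can be sketched as follows. This is an illustration only, not the Komprise API: the payload field names, the build_migration_jobs helper and the host names are all hypothetical. The sketch only builds one job definition per share; a real script would submit each definition to the endpoint described in the vendor's API documentation.

```python
import json


def build_migration_jobs(shares, source_host, target_host):
    """Build one hypothetical migration job definition per SMB share."""
    jobs = []
    for share in shares:
        jobs.append({
            "name": f"migrate-{share}",            # hypothetical field names
            "source": f"//{source_host}/{share}",
            "target": f"//{target_host}/{share}",
            "protocol": "SMB",
        })
    return jobs


# Hypothetical inventory; in practice this might come from a CSV export.
shares = ["radiology", "billing", "research"]
jobs = build_migration_jobs(shares, "isilon.example.com", "qumulo.example.com")

print(len(jobs))                      # 3
print(json.dumps(jobs[0], indent=2))  # first job definition as JSON
```

Generating job definitions from an inventory list is what makes hundreds of jobs practical: the loop is the same whether the list holds three shares or 1,500.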

Read the blog post: 5 Industry Data Migration Use Case

Getting Started with Komprise:

Want To Learn More?

Data Tagging

What is data tagging?

Data tagging is the process of adding metadata to your file data in the form of key-value pairs. These values give context to your data, so that others can easily find it in search and execute actions on it, such as moving it to confinement or to a cloud-based data lake. Data tagging is valuable for research queries and analytics projects, or to comply with regulations and policies.
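
A minimal sketch of key-value tagging, assuming a simple in-memory store: the add_tag and find_by_tag helpers, the paths and the tag names are hypothetical and do not reflect how any particular product stores tags.

```python
from collections import defaultdict

# Hypothetical tag store: file path -> {key: value} metadata.
tags = defaultdict(dict)


def add_tag(path, key, value):
    """Attach one key-value pair to a file."""
    tags[path][key] = value


def find_by_tag(key, value):
    """Return the sorted paths whose tag `key` equals `value`."""
    return sorted(p for p, kv in tags.items() if kv.get(key) == value)


add_tag("/lab/scan_001.dcm", "project", "oncology")
add_tag("/lab/scan_002.dcm", "project", "oncology")
add_tag("/hr/review.docx", "contains_pii", "true")

print(find_by_tag("project", "oncology"))
# ['/lab/scan_001.dcm', '/lab/scan_002.dcm']
```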

How does Komprise data tagging work?

Users, such as data owners, can apply tags to groups of files, and tags can also be applied programmatically by analytics applications via API. In the Komprise Deep Analytics interface, users can query the Global File Index and find the data for tagging. This is done by creating a Komprise Plan that will invoke the text search function to inspect and tag the selected files. The ability to use Komprise Intelligent Data Management to search, find, apply tags and then take action makes it possible for customers to get faster value from enriched data sets.

Tagging and Smart Data Workflows


Komprise Smart Data Workflows automate unstructured data discovery, data mobility and the delivery of data services.

  • Define a custom query to find a specific data set.
  • Analyze and tag data sets with additional metadata.
  • Move only the tagged data for analytics, AI/ML, etc.
  • Move to a lower-cost data storage tier after analysis
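
The four steps above can be sketched as a small pipeline. This is a toy model under stated assumptions, not the Komprise workflow engine: FileRecord, run_workflow, the "/research/" query and the location names are hypothetical, and a keyword match stands in for whatever external analysis service would produce the tags.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    path: str
    text: str                                   # stand-in for file content
    tags: dict = field(default_factory=dict)
    locations: list = field(default_factory=lambda: ["primary"])


def run_workflow(files, keyword, analytics_target, archive_target):
    """Query -> tag -> move tagged files -> tier them after analysis."""
    # 1. Custom query: select files under /research/.
    selected = [f for f in files if f.path.startswith("/research/")]
    # 2. Analyze and tag: a keyword match stands in for an external service.
    for f in selected:
        if keyword in f.text:
            f.tags["keyword"] = keyword
    # 3. Move only the tagged files to the analytics location.
    tagged = [f for f in selected if "keyword" in f.tags]
    for f in tagged:
        f.locations.append(analytics_target)
    # 4. After analysis, tier the same files to lower-cost storage.
    for f in tagged:
        f.locations.append(archive_target)
    return tagged


files = [
    FileRecord("/research/a.txt", "genome sequence run 7"),
    FileRecord("/research/b.txt", "meeting notes"),
    FileRecord("/finance/c.txt", "genome budget"),
]
result = run_workflow(files, "genome", "cloud-data-lake", "archive-tier")
print([f.path for f in result])  # ['/research/a.txt']
print(result[0].locations)       # ['primary', 'cloud-data-lake', 'archive-tier']
```

Only one of the three files is moved: the second fails the tag step and the third fails the query, which is the cost saving the workflow is meant to deliver.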


REST (Representational State Transfer)

REST (Representational State Transfer) is a software architectural style for distributed hypermedia systems, used in the development of Web services. Many distributed storage systems, such as object stores, send and receive data via REST. Web services using REST are called RESTful APIs or REST APIs.

There are several benefits to using REST APIs: the interface is uniform, so you don’t have to know the inner workings of an application to use it; its operations are well defined, so data in different storage formats can be acted upon by the same REST APIs; and it is stateless, so each interaction does not interfere with the next. Because of these benefits, REST APIs are fast, easy to implement and easy to use. As a result, REST has gained wide adoption.

6 guiding principles for REST:

  1. Client–server – Separating the user interface from data storage improves portability and scalability.
  2. Stateless – Each request is wholly self-contained, so session state is kept entirely on the client.
  3. Cacheable – Responses are labeled as cacheable or non-cacheable; a client cache may reuse cacheable response data for later, equivalent requests.
  4. Uniform interface – The overall REST system architecture is simplified and uniform due to the following constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and hypermedia as the engine of application state.
  5. Layered system – The system is composed of hierarchical layers, and each component cannot “see” beyond the immediate layer with which it is interacting.
  6. Code on demand (optional) – REST allows client functionality to be extended by downloading and executing code in the form of applets or scripts.
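
Two of these principles, statelessness and the uniform interface, can be illustrated with a toy request handler. This is a sketch, not a web server: the handle function and the /files/42 path are hypothetical, and an in-memory dictionary stands in for the resource store. Each call carries everything needed to process it (method, path, body), and the same small set of methods works on any resource.

```python
import json

# Server-side resource store; it holds resources, not per-client sessions.
resources = {}


def handle(method, path, body=None):
    """Handle one self-contained request and return (status, body)."""
    if method == "PUT":
        created = path not in resources
        resources[path] = json.loads(body)
        return (201 if created else 200), body
    if method == "GET":
        if path not in resources:
            return 404, None
        return 200, json.dumps(resources[path])
    if method == "DELETE":
        if path not in resources:
            return 404, None
        del resources[path]
        return 204, None
    return 405, None  # method not supported


print(handle("PUT", "/files/42", '{"owner": "alice"}')[0])  # 201
print(handle("GET", "/files/42"))      # (200, '{"owner": "alice"}')
print(handle("DELETE", "/files/42")[0])  # 204
print(handle("GET", "/files/42")[0])     # 404
```

Because no request depends on a previous one being remembered by the server, any server instance holding the same resources could answer any request, which is part of why stateless designs are easier to scale.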

The REST architecture and lighter weight communications between producer and consumer make REST popular for use in cloud-based APIs such as those authored by Amazon, Microsoft, and Google. REST is often used in social media sites, mobile applications and automated business processes.

REST provides advantages over SOAP

REST is often preferred over SOAP (Simple Object Access Protocol) because REST uses less bandwidth, making it preferable for use over the Internet. SOAP also requires writing or using a server program and a client program.

RESTful Web services are easily leveraged using most tools, including those that are free or inexpensive. REST is also much easier to scale than SOAP services. Thus, REST is often chosen as the architecture for services available via the Internet, such as Facebook and most public cloud providers. Also, development time is usually reduced using REST over SOAP. The downside to REST is that it has no direct support for generating a client from server-side-generated metadata, whereas SOAP supports this with Web Services Description Language (WSDL).

Unstructured data management software using REST APIs

Open APIs and a REST-based architecture are the keys to Komprise integrations. Using REST APIs gives customers the greatest amount of flexibility. Here are some things customers can do with the Komprise Intelligent Data Management software via its REST API:

  • Get analysis results and reports on all their data
  • Run data migrations, data archiving and data replication operations
  • Search for data across all their storage by any metadata and tags
  • Build virtual data lakes to export to AI and Big Data applications

A REST API is a very powerful, lightweight and fast way to interact with data management software. Here is an example of the Komprise API in action: Automated Data Tagging with Komprise.


Tagging data

Tagging data is the often-lengthy process of annotating or labeling data (such as text or objects in videos and images) to make it detectable and recognizable to computer vision systems, so that AI models can be trained through ML algorithms to make predictions. Creating virtual data lakes from the Global File Index with Komprise Deep Analytics makes this process much faster.


Unstructured Data Governance

Unstructured data governance is a growing practice in enterprise IT as data volumes have exploded and organizations need to manage data assets to reduce risks and costs and ensure data is discoverable for new uses. Unstructured data includes text documents, emails, images, videos, social media posts, audio files, sensor data and other data types that do not fit neatly into traditional structured databases. Unlike structured data that can be organized into tables and fields, unstructured data lacks a predefined format, making it challenging to manage, search, and mine for new insights.

An unstructured data governance strategy can involve many components:

  • Data Discovery and Inventory: Organizations need to identify and catalog unstructured data to manage it properly. This involves locating data stored across various repositories, including file shares, cloud storage, email systems, and more. A thorough inventory delivers holistic visibility into data assets to inform decision-making.
  • Data Classification and Tagging:  IT managers need the ability to tag and segment unstructured data based on its sensitivity, importance, and relevance to the organization.  This includes tagging data with metadata that indicates details such as owners, purpose or project, security (such as containing PII), compliance requirements and other identifying characteristics of the file contents.
  • Access Control and Security: Implementing access controls ensures that only authorized individuals can access and modify sensitive unstructured data. This involves defining user roles, permissions, and authentication mechanisms to safeguard the data from unauthorized access or breaches.
  • Data Retention Policies: Organizations need to establish policies that dictate how long unstructured data is retained. Doing so helps ensure compliance with legal and regulatory requirements, lowers the risk of retaining unnecessary data and lowers the costs of data storage and backups.
  • Data Privacy and Compliance:  Data privacy regulations such as GDPR, HIPAA, or CCPA require proper handling and protection of personal and sensitive data. Unstructured data governance includes procedures to ensure compliance with these regulations—such as how and where regulated data is stored.
  • Data Lifecycle Management: This involves managing data from creation to deletion. It includes processes for capturing, storing, migrating, archiving, and deleting unstructured data as its needs and value to the organization change.
  • Search and Discovery: Deep search capabilities aided by metadata and content indexing help users find relevant unstructured data quickly.
  • Data Analytics and Insights: Extracting valuable insights from unstructured data requires tools and techniques for data analysis, such as natural language processing (NLP), text mining and sentiment analysis.
  • Data Stewardship: Assigning data stewards responsible for managing and overseeing specific sets of unstructured data can help ensure that data is properly maintained, accurate, and up-to-date.
  • Monitoring and Auditing: Regularly monitoring and auditing unstructured data governance processes is important for compliance and security, and helps reduce risks and improve outcomes from analytics and AI initiatives.

Read more in the blog on data governance tips for generative AI.

Unstructured data governance is critical for maintaining data quality, security, compliance, and deriving meaningful insights from the vast amounts of unstructured data that organizations generate and store. Proper governance practices contribute to better decision-making, reduced risks, and improved overall unstructured data management.


Learn how Komprise is bringing new data governance features to its unstructured data management solution in this blog.


Unstructured Data Management


What is Unstructured Data Management?

Unstructured data management is a category of software that has emerged to address the explosive growth of unstructured data in the enterprise and the modern reality of hybrid cloud storage. In the 2023 Komprise State of Unstructured Data Management report, 32% of organizations report that they are managing 10PB of data or more. That equates to 110,000 ultra-high-definition (UHD) movies, or half of the data stored by the U.S. Library of Congress. Most organizations (73%) are spending more than 30% of their IT budget on data storage.

Data storage and data backup technology vendors are now recognizing the importance of unstructured data management as data outlives infrastructure and as data mobility is needed to leverage cloud data storage.

Unstructured data management must be independent and agnostic from data storage, backup, and cloud infrastructure technology platforms.

There are 5 requirements for unstructured data management solutions:

  1. Goes Beyond Storage Efficiency
  2. Must be Multi-Directional
  3. Doesn’t Disrupt Users and Workflows
  4. Should Create New Uses for Your Data
  5. Puts Your Data First and Avoids Vendor Lock-In

An analytics-based unstructured data management solution brings value by analyzing all data in storage across on-premises and cloud environments to deliver deep insights. This knowledge helps IT managers make great decisions with users in mind, optimize costs and reduce security and regulatory compliance risks. These insights go beyond traditional storage metrics such as latency, IOPS and network throughput.

Here are some of the new metrics made possible with data management software:

  • Top data owners/users: See trends in usage and possible compliance issues, such as individual users storing excessive video files or PII files being stored in an insecure location.
  • Common file types: The ability to see data by file extension eases the process of finding all files related to a project and can inform future research initiatives. This could be as simple as finding all the log files, trace files or extracts from a given application or instrument and moving them to a data lake for analysis.
  • Storage costs for chargeback or showback: Whether for chargeback requirements or not, stakeholders should understand costs in their department and be able to view metrics. This will help identify areas where low-cost storage or data tiering to archival storage is a viable cost-reduction opportunity.
  • Data growth rates: High-level metrics on data growth keep IT and business heads on the same page so they can collaborate on data management decisions. Understand which groups and projects are growing data the fastest and ensure that data creation/storage is appropriate according to its overall business priority.
  • Age of data and access patterns: In most enterprises, 60-80% of data is “cold” and hasn’t been accessed in a year or more. Metrics showing the percentage of cold versus warm versus hot data are critical to ensure that data is living in the right place at the right time according to its business value and to optimize costs.
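
Two of the metrics above, the hot/warm/cold split and the breakdown by file type, can be computed from a basic file inventory. The sketch below is illustrative only: the temperature and summarize helpers, the sample inventory and the 90-day "warm" threshold are assumptions, while the one-year "cold" threshold follows the definition in the list above.

```python
from collections import Counter
from datetime import date
from pathlib import PurePosixPath


def temperature(last_access, today, warm_days=90, cold_days=365):
    """Classify a file by days since last access (warm_days is an assumption)."""
    age = (today - last_access).days
    if age >= cold_days:
        return "cold"
    if age >= warm_days:
        return "warm"
    return "hot"


def summarize(files, today):
    """Return percent of bytes per temperature and file counts per extension."""
    bytes_by_temp = Counter()
    count_by_ext = Counter()
    for path, size, last_access in files:
        bytes_by_temp[temperature(last_access, today)] += size
        count_by_ext[PurePosixPath(path).suffix or "(none)"] += 1
    total = sum(bytes_by_temp.values())
    if total == 0:
        return {}, dict(count_by_ext)
    pct = {t: round(100 * b / total, 1) for t, b in bytes_by_temp.items()}
    return pct, dict(count_by_ext)


# Hypothetical inventory: (path, size in bytes, last access date).
files = [
    ("/eng/build.log", 700, date(2022, 1, 10)),
    ("/eng/trace.log", 100, date(2023, 11, 1)),
    ("/mkt/video.mp4", 200, date(2023, 6, 1)),
]
pct, by_ext = summarize(files, today=date(2023, 12, 1))
print(pct)     # {'cold': 70.0, 'hot': 10.0, 'warm': 20.0}
print(by_ext)  # {'.log': 2, '.mp4': 1}
```

The percentages are weighted by bytes rather than file counts, since storage cost tracks capacity; counting files instead would be a one-line change.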

Read: File Data Metrics to Live By

Beyond cost optimization, unstructured data management tools and practices can help deliver new value from data.

Unstructured data is the fuel needed for AI, yet it’s difficult to leverage because unstructured data is hard to find, search across, and move due to its size and distribution across hybrid cloud environments. Tagging and automation can help prepare unstructured data for AI and big data analytics programs. Tactics include:

  • Preprocess data at the edge so it can be analyzed and tagged with new metadata before moving it into a cloud data lake. This can drastically reduce the wasted cost and effort of moving and storing useless data and can minimize the occurrence of data swamps.
  • Apply automation to facilitate data segmentation, cleansing, search and enrichment. You can do this with data tagging, deletion or tiering of cold data by policy, and moving data into the optimal storage where it can be ingested by big data and ML tools. A leading new approach is the ability to initiate and execute data workflows.
  • Use a solution that persists metadata tags as data moves from one location to another. For instance, files tagged as containing key project keywords by a third-party AI service should retain those tags indefinitely so that a new research team doesn’t have to run the same analysis over again — at high cost. Komprise Intelligent Data Management has these capabilities.
  • Plan appropriately for large-scale data migration efforts with thorough diligence and testing. This can prevent common networking and security issues that delay data migrations and introduce errors or data loss.

The State of Unstructured Data Management

In August 2021, Komprise published the first State of Unstructured Data Management Report:


Highlights of the 2021 Unstructured Data Management Report

Unstructured Data is Growing, as are its Costs

  • 65.5% of organizations spend more than 30% of their IT budgets on data storage and data management.
  • Most (62.5%) will spend more on storage in 2021 versus 2020.

Getting More Unstructured Data to the Cloud is a Key Priority

  • 50% of enterprises have data stored in a mix of on-premises and cloud-based storage.
  • Top priorities for cloud data management include: migrating data to the cloud (56%), cutting storage and data costs (46%), and governance and security of data in the cloud (41%).

IT Leaders Want Visibility First Before Investing in More Data Storage

  • Investing in analytics tools was the highest priority (45%) over buying more cloud or on-premises storage or modernizing backups.
  • One-third of enterprises acknowledge that over 50% of data is cold while 20% don’t know, suggesting a need to right-place data through its lifecycle.

Unstructured Data Management Goals & Challenges: Visibility, Cost Management and Data Lakes

  • 44.9% wish to avoid rising costs.
  • 44.5% want better visibility for planning.
  • 42% are interested in tagging data for future use and enabling data lakes.

2022 State of Unstructured Data Management Report

In August 2022, Komprise published the 2nd annual State of Unstructured Data Management Report: Komprise Survey Finds 65% of Enterprise IT Leaders are Investing in Unstructured Data Analytics. The Top 5 trends from the report are summarized here. They are:

  1. User Self-Service: In data management, self-service typically refers to the ability for authorized users outside of storage disciplines to search, tag, enrich and act on data through automation, such as a research scientist wanting to continuously export project files to a cloud analytics service.
  2. Moving Data to Analytics Platforms: A majority (65%) of organizations plan to or are already delivering unstructured data to their big data analytics platforms.
  3. Cloud File Storage Gains Favor: Cloud NAS topped the list for storage investments in the next year (47%).
  4. User Expectations Beg Attention: Organizations want to move data without disrupting users and applications (42%).
  5. IT and Storage Directors want Flexibility: A top goal for unstructured data management (42%) is to adopt new storage and cloud technologies without incurring extra licensing penalties and costs, such as cloud egress fees.

State of Unstructured Data Management 2023

In September 2023, Komprise published the 3rd annual State of Unstructured Data Management report.

Coverage of the report focused on the finding that 66% of respondents said preparing data storage and data management for AI and generative AI in general is a top priority and challenge.


Why do you need to manage your unstructured data?

In a 2022 interview, Komprise co-founder and COO Krishna Subramanian defined unstructured data this way:

Unstructured data is any data that doesn’t fit neatly into a database, and isn’t really structured in rows and columns. So every photo on your phone, every X-ray, every MRI scan, every genome sequence, all the data generated by self-driving cars – all of that is unstructured data. And perhaps more relevant to more businesses, artificial intelligence (AI) and machine learning (ML) – they depend on, and usually output, unstructured data too.

Unstructured data is growing every day at a truly astonishing rate. Today, 85% of the world’s data is unstructured data.

And it’s more than doubling, every two years.

The importance of an unstructured data strategy for enterprise

In part two of the interview, Krishna Subramanian noted:

Unstructured data doesn’t have a common structure. But it does have something called metadata. So every time you take a picture on your phone, there’s certain information that the phone captures, like the time of day, the location where the picture was taken, and if you tag it as a favorite, it’ll have that metadata tag on it too. It might know who’s in the photo, there are certain metadata that are kept.

All filing systems store some metadata about the data. A product like Komprise Intelligent Data Management has a distributed way to search across all the different environments where you’ve stored data, and create a global index of all that metadata around the data. And that in itself is a difficult problem, because again, unstructured data is so huge. A petabyte of data might be a few billion files, and a lot of these customers are dealing with tens to hundreds of petabytes.

So you need a system that can create an efficient index of hundreds of billions of files that could be distributed in different places. You can’t use a database, you have to have a distributed index, and that’s the technology we use under the hood, but we optimize it for this use case. So you create a global index. Learn more about unstructured data tagging.
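
The idea of a distributed index can be illustrated with a toy sketch. This is not Komprise's implementation, which the source does not describe in detail: the ShardedIndex class, the shard count and the sample paths are hypothetical, and plain dictionaries stand in for index nodes. Entries are spread across shards by a hash of the file path, and a search is sent to every shard and the results merged.

```python
import hashlib


class ShardedIndex:
    """Toy metadata index split across several shards (plain dicts here)."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, path):
        # A stable hash of the path decides which shard owns the entry.
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def add(self, path, metadata):
        self._shard_for(path)[path] = metadata

    def search(self, key, value):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = []
        for shard in self.shards:
            hits.extend(p for p, md in shard.items() if md.get(key) == value)
        return sorted(hits)


index = ShardedIndex(shard_count=4)
index.add("/nas1/img/001.jpg", {"type": "image", "owner": "alice"})
index.add("/cloud/img/002.jpg", {"type": "image", "owner": "bob"})
index.add("/nas2/doc/plan.docx", {"type": "document", "owner": "alice"})

print(index.search("type", "image"))
# ['/cloud/img/002.jpg', '/nas1/img/001.jpg']
```

Because each shard holds only part of the metadata, shards can live on different machines and be searched in parallel, which is the property that lets an index grow to billions of entries.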

The Future of Unstructured Data Management

In an end-of-year blog post, Komprise executives review unstructured data management and data storage predictions for 2023 and the implications of adopting data services, processing data at the edge, multi-cloud challenges, the importance of smart data migration strategies, and more.


Unstructured Data Storage

Unstructured data storage is the storage of data that does not adhere to a predefined data model or schema. Unlike structured data, which fits neatly into tables with rows and columns, unstructured data lacks a specific organization and may include various file types, such as text documents, images, videos, audio files, emails, social media posts, and more.

Read the article: Here’s How to Take Control of Unstructured Data

Gartner on unstructured data storage


Each year Gartner publishes the Magic Quadrant for Distributed File Systems and Object Storage.

Gartner defines distributed file systems and object storage as software and hardware appliance products that offer object and distributed file system technologies for unstructured data. Their purpose is to store, secure, protect and scale unstructured data with access over the network using file and object protocols, such as Amazon Simple Storage Service (S3), Network File System (NFS) and Server Message Block (SMB).

Gartner also has a Primary Data Storage Magic Quadrant, as summarized in this Blocks & Files article.

Common requirements for unstructured data storage

  • Flexibility: Unstructured data storage systems are flexible and can accommodate various types of data without requiring predefined schemas. This flexibility allows organizations to store and manage diverse data types efficiently.
  • Scalability: Unstructured data storage solutions are often designed to scale easily, allowing organizations to handle massive volumes of data as their storage requirements grow over time.
  • Indexing and Search: Effective management of unstructured data involves indexing and search capabilities to quickly locate and retrieve specific information within large datasets. This may involve metadata tagging, full-text search, and other techniques to facilitate data discovery. See unstructured data classification.
  • Object Storage: Object storage is a common approach to storing unstructured data, where each piece of data is stored as an object with a unique identifier and metadata. Object storage systems provide scalability, durability, and accessibility for large-scale unstructured data environments.
  • Cloud Storage: Many organizations leverage cloud storage services for unstructured data storage due to their scalability, reliability, and cost-effectiveness. Cloud providers offer a range of storage options, including object storage, file storage, and content delivery networks (CDNs), to accommodate different types of unstructured data.
  • Data Governance and Security: Managing unstructured data requires robust data governance practices to ensure compliance, data security, and privacy protection. This may involve implementing access controls, encryption, data classification, and audit trails to safeguard sensitive information.
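
The object storage model described in the list above (each piece of data stored as an object with a unique identifier and metadata) can be sketched in a few lines. This is a toy in-memory model, not any vendor's service: the ObjectStore class and its put, get and head methods are hypothetical names chosen for illustration.

```python
import hashlib
import uuid


class ObjectStore:
    """Toy in-memory object store: flat namespace, ID plus metadata per object."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        object_id = str(uuid.uuid4())  # unique identifier for the object
        self._objects[object_id] = {
            "data": data,
            "metadata": {
                **metadata,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            },
        }
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]["data"]

    def head(self, object_id: str) -> dict:
        """Return a copy of the metadata only, without the payload."""
        return dict(self._objects[object_id]["metadata"])


object_store = ObjectStore()
oid = object_store.put(b"hello", {"content_type": "text/plain"})
print(object_store.get(oid))           # b'hello'
print(object_store.head(oid)["size"])  # 5
```

There is no directory hierarchy: objects are addressed only by identifier, and the metadata stored alongside each one is what makes indexing and search possible.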

Effective storage and unstructured data management are essential for organizations to derive insights, make data-driven decisions, and unlock the value of their data assets.

Unstructured Data Storage Vendors

Many vendors offer solutions for storing unstructured data, each with its own set of features, capabilities, and pricing models. Here are some notable vendors in the unstructured data storage space:

  • Amazon Web Services (AWS): Amazon Simple Storage Service (Amazon S3) is a highly scalable object storage service designed for storing and retrieving any amount of data. It is commonly used for unstructured data storage and offers features such as versioning, lifecycle management, and security controls. Learn more about Komprise for AWS.
  • Microsoft Azure: Azure Blob Storage provides scalable, cost-effective storage for unstructured data. It offers tiered storage options, access controls, and integration with other Azure services for data analytics and processing. Learn more about Komprise for Azure.
  • Google Cloud Platform (GCP): Google Cloud Storage is a scalable object storage solution suitable for storing unstructured data. It provides features such as versioning, lifecycle management, and integration with other GCP services. Learn more about Komprise for Google. 
  • IBM: IBM Cloud Object Storage is a scalable, secure, and durable object storage service. It is designed to support large-scale unstructured data storage and offers features such as encryption, access controls, and global data distribution. Learn more about Komprise for IBM.
  • Dell: Dell EMC Isilon, now Dell PowerScale, is a scale-out network-attached storage (NAS) platform designed for storing and managing large volumes of unstructured data. It offers high performance, scalability, and multi-protocol support for various data types. Learn about Komprise Elastic Data Migration for Isilon.
  • NetApp: NetApp StorageGRID is an object storage solution from NetApp that enables organizations to store, manage, and protect unstructured data at scale. It offers features such as geo-distribution, data tiering, and policy-based management. Learn more about Komprise for NetApp.
  • Pure Storage: Pure Storage FlashBlade is a scalable, all-flash storage platform designed for unstructured data workloads. It offers high performance, simplicity, and native support for file, object, and analytics workloads.
  • HPE (Hewlett Packard Enterprise): HPE has long offered HPE Nimble Storage, a range of storage solutions including Nimble Storage dHCI and Nimble Storage All Flash Arrays, suitable for storing unstructured data. HPE now resells VAST Data solutions as HPE File Services.
  • Qumulo: Qumulo’s Scale Anywhere™ platform is a 100% software solution for hybrid enterprises to efficiently store and manage file and object data at the edge, in the core, and in the cloud.

These are some examples of vendors providing solutions for unstructured data storage.

Optimize unstructured data storage with Komprise

Komprise Intelligent Data Management frees you to analyze, mobilize, and access the right file and object data across clouds without shackling your data to any unstructured data storage vendor. Komprise helps enterprise customers optimize data storage costs by right-sizing and right-placing data, while making it easy for users to unlock data value with smart data workflows.
