• A
    • Active Storage

      Active storage is a data storage approach where frequently accessed or frequently changing data is stored on high-performance storage systems that are readily available for immediate use. This method is commonly used in the context of computer systems, databases, and cloud computing.

      The purpose of active storage is to ensure that critical and frequently needed data is quickly accessible, allowing applications and users to retrieve and modify the data with minimal latency. High-speed access to active storage is essential for real-time processing, interactive applications, and any use cases where data needs to be rapidly read or updated.


      In contrast, less frequently accessed or less critical data may be moved to slower, lower-cost storage tiers, such as archival storage or cold data storage, where retrieval times are not as critical. This tiered storage approach, often referred to as hierarchical storage management, optimizes data storage costs by placing data on the most suitable and cost-effective storage based on its usage patterns and access requirements.

      Active storage can be implemented using various technologies, including high-performance disk arrays, solid-state drives (SSDs), in-memory databases, and cloud-based storage solutions designed for low-latency access.

      The concept of active storage is closely related to the idea of “hot data” – data that is currently being actively used and requires immediate access. As data usage patterns change over time, data may transition between active storage and other storage tiers based on access frequency, age, or other criteria defined by the data lifecycle management strategy of an organization.
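
      As a rough illustration of how data might be assigned to a tier by access age, here is a minimal Python sketch. The 30-day and 365-day thresholds and the function name classify_tier are assumptions made for the example; a real policy would come from an organization's data lifecycle management strategy.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=365)


def classify_tier(last_accessed: datetime, now: datetime) -> str:
    """Return 'hot', 'warm', or 'cold' based on time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


now = datetime(2024, 1, 1)
print(classify_tier(datetime(2023, 12, 20), now))  # accessed 12 days earlier
print(classify_tier(datetime(2021, 6, 1), now))    # accessed more than two years earlier
```

      Passing the current time in as a parameter keeps the function easy to test and makes it possible to re-evaluate the same data as it ages.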

      Case study: Pfizer is saving 75% on storage by using Komprise to analyze and continuously move cold data to Amazon S3 as it ages.


      Getting Started with Komprise:

    • Adaptive Data Management

      What is adaptive unstructured data management?

      As data footprints continue to grow, businesses are struggling to manage petabytes of data, often consisting of billions of files. To manage at this scale, intelligent automation that learns and adapts to your environment is needed.

      Data management needs to happen continuously in the background and not interfere with active usage of storage or the network by users and applications. This is because unstructured data management is an ongoing function, much like a housekeeper of data. Just as you would not want your housekeeper to be clearing dishes as your family is eating at the dinner table, data management needs to run non-intrusively in the background.

      To do this, an adaptive data management solution is needed – one that knows when your file system and network are in active use and throttles itself back, and then speeds back up when resources are available. An adaptive data management system learns from your usage patterns and adapts to the environment.
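
      The throttling idea can be sketched in a few lines of Python. This is not how Komprise implements it; it is an illustrative additive-increase, multiplicative-decrease rule, and the function name, thresholds and worker limits are all assumptions.

```python
def next_worker_count(current: int, utilization: float,
                      busy_above: float = 0.7, idle_below: float = 0.3,
                      min_workers: int = 1, max_workers: int = 16) -> int:
    """Halve the workers when storage is busy, add one when it is idle."""
    if utilization > busy_above:
        # Back off quickly so users and applications are not disturbed.
        return max(min_workers, current // 2)
    if utilization < idle_below:
        # Ramp up slowly while resources are available.
        return min(max_workers, current + 1)
    return current


workers = 8
for load in (0.2, 0.9, 0.9, 0.5, 0.1):  # hypothetical utilization samples
    workers = next_worker_count(workers, load)
    print(f"load={load:.1f} -> workers={workers}")
```

      Backing off faster than ramping up is a deliberate bias: it favors active users over the background data management job.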

      In The 10 Principles of Komprise Intelligent Data Management, adaptive data management is summarized this way:

      Komprise throttles back as needed when your data storage or network are in active use, so you never have to monitor or schedule when Komprise runs.



    • AI Compute

      The computing ability required for machines to learn from big data and experience, adjust to new inputs, and perform human-like tasks. Komprise cuts the data preparation time for AI projects by creating virtual data lakes with its Deep Analytics feature.

      AI compute refers to the computational resources required for artificial intelligence systems to perform tasks, such as processing data, training machine learning models, and making predictions. These resources can be provided by various hardware and software platforms, including GPUs, TPUs, cloud computing, and edge computing devices. The amount of AI compute needed depends on the complexity of the AI system and the amount of data being processed.

      Unstructured data management solutions and unstructured data workflows are increasingly being used not only to achieve greater data storage cost savings but also to deliver the right data to the right destination at the right time.

      What is unstructured data in AI?

      AI needs unstructured data – are you ready?



    • Air Gap

      An air gap, in the context of computer security, refers to a physical or logical separation between a computer or network and any external or untrusted networks or systems. It is a security measure used to protect sensitive or critical information from unauthorized access or cyber threats.

      The concept behind an air gap is to create a physical or logical barrier that prevents direct communication or data transfer between the protected system and external networks. This isolation helps reduce the risk of malicious actors or malware infiltrating the system and compromising its security.

      Physical and Logical Air Gap

      • Physical air gap: The isolated system is physically disconnected from any external networks, typically by physically unplugging network cables or using dedicated networks that are not connected to the internet or other networks. This is commonly seen in high-security environments or critical infrastructure systems where data protection is of utmost importance.
      • Logical air gap (or virtual air gap): Using network configurations, firewalls, or security controls to create a virtual separation between the protected system and external networks. While the system may still be physically connected to a network, it is isolated in such a way that communication with external systems is restricted or highly regulated.
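
      One way to picture a logical air gap is as an allow-list: traffic is refused unless the destination sits inside an explicitly permitted network. The Python sketch below uses the standard ipaddress module; the network range and the function name is_permitted are hypothetical, and a real deployment would enforce this in firewalls or network configuration rather than in application code.

```python
import ipaddress

# Hypothetical allow-list: the only network the protected system may reach.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.20.0.0/24")]


def is_permitted(destination: str) -> bool:
    """Return True only if the destination is inside an allowed network."""
    address = ipaddress.ip_address(destination)
    return any(address in network for network in ALLOWED_NETWORKS)


print(is_permitted("10.20.0.15"))  # inside the allowed range
print(is_permitted("8.8.8.8"))     # external address
```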

      Air gaps are commonly employed in situations where highly sensitive or classified data is involved, such as government or military networks, financial systems, or critical infrastructure control systems. However, it is important to note that air gaps are not foolproof and additional security measures should be implemented to address potential risks like insider threats or physical access breaches.

      In the blog post How to Protect File Data from Ransomware at 80 percent Lower Cost, there is an overview of how to create an affordable cloud ransomware recovery copy that is logically air-gapped.

      If you want to use Komprise for both hot and cold data, Komprise can create an affordable logically isolated recovery copy of all data in an object-locked destination such as Amazon S3 IA, so data is protected even if the backups and primary storage are attacked.


    • Alternate Data Streams (ADS)

      Alternate Data Streams (ADS) is a feature in the Windows operating system that allows data to be associated and hidden within files. An ADS can be used to store additional information about a file, such as metadata or comments, without changing the file itself.

      Read ADS overview here.

      ADS is a feature that was introduced in the NTFS file system used by Windows. It allows users to attach additional data streams to a file, which are invisible to most applications and users. An ADS is named using a colon, for example, “myfile.txt:ads.txt”.

      ADS Spyware?

      ADS can be used for legitimate purposes, such as adding metadata to a file, but it can also be used for malicious purposes, such as hiding malware or other sensitive information within a file. As a result, ADS has been used in some types of cyberattacks, such as those involving stealthy data exfiltration or command-and-control communications.

      To view and manage ADS, you can use the Windows command prompt or third-party tools such as ADS Spy or ADS Scanner. It is important to be aware of the existence of ADS and to take appropriate security measures to protect against malicious use of this feature.
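
      To make the colon naming convention concrete, the following Python sketch separates a path into its file part and its stream name. The helper name split_stream is an assumption, and the parsing is deliberately simplified: it uses the standard ntpath module to set the drive letter aside first, so a one-letter file name followed by a colon would be mistaken for a drive.

```python
import ntpath


def split_stream(path: str):
    """Split 'file:stream' into (file path, stream name or None)."""
    # Set aside a drive such as 'C:' so its colon is not treated as a stream separator.
    drive, rest = ntpath.splitdrive(path)
    head, separator, stream = rest.partition(":")
    return drive + head, (stream if separator else None)


print(split_stream("myfile.txt:ads.txt"))
print(split_stream(r"C:\data\myfile.txt:ads.txt"))
print(split_stream("myfile.txt"))
```

      On Windows with an NTFS volume, passing a path of the form file:stream to Python's built-in open() should read or write the named stream rather than the main file content; on other platforms the colon is treated as an ordinary character.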


    • Amazon (AWS) S3 Intelligent Tiering

      S3 Intelligent Tiering is an Amazon storage class aimed at data with unknown or unpredictable data access patterns. See our S3 Intelligent Tiering glossary entry for further information.

      Learn more about AWS cloud tiering, cloud data migration and the Komprise AWS partnership.



    • Amazon FSx

      What is Amazon FSx?

      Amazon FSx is a fully managed service that provides high-performance file systems in the cloud and runs on AWS.

      Customers can choose between four file systems:

      • NetApp ONTAP
      • OpenZFS
      • Windows File Server
      • Lustre

      Komprise supports Amazon FSx for NetApp ONTAP with a focus on Smart Data Migration. As an Advanced AWS partner, the Komprise cloud data migration solution is able to “right place” data to reduce costs and increase data value.

      Read the AWS partner press release and blog post Komprise and AWS FSx for Netapp ONTAP.


      For more information on Komprise File Data Migration to the Cloud be sure to check out our Path to the Cloud section of the website and download the Smart Data Migration for AWS white paper.


    • Amazon Glacier (AWS Glacier)


      What is Amazon S3 Glacier (AWS Glacier)?

      Amazon S3 Glacier, also known as AWS Glacier, is a class of cloud storage available through Amazon Web Services (AWS).  Amazon S3 Glacier is a lower-cost storage tier designed for use with data archiving and long-term backup services on the public cloud infrastructure.

      Amazon S3 Glacier was created to house data that doesn’t need to be accessed frequently or quickly. This makes it ideal for use as a cold storage service, hence the inspiration for its name.

      Amazon S3 Glacier retrieval times range from a few minutes to a few hours with three different speed options available: Expedited (1-5 minutes), Standard (3-5 hours), and Bulk (5-12 hours).

      Amazon S3 Glacier Deep Archive offers 12-48-hour retrieval times. The faster retrieval options are significantly more expensive, so having your data organized into the correct tier within AWS cloud storage is an important aspect of keeping storage costs down.
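
      Because faster retrievals cost more, a common rule of thumb is to pick the slowest option that still meets your deadline. The Python sketch below encodes the upper bounds quoted above for the three S3 Glacier options; the function name is an assumption, and the premise that slower always means cheaper is a simplification that should be checked against current AWS pricing.

```python
# Worst-case retrieval times in hours, taken from the ranges quoted above.
RETRIEVAL_HOURS = {
    "Expedited": 5 / 60,  # 1-5 minutes
    "Standard": 5,        # 3-5 hours
    "Bulk": 12,           # 5-12 hours
}


def slowest_tier_within(deadline_hours: float):
    """Return the slowest retrieval option that meets the deadline, or None."""
    candidates = [(hours, name) for name, hours in RETRIEVAL_HOURS.items()
                  if hours <= deadline_hours]
    return max(candidates)[1] if candidates else None


print(slowest_tier_within(24))  # data needed by tomorrow
print(slowest_tier_within(1))   # data needed within the hour
```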

      Other Glacier features:
      • The ability to store an unlimited number of objects and data
      • Data stored in S3 Glacier is dispersed across multiple geographically separated Availability Zones within the AWS region
      • An average annual durability of 99.999999999%
      • Checksum uploads to validate data authenticity
      • REST-based web service
      • Vault, Archive, and Job data models
      • Limit of 1,000 vaults per AWS account
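
      The checksum idea in the list above can be illustrated with a short Python sketch: compute a digest before upload and compare it with the digest of what was received. Plain SHA-256 over the whole payload is used here as an assumption; the checksum scheme the service itself applies may differ, so treat this as the general technique rather than the Glacier API.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """Return True if the payload still matches the recorded checksum."""
    return sha256_hex(data) == expected_hex


payload = b"archived project data"
recorded = sha256_hex(payload)           # stored alongside the upload
print(verify(payload, recorded))         # unchanged payload
print(verify(payload + b"!", recorded))  # altered payload
```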

      Main Applications for Amazon S3 Glacier Storage

      There are several scenarios where Glacier is an ideal solution for companies needing a large volume of cloud storage.

      1. Huge data sets. Many companies that perform trend or scientific analysis need a huge amount of storage to be able to house their training, input, and output data for future use.
      2. Replacing legacy storage infrastructure. With the many advantages that cloud-based storage environments have over traditional storage infrastructure, many corporations are opting to use AWS storage to get more out of their data storage systems. AWS Glacier is often used as a replacement for long term tape archives.
      3. Healthcare facilities’ patient data. Patient data needs to be kept for regulatory or compliance requirements. Glacier and Glacier Deep Archive are ideal archiving platforms to keep data that will hardly need to be accessed.
      4. Cold data with long retention times. Finance, research, genomics, electronic design automation, and media and entertainment are some examples of industries where cold data and inactive projects may need to be retained for long periods of time even though they are not actively used. AWS Glacier storage classes are a good fit for these types of data. To minimize retrieval delays and costs, project data should be recalled before it is actively used.

      Amazon S3 Glacier vs S3 Standard

      Amazon’s S3 Standard storage and S3 Glacier are different classes of storage designed to handle workloads on the AWS cloud storage platform.

      • S3 Glacier is best for cold data that’s rarely or never accessed
      • Amazon S3 Standard storage is intended for hot and warm data that needs to be accessed daily and quickly

      The speed and accessibility of S3 Standard storage comes at a much higher cost compared to S3 Glacier and the even more economical S3 Glacier Deep Archive storage tiers. Having the right data management solution is critical to help you identify and organize your hot and cold data into the correct storage tiers, saving a substantial amount on storage costs.

      Benefits of a Data Management System to Optimize Amazon S3 Glacier

      A comprehensive suite of unstructured data management and unstructured data migration capabilities allows organizations to reduce their data storage footprint and substantially cut their storage costs. These are a few of the benefits of integrating an analytics-driven data management solution like Komprise Intelligent Data Management with your AWS storage:

      Get full visibility of your AWS and other storage data

      See across AWS and other cloud platforms to understand how much NAS data is being accrued and whether it’s hot or cold, so you can make better data storage investment and data mobility decisions.

      Intelligent tiering and life cycle management for AWS storage

      Optimize and improve how you manage files and objects across EFS, FSx, S3 Standard and S3 Glacier storage classes based on access patterns.

      Intelligent AWS data retrievals

      Don’t get hit with unexpected data retrieval fees on S3 Glacier – Komprise enables intelligent recalls based on access patterns so if an object on Glacier becomes active again, Komprise will move it up to an S3 storage class.

      Bulk retrievals for improved AWS user performance

      Improve performance across entire projects from S3 Glacier storage classes – if an archived project is going to become active, you can prefetch and retrieve the entire project from S3 Glacier using Komprise so users don’t have to face long latencies to get access to the data they need.

      Minimize AWS storage costs

      Use analytics-driven cloud data management that monitors retrieval costs, egress costs and other costs, and minimizes them by intelligently promoting and recalling data to more active storage classes.

      Access AWS data natively

      Access data that has been moved across AWS as objects from Amazon S3 storage classes or as files from File and NAS storage classes without the need for additional stubs or agents.

      Reduce AWS cloud storage complexity

      Reduce the complexity of your cloud storage and NAS environment and manage your data more easily through an intuitive dashboard.

      Optimize the AWS storage savings

      Komprise Intelligent Data Management allows you to better manage all the complex data storage, retrieval, egress and other costs. Know first. Move smart. Take control.

      Easy, on-demand scalability

      Komprise provides you with the capacity to add and manage petabytes without limits or the need for dedicated infrastructure.

      Integrate data lifecycle management

      Integrate easily with an AWS Advanced Tier partner such as Komprise for lifecycle management or other use cases.

      Move data transparently to any tier within AWS

      Your users won’t experience any difference in terms of data access. You’ll notice a huge difference in cost savings and unstructured data value with Komprise.

      Create automated data management policies and data workflows

      Continuously manage the lifecycle of the moved data for maximum savings. Build Smart Data Workflows to deliver the right data to the right teams, applications, cloud services, AI/ML engines, etc. at the right time.

      Streamline Amazon S3 Glacier Operations with Komprise Intelligent Data Management

      Komprise’s Intelligent Data Management allows you to seamlessly analyze and manage data across all of your AWS cloud storage classes so you can move data across file, S3 Standard and S3 Glacier storage classes at the right time for the best price/performance. Because it is vendor agnostic, its standards-driven analytics and data management work with the largest storage providers in the industry and have helped companies save up to 50% on their cloud storage costs.

      If you’re looking to get more out of your AWS storage, contact a data management expert at Komprise today and see how much you could save on data storage costs. Read the white paper: Smart Data Migration for AWS.



    • Amazon S3 (AWS S3)

      Amazon Simple Storage Service, known as Amazon S3 or AWS S3, is an object storage service that offers industry-leading scalability, data availability, security, and performance.

      See S3 in our glossary for further information.

      Learn more about Komprise Intelligent Data Management for AWS data storage.



    • Amazon S3 Glacier Instant Retrieval

      Amazon S3 Glacier Instant Retrieval is an archive storage class that was introduced in November 2021. According to Amazon, it delivers the lowest-cost archive storage with millisecond retrieval for rarely accessed data.

      Komprise works closely with AWS to ensure enterprise customers have visibility into data across storage environments. With analytics-driven unstructured data management, Komprise right-places data in the right storage class: hot data on high-performance managed file services in AWS and cold data on lower-cost Amazon S3 object storage classes such as Amazon S3 Glacier Instant Retrieval and Amazon S3 Infrequent Access.

      Learn more about Amazon S3 Storage Classes.

      Learn more about Komprise for AWS.


    • Amazon Tiering

      What is Amazon Tiering?

      Amazon Web Services (AWS) offers several storage services that support data tiering based on different storage classes. These data storage classes allow customers to optimize their storage costs and performance by choosing the most suitable option for their data based on its access patterns and durability requirements.

      Learn more about Komprise file and object data migration, data tiering and ongoing data management.

      AWS Storage Tiering Options

      Amazon S3 Storage Classes: Amazon Simple Storage Service (S3) provides multiple storage classes to accommodate different data access patterns and cost requirements:

      • Standard: This is the default storage class for S3 and offers high durability, availability, and performance for frequently accessed data.
      • Intelligent-Tiering: This storage class automatically moves objects between access tiers (such as frequent access and infrequent access) based on their usage patterns. It optimizes costs by automatically transitioning objects to the most cost-effective tier.
      • Standard-IA (Infrequent Access): This storage class is suitable for data that is accessed less frequently but still requires rapid access when needed. It offers lower storage costs compared to the Standard class.
      • One Zone-IA: Similar to Standard-IA, but the data is stored in a single Availability Zone, which provides a lower-cost option for customers who don’t require data redundancy across multiple zones.
      • Glacier, Glacier Instant Retrieval and Glacier Deep Archive: These storage classes are designed for long-term archival and data retention. Glacier Instant Retrieval offers millisecond retrieval, data stored in Amazon S3 Glacier is accessible within minutes to hours, and Glacier Deep Archive is for data that can tolerate retrieval times of 12 hours or more.
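
      Moving objects down through these storage classes as they age is typically expressed as an S3 lifecycle rule. The Python sketch below only builds and prints such a rule as a dictionary; it does not call AWS. The day counts, rule ID and prefix are hypothetical, and the dictionary shape and storage class identifiers reflect our understanding of the S3 lifecycle configuration format, which should be confirmed against the AWS documentation before use.

```python
import json

# Hypothetical aging schedule; storage class names follow S3 lifecycle conventions.
TRANSITIONS = [
    {"Days": 30, "StorageClass": "STANDARD_IA"},
    {"Days": 90, "StorageClass": "GLACIER"},
    {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
]


def build_lifecycle_config(prefix: str) -> dict:
    """Build a lifecycle configuration that tiers objects under a prefix."""
    days = [transition["Days"] for transition in TRANSITIONS]
    if days != sorted(days):
        raise ValueError("transitions must be ordered from warmest to coldest")
    return {
        "Rules": [{
            "ID": "tier-cold-data",        # hypothetical rule name
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            "Transitions": TRANSITIONS,
        }]
    }


print(json.dumps(build_lifecycle_config("projects/"), indent=2))
```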

      Amazon EBS Volume Types: Amazon Elastic Block Store (EBS) provides different volume types for block storage in AWS. While not strictly tiering, these volume types offer varying performance characteristics and costs:

      • General Purpose SSD (gp2/gp3): These general-purpose volume types provide a balance of price and performance for a wide range of workloads.
      • Provisioned IOPS SSD (io1/io2): These volume types are designed for applications that require high I/O performance and consistent low-latency access to data.
      • Throughput Optimized HDD (st1): This volume type offers low-cost storage optimized for large, sequential workloads that require high throughput.
      • Cold HDD (sc1): This volume type provides the lowest-cost storage for infrequently accessed workloads with large amounts of data.

      Amazon S3 Glacier and Glacier Deep Archive: These are the storage classes within Amazon S3 designed specifically for long-term data archival and retention. The retrieval times are longer compared to other storage classes, but they offer significantly lower storage costs for data that is rarely accessed.

      Amazon tiering options are designed to help AWS customers effectively manage their data storage costs and performance based on the specific requirements of their workloads and data access patterns.

      Komprise Intelligent Data Management for AWS

      Komprise is an AWS Migration and Modernization competency partner, working closely with AWS teams to follow best practices and support cloud data storage services including Amazon EFS, Amazon FSx and Amazon S3 (including Amazon S3 Glacier Flexible Retrieval and Glacier Instant Retrieval storage classes). The Komprise analytics-driven SaaS platform allows customers to analyze, mobilize and manage their file and object data using AWS allowing enterprise customers to:

      • Understand AWS NAS & Object Data Usage and Growth
      • Estimate ROI of AWS Data Storage
      • Migrate Smarter to Amazon FSx for NetApp ONTAP
      • Easily Integrate AWS Data Lifecycle Management
      • Access Moved Data as Files Without Stubs or Agents
      • Gain Native Data Access in the AWS Cloud Without Storage Vendor Lock-In
      • Rapidly Migrate Object Data Into AWS Storage
      • Reduce AWS Unstructured Data Complexity
      • Scale On-Demand with Modern, SaaS Architecture



    • Analytics-driven Data Management

      Analytics-driven data management is a core principle of the standards-based Komprise Intelligent Data Management platform. It relies on data insight and automation to strategically and efficiently manage and move unstructured data at massive scale. With Komprise, you can know first, move smart, and take control of massive unstructured data growth while cutting 70% of your enterprise data storage costs, including backup and cloud costs.


      Know First: Get insight into your data before you invest. See across your data storage silos, vendors, and clouds to make informed storage and backup decisions.

      • Analyze any NAS, S3
      • Plan and project storage cost savings
      • Search, tag, build virtual data lakes with a global file index

      Move Smart: Ensure the right data is in the right place at the right time. Establish analytics-driven policies to manage data based on its need, usage, and value.

      Take Control: Get back to the business at hand while reducing your storage, backup, and cloud costs and get the fastest, easiest path to the cloud for your file and object data.

      • Ensure you have data mobility and avoid storage-vendor lock-in
      • Open, standards-based platform
      • Native cloud access

      Read the Komprise Architecture Overview white paper.



    • Application Programming Interface (API)

      What is an API?

      An Application Programming Interface (API) is a set of protocols, routines, and tools for building software applications. APIs define how software components should interact with each other, providing a standard way for developers to create programs that can access services or data provided by other software components or systems.

      APIs allow developers to access services or data without needing to understand how those services or data are implemented. Instead, they can use the API’s predefined set of functions and methods to interact with the service or data. This makes it easier and faster for developers to create new applications that can leverage existing services and data sources.

      APIs are often used to connect different software components or systems, such as web applications or mobile apps to backend servers or databases. They can also be used to integrate different software tools, enabling them to work together seamlessly.

      APIs can be public or private, depending on whether they are available for external developers to use or are restricted to use within a specific organization or system. Many public APIs are available from companies such as Google, Amazon, and Twitter, which provide access to their services and data for developers to build applications on top of.

      APIs are an essential tool for modern software development, enabling developers to build complex and powerful applications quickly and efficiently by leveraging existing services and data sources.
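
      The separation between an interface and its implementation can be shown with a small Python sketch. The ObjectStore interface, the InMemoryStore class and the copy_object function are all hypothetical names invented for this example; the point is that copy_object depends only on the interface and never on how the store works internally.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """The API: what callers may rely on, with no implementation details."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """One possible implementation; callers never need to know it is a dict."""

    def __init__(self) -> None:
        self._items = {}

    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data

    def get(self, key: str) -> bytes:
        return self._items[key]


def copy_object(store: ObjectStore, source: str, target: str) -> None:
    """Works with any store that offers put and get."""
    store.put(target, store.get(source))


store = InMemoryStore()
store.put("report.txt", b"quarterly numbers")
copy_object(store, "report.txt", "report-copy.txt")
print(store.get("report-copy.txt"))
```

      Swapping InMemoryStore for a class backed by a cloud service would not require any change to copy_object, which is the practical benefit an API provides.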

      API-Driven Data Management and Data Migration

      Komprise Smart Data Workflows can enrich data by allowing the execution of external functions or cloud services at the edge, in the datacenter or in the cloud, and then tagging data with metadata. Examples include Snowflake, Amazon Macie and Azure Machine Learning.

      Read the blog post

      Read the AWS blog: Using Amazon Macie with Komprise for Detecting Sensitive Content in On-Premises Data


      Komprise Elastic Data Migration is both UI and API driven. Here is an example of a hospital group that used the Komprise API to migrate petabytes of SMB files from EMC Isilon access zones to Qumulo. Komprise set up 400+ migration jobs via scripting using the APIs and migrated 278 million SMB files spanning nearly 1500 shares. Because of the number of shares and folders in the environment, it was unrealistic to set up migrations one at a time via the UI, which led Komprise to recommend the API approach.

      Read the blog post: 5 Industry Data Migration Use Case


    • Archival Storage

      What is Archival Storage?

      Archival Storage is a source for data that is not needed for an organization’s everyday operations, but may have to be accessed occasionally.

      By using archival storage, organizations can move data to secondary sources while still maintaining the protection of the data.

      Using archival storage reduces the primary storage costs an organization incurs and allows it to maintain data that may be required for regulatory or other requirements.

      Data archiving, also known as data tiering, is intended to protect older information that is not needed for everyday operations, but may have to be accessed occasionally. Archival and tiering storage is a tool for reducing your primary storage needs and the related costs, rather than a data recovery tool.

      Why Archival Storage?

      • Some data archives make data read-only to protect it from modification, while other data archiving products allow users to modify it.
      • The benefit of data archiving is that it reduces the cost of primary storage. Archive storage costs less because it is typically based on a low-performance, high-capacity storage medium.
      • Data archiving takes a number of different forms. One option is online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.
      • Another archival system uses offline data storage where archive data is written to tape or other removable media using data archiving software rather than being kept online. Data archiving on tape consumes less power than disk systems, translating to lower costs.
      • A third option is using cloud data storage, such as the services offered by Amazon and Microsoft Azure. This can be less expensive if done right, but requires ongoing investment. A Smart Data Migration strategy is essential.
      • The data archiving process typically uses automated software, which will automatically move “cold” data via policies set by an administrator. Today, a popular approach to data archiving is to make the archive “transparent”, so that the archived data is not only online but also remains accessible exactly as before to users and applications, which experience no change in behavior. The patented Komprise Transparent Move Technology is designed to allow you to transparently archive and tier data.
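
      A policy of the kind an administrator might set can be sketched in Python as a scan for files that have not been accessed within a given number of days. This is an illustration only, not a description of any product: the function name and the idle threshold are assumptions, and last-access times can be unreliable on file systems mounted with access-time updates reduced or disabled.

```python
import os
import time
from typing import Iterator, Optional


def find_cold_files(root: str, max_idle_days: float,
                    now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under root whose last access is older than max_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # skip files that vanish or cannot be read
            if last_access < cutoff:
                yield path
```

      Calling list(find_cold_files("/data/projects", 365)) on a hypothetical share would return the candidates for archiving; the actual move and any transparent access layer are outside the scope of this sketch.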


    • Archiving

      Archiving, in the context of technology and unstructured data management (also see Data Archiving), is the process of storing and preserving data in a systematic and organized manner for long-term retention. It involves moving data from active or primary storage locations to secondary storage systems or media, with the goal of freeing up primary storage space while ensuring data is securely preserved for future reference. Additionally, rising data storage costs, unstructured data growth, data sprawl, data center consolidation, cloud migration and new approaches to data tiering are all drivers of modern data archiving strategies.

      When data is archived, it is typically less frequently accessed or modified compared to active data. Archiving allows organizations to manage data growth, improve system performance, and maintain compliance with data retention policies and legal requirements.

      Key points to understand about archiving

      The primary purpose of archiving is to retain data that is no longer actively used but may still hold value for reference, regulatory compliance, legal reasons, or historical purposes. Archiving helps organizations maintain data integrity and accessibility while optimizing primary storage performance and resources.

      • Data Selection: The process of archiving involves identifying and selecting data to be moved from primary storage to secondary storage. Organizations define criteria for data selection, such as age, usage patterns, relevance, or specific retention policies, to determine which data should be archived.
      • Storage Systems or Media: Archived data is typically stored on secondary storage systems or media that provide cost-effective and scalable storage options. These may include network-attached storage (NAS), tape libraries, cloud storage, or dedicated archival storage solutions. The choice of storage medium depends on factors like data volume, access requirements, retention policies, and budget considerations.
      • Indexing and Metadata: Effective archiving involves organizing and indexing the archived data to enable efficient retrieval. Indexing involves creating a catalog or database that records relevant metadata about the archived items, such as file names, dates, file types, and other attributes. This helps in locating and retrieving specific data when needed. See Global File Index.
      • Data Security and Integrity: Data security and integrity are crucial aspects of archiving. Archived data should be protected from unauthorized access, loss, or corruption. Encryption, access controls, regular backups, and data integrity checks are implemented to ensure the security and reliability of archived data.
      • Retrieval and Access: Although archived data is stored in secondary storage, it should still be easily accessible when required. Organizations establish data retrieval mechanisms, search capabilities, and access controls to locate and retrieve specific archived data efficiently. This may involve using search indexes, metadata filters, or specialized archival software.
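
      The data selection and indexing points above can be sketched in a few lines of Python. This is a minimal illustration, assuming that "age" means time since last modification; the names `ArchiveCandidate` and `select_for_archive` are hypothetical and not part of any product.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class ArchiveCandidate:
    """One catalog entry: the metadata recorded for a file selected for archiving."""
    path: str
    size_bytes: int
    last_modified: float  # seconds since the epoch


def select_for_archive(root, min_age_days, now=None):
    """Return files under `root` not modified in the last `min_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime < cutoff:
                candidates.append(ArchiveCandidate(path, info.st_size, info.st_mtime))
    # Oldest first, so the coldest data is listed first.
    return sorted(candidates, key=lambda c: c.last_modified)
```

      A real policy would usually combine several criteria (last access time, owner, file type, retention rules) rather than a single age cutoff.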

      Archiving practices vary with an organization’s specific requirements and industry regulations. Organizations often develop archiving policies and procedures to govern the storage, retention, retrieval, and disposal of archived data, supporting compliance, data governance, and efficient data management, particularly unstructured data management.

    • Artificial Intelligence (AI)

      Artificial Intelligence (AI) is the simulation of human intelligence in machines that are programmed to perform tasks that would typically require human intelligence such as visual perception, speech recognition, decision-making, and language translation. AI involves the development of computer systems capable of performing these tasks.

      AI subfields

      AI subfields employ different techniques and algorithms to enable machines to learn from data, recognize patterns, make predictions, and solve complex problems. Examples include:

      • Machine learning: A prominent branch of AI that focuses on enabling machines to learn from and adapt to data without explicit programming. It involves the development of algorithms that allow computers to analyze and interpret large volumes of data, identify patterns, and make informed decisions or predictions.
      • Natural language processing (NLP): Deals with enabling machines to understand, interpret, and generate human language. NLP plays a crucial role in applications such as speech recognition, language translation, chatbots, and text analysis.
      • Computer vision: Involves enabling machines to interpret and understand visual information from images or videos. It lets systems perceive and analyze visual data for tasks such as object recognition, image classification, and autonomous driving.
      • Robotics, expert systems and more.

      AI has a wide range of applications across various industries, including finance, healthcare, transportation, manufacturing and entertainment. It has the potential to revolutionize industries, improve efficiency, automate processes, and solve complex problems.

      AI is still an evolving field, and while it has made significant advancements, it is not yet capable of replicating the full spectrum of human intelligence. Researchers and developers continue to explore and push the boundaries of AI, striving to create more advanced and sophisticated systems. There is an ongoing discussion about the important role of regulation and governance, especially as they relate to generative AI. The leaders of OpenAI have proposed an international regulatory body.

      AI needs unstructured data

      At the end of 2022, Komprise CEO Kumar Goswami wrote about the importance of unstructured data and unstructured data management to AI and machine learning. He wrote:

      Enterprises need to be ready for this wave of change and it starts by getting unstructured data prepped, as this data is the critical ingredient for AI/ML. This entails new data management strategies which create automated ways to index, segment, curate, tag and move unstructured data continuously to feed AI and ML tools. Unforeseen changes to society, fueled by AI, are coming soon and you don’t want to be caught flat-footed.

      In 2023 he wrote an article entitled: The AI/ML Revolution: Data Management Must Evolve.

    • AWS DataSync

      AWS DataSync is an online service that moves data between on-premises storage and AWS Storage services. According to AWS, DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.

      Point tools vs. platform

      Cloud migration of file data can be complex, labor-intensive, costly and time-consuming. Understanding your migration options is essential. Generally they are as follows:

      • Free Tools: Good for tactical use cases, but often require a lot of hand-holding. Data migration reliability and performance are concerns.
      • Point Data Migration Solutions: Usually older vendors who have a professional-services-centric approach. Watch out for technologies that are difficult to set up and use, with legacy architectures that present user disruption and scalability challenges.
      • Komprise Elastic Data Migration: Makes cloud data migrations simple, fast and reliable, and eliminates sunk costs since you continue to use Komprise after the migration. Komprise is the only solution that gives you the option to cut cloud storage costs by 70% or more by placing cold data in object storage classes while maintaining file metadata so it can be promoted in the cloud as files when needed.


      Learn more about Komprise for AWS.


    • AWS Lambda

      What is AWS Lambda?

      AWS Lambda is a serverless, event-driven compute service provided by Amazon Web Services (AWS) that allows developers to run code without managing servers or infrastructure. AWS Lambda provides a scalable, flexible, and cost-effective way to run code in response to events, such as changes to data in an Amazon S3 bucket or an update to a DynamoDB table.

      With AWS Lambda, developers can write code in a variety of programming languages, including Python, Java, C#, and Node.js. They can then upload this code to AWS Lambda, where it is executed in response to events triggered by other AWS services, such as Amazon S3, DynamoDB, or API Gateway.
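
      As a rough illustration of the event-driven model described above, here is a minimal sketch of a Python handler that reacts to an Amazon S3 event notification. The handler name `lambda_handler` follows the common Python convention; the return value is an arbitrary choice for this example.

```python
import urllib.parse


def lambda_handler(event, context):
    """Collect the bucket and key of each object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}
```

      Because the handler is a plain function that takes a dictionary, it can be exercised locally with a hand-built event before it is deployed.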

      AWS Lambda automatically scales the number of instances needed to handle incoming requests, and developers only pay for the compute time they consume, which makes it a cost-effective option for many use cases. AWS Lambda also provides built-in monitoring and logging capabilities, making it easy for developers to monitor the performance and behavior of their functions.

      One of the key benefits of AWS Lambda is its serverless architecture, which eliminates the need for developers to manage infrastructure, allowing them to focus on writing code and building applications. This makes it easier and faster to develop and deploy new applications, as well as reducing the cost and complexity of managing infrastructure.

      AWS Lambda is a powerful and flexible tool for building serverless applications and integrating them with other AWS services. Its scalability, cost-effectiveness, and ease of use make it a popular choice for developers looking to build modern, cloud-native applications.

      AWS Lambda Data Management

      AWS Lambda provides several options for managing data within your functions, including:

      • Environment Variables: Environment variables can be used to store configuration data, such as API keys or database connection strings, that are used by your function. These variables can be set and managed within the AWS Lambda console or via the AWS CLI.
      • Local File Storage: AWS Lambda provides a temporary storage area (the /tmp directory) for your function to read and write files during execution. This storage is not guaranteed to persist once the function completes, so it should not be used for permanent data storage.
      • AWS Services: AWS Lambda can interact with various AWS services, including Amazon S3, Amazon DynamoDB, and Amazon RDS. These services provide persistent storage options for your data that can be accessed by multiple functions.
      • External Services: AWS Lambda functions can also interact with external services, such as third-party APIs or databases. These services can be accessed via the internet or through a virtual private network (VPN) connection.
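
      The first two options above can be sketched together. This is an illustrative example only: `TABLE_NAME` is a hypothetical environment variable, and `tempfile.gettempdir()` is used so the sketch also runs outside Lambda.

```python
import json
import os
import tempfile


def lambda_handler(event, context):
    # TABLE_NAME is a hypothetical setting supplied as an environment variable.
    table_name = os.environ.get("TABLE_NAME", "example-table")

    # Scratch space: the system temp directory keeps the sketch portable.
    scratch_path = os.path.join(tempfile.gettempdir(), "scratch.json")
    with open(scratch_path, "w") as f:
        json.dump(event, f)

    with open(scratch_path) as f:
        payload = json.load(f)
    os.remove(scratch_path)

    # Anything that must outlive the invocation belongs in a persistent service.
    return {"table": table_name, "keys": sorted(payload)}
```

      Results that need to survive the invocation would be written to a persistent service such as Amazon S3 or DynamoDB instead of the scratch file.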

      Learn more about AWS Lambda.

      Learn more about Komprise and AWS unstructured data migration and data management.


    • AWS Snowball

      What is AWS Snowball Edge?

      AWS Snowball Edge is a hardware appliance used to migrate petabyte-scale data into and out of Amazon S3, mitigating issues with large-scale data transfers including high network costs, limited connectivity such as in remote locations, long transfer times, and security concerns. Beyond data transfer and cloud data migration use cases, the Snowball Edge device features on-board storage and compute power to enable local processing and analytics at the edge. Once transferred into AWS S3, an organization can move the data into other storage classes as needed.

      Snowball appliances are shipped to the customer and deployed on the customer’s network. Data is copied to the Snowball appliance, which is then shipped back to AWS, where the data is copied to the appropriate AWS storage tier and made available for access.

      According to Hackernoon, Snowball Edge has been used in oil rigs, with the U.S. Department of Defense, and in an emergency situation for the U.S. Geological Survey needing to quickly export data from its data center during a volcanic eruption.

      Considerations for AWS Snowball Edge

      Enterprises have two options for AWS Snowball:

      • AWS Snowball Edge Storage Optimized devices provide both block storage and Amazon S3-compatible object storage, and 40 vCPUs. They are well suited for local storage and large-scale data transfer. It’s possible to combine up to 12 devices together and create a single S3-compatible bucket that can store nearly 1 petabyte of data.
      • Snowball Edge Compute Optimized devices provide 52 vCPUs, block and object storage, and an optional GPU for use cases including machine learning and full motion video analysis.
      • Snowball supports specific Amazon EC2 instance types and AWS Lambda functions, so you can develop and test in the AWS Cloud, then deploy applications on devices in remote locations to collect, pre-process, and ship the data to AWS.
      • Snowball can transport multiple terabytes of data and multiple devices can be used in parallel or clustered together to transfer petabytes of data into or out of AWS.

      Cloud Tiering to AWS

      By using Komprise for cloud tiering to AWS, you can save not only on your on-premises storage but also on your cloud costs. Users get transparent access to the files moved by Komprise from the original location, and with Komprise moving data in native format, you can give users direct, cloud-native access to data in AWS while eliminating egress fees and rehydration hassles.

      Learn more about the benefits of moving data in cloud native format.

      Smart Data Migration for AWS

      A smart data migration strategy for enterprise file data means an analytics-first approach ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. This paper introduces the benefits of a smart data migration strategy for file workloads to AWS cloud storage services. Komprise and AWS enable your organization to:

      • Understand your NAS & object data usage and growth.
      • Estimate the ROI of AWS storage in your environment.
      • Migrate smarter to Amazon FSx for NetApp ONTAP.
      • Access moved data as files without stubs or agents.
      • Reduce complexity and scale on-demand.
      • Deliver native data access in the cloud without lock-in.
      Read the white paper: Smart Unstructured Data Migration for AWS
      Learn more about your Cloud Tiering choices.
      Learn more about Komprise for AWS.

    • AWS Storage

      What is AWS Cloud Storage?

      The AWS cloud service has a full range of options for individuals and enterprises to store, access and analyze data. AWS offers options across all three types of cloud data storage: object storage, file storage and block storage.

      Here are the AWS Storage choices:

      • Amazon Simple Storage Service (S3): S3 is a popular AWS service that provides scalable and highly durable object storage in the cloud.
      • AWS Glacier: Glacier provides low-cost highly durable archive storage in the cloud. It’s best for cold data as access times can be slow.
      • Amazon Elastic File System (Amazon EFS): EFS provides scalable network file storage for Amazon EC2 instances.
      • Amazon Elastic Block Store (Amazon EBS): This service provides low-latency block storage volumes for Amazon EC2 instances.
      • Amazon EC2 Instance Storage: An instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches and scratch data, and consists of one or more instance store volumes exposed as block devices.
      • AWS Storage Gateway: This is a hybrid storage option that integrates on-premises storage with cloud storage. It can be hosted on a physical or virtual server.
      • AWS Snowball: This data migration service transports large amounts of data to and from the cloud and includes an appliance that’s installed in the on-premises data center.


      Each of these Amazon storage classes has several tiers at different price points – so it is important to put the right data in the right storage class at the right time to optimize price and performance.

      Komprise Intelligent Data Management for AWS Storage

      Komprise helps organizations get more value from their AWS storage investments while protecting data assets for future use through analysis and intelligent data migration and cloud data tiering.


      Learn more at Komprise for AWS.

    • Azure Data Box

      What is Azure Data Box?

      Microsoft Azure Data Box is a hardware appliance designed to let customers import or export large amounts of data (more than 40TB) into and out of Azure offline. It is especially helpful when there is zero or limited network connectivity. Microsoft ships customers a proprietary Data Box storage device with a rugged casing to protect and secure data during transit. A customer may choose Data Box for a one-time or occasional cloud migration, or for an initial bulk data transfer followed by periodic transfers.

      Microsoft also promotes the Data Box as a solution for exporting data from Azure back on-premises for disaster recovery or other needs or to move to another cloud service provider.


      There are three different types of physical Data Box solutions based on data size:

      • Data Box: This device has 100TB capacity and uses standard NAS protocols and common copy tools. It features AES 256-bit encryption for safer transit.
      • Data Box Heavy: This larger device is designed to lift 1PB of data to the cloud.
      • Data Box Disk: Each disk is an 8TB SSD with a USB/SATA interface featuring 128-bit encryption. Customers can buy in packs of up to five for a total of 40TB.
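
      As a quick illustration of sizing against the three options above, the sketch below picks the smallest single order that fits a given data set. The capacities are the figures from the list; the function name and the rule of choosing one order only are assumptions made for this example.

```python
# Capacities in terabytes, taken from the list above.
DEVICE_CAPACITY_TB = [
    ("Data Box Disk (pack of five)", 40),
    ("Data Box", 100),
    ("Data Box Heavy", 1000),
]


def smallest_device_for(data_tb):
    """Return the smallest single order that holds `data_tb` terabytes, or None."""
    for name, capacity_tb in DEVICE_CAPACITY_TB:
        if data_tb <= capacity_tb:
            return name
    return None
```

      A result of `None` means the data set exceeds any single order, so the transfer would need to be split across several devices.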

      Considerations for Cloud Migrations Using Azure Data Box 

      Azure Data Box is a good solution to consider if online data transfer is not possible, either because the network bandwidth is limited or because it would take too long. But offline transfers can be very tedious and error-prone if done manually. Choosing what data to migrate, moving the data into Azure Data Box, and then ensuring the data lands in the cloud can be time-consuming to manage. Managing access control and security of file data, and ensuring transfer of all metadata and permissions of files, can be very tedious. Often, enterprises want to move some file data to the cloud and keep the rest on-premises. In such situations, using Azure Data Box manually without any automation becomes even trickier because it can disrupt users and applications.

      Azure Data Box Gateway for Inline Data Transfers

      Azure also offers a virtual appliance called Azure Data Box Gateway that resides on-premises and enables customers to write data to it using the NFS and SMB protocols. The device then transfers the data to Azure block blobs, page blobs, or Azure Files. But Azure Data Box Gateway has several limitations and can be used only for very small amounts of data in limited circumstances. See full set of limitations here.

      Komprise allows you to migrate large amounts of data reliably and effortlessly to Azure using its patented Elastic Data Migration, which is 27 times faster than alternatives. You can also use Komprise to transparently tier data to Azure. Tiering cold data is a great way to offload 80% of your data to the cloud without any disruption to users and applications. 

      By using Komprise for cloud tiering to Azure, you can save not only on your on-premises storage but also on your cloud costs since you do not have to tier to Azure Files, you can tier directly to Azure Blob. Users get transparent access to the files moved by Komprise from the original location, and with Komprise moving data in native format, you can give users direct, cloud-native access to data in Azure while eliminating egress fees and rehydration hassles. 

      Learn more about your Cloud Tiering choices 

      Learn more about Komprise for Microsoft Azure.

    • Azure NetApp Files

      What is Azure NetApp Files?

      Azure NetApp Files is a cloud-based file storage service offered by Microsoft Azure that enables enterprise-grade file shares to be created and managed in the cloud. The service is built on NetApp’s technology and is designed to meet the high-performance, availability, and scalability requirements of enterprise file data workloads.

      Azure NetApp Files provides a fully managed service that allows customers to deploy and manage high-performance file shares in Azure. It offers features such as NFS and SMB protocol support, file share snapshots, and data replication across Azure regions. Customers can also choose from different performance tiers and capacity sizes to optimize the cost and performance of their file shares.

      Azure NetApp Files is commonly used for use cases such as database file shares, big data analytics, media and entertainment workloads, and high-performance computing. It provides a scalable, high-performance, and highly available solution for enterprise customers who need to store and manage large amounts of file data in the cloud.

      Azure NetApp Files Data Management

      Komprise first announced support for Azure NetApp Files in 2020:

      By using Komprise Intelligent Data Management, customers can migrate file workloads to the cloud more than 27 times faster than with other solutions. They can also reduce cloud NAS by 70 percent by transparently archiving cold data from Azure NetApp Files to various Azure Blob storage classes. Komprise’s Transparent Move Technology™ (TMT) enables archived data to be viewed as files, native objects, or both. These new capabilities now allow Komprise to deliver the same on-premises NAS data management features to cloud-enabled NAS.

      Read the white paper: Accelerate Cloud and NAS Migrations to NetApp CVO and Azure NetApp Files (ANF)

      Learn more about Komprise for Azure.

      Learn more about Komprise for NetApp.


    • Azure Storage

      What is Azure Storage?

      Microsoft Azure hosts a complete array of cloud data storage options to meet the diverse data needs of enterprises today, including backup, tiering, data lakes, structured and unstructured data management. Azure Storage Services include:

      • Azure Blobs: This is a scalable object store best suited for storing and accessing unstructured data and to support analytics and data lake projects.
      • Azure Files: File shares for cloud or on-premises deployments that you can access through the Server Message Block (SMB) protocol.
      • Azure Queues: Allows for asynchronous messaging between application components.
      • Azure Tables: A NoSQL solution for schema-less storage of structured data.
      • Azure Disks: Allows data to be persistently stored in blocks and accessed from an attached virtual hard disk.
      • Azure Data Lake Storage: A storage platform for ingestion, processing, and visualization that supports common analytics frameworks and provides automatic geo-replication.

      Greater Azure Storage Savings and Value with Komprise

      Komprise helps organizations get the most value from their Azure storage investments while protecting data assets for future use through analysis and intelligent data migration and cloud data tiering.


      Learn more at Komprise for Azure file and object data management and migration.


    • Azure Tiering

      What is Azure Tiering?

      Azure Storage offers several classes of cloud data storage for customers. However, to maximize savings and ROI from the cloud, IT directors need to consider tiering strategies. Cloud tiering moves less frequently used data, also known as cold data, from expensive on-premises file storage, Network Attached Storage (NAS), or cloud file storage such as Azure Files to cheaper levels of storage in the cloud, typically object storage classes such as Azure Blob storage.

      Cloud tiering enables data to move across different storage tiers – and different cloud tiering solutions support different storage options. We will cover both the storage tiers in the Azure cloud and the options available to do cloud tiering for Azure.

      Azure Files and Azure Blob have different tiers of storage at different price points:

      Azure Files is Microsoft’s file storage solution for the cloud. As with all file storage solutions, it is more expensive than object storage solutions such as Azure Blob, especially when you add the required replication and data protection costs for files. Azure File Storage Hot tier is more than 1.9 times more expensive than Azure Blob Cool. 

      Azure Files supports two storage tiers: Standard and Premium.

      • Standard file shares are created in general purpose (GPv1 or GPv2) storage accounts; 
      • Premium file shares are created in FileStorage storage accounts.

      What is Azure Blob?

      Azure Blob is Microsoft’s object storage solution for the cloud.

      Azure Blob storage is optimized for storing massive amounts of unstructured data. It’s enabled for the following access tiers:

      • Hot: storing data that is accessed frequently.
      • Cool: storing data that is infrequently accessed and stored for at least 30 days.
      • Archive: storing data that is rarely accessed and stored for at least 180 days with flexible latency requirements.

      According to Microsoft:

      “You can upload data to your required access tier and change the blob access tier among the hot, cool, or archive tiers as usage patterns change, without having to move data between accounts. All tier change requests happen immediately and tier changes between hot and cool are instantaneous.”
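
      To make the three access tiers concrete, here is a small sketch that suggests a tier for a blob. The 30-day and 180-day figures come from the list above; treating them as access-age cutoffs is an assumption made for illustration, not Microsoft guidance, and `suggest_access_tier` is a hypothetical helper.

```python
def suggest_access_tier(days_since_last_access, expected_retention_days):
    """Suggest hot, cool, or archive from access recency and planned retention."""
    # Thresholds mirror the minimum storage durations listed above; using them
    # as access-age cutoffs is an illustrative assumption.
    if days_since_last_access >= 180 and expected_retention_days >= 180:
        return "archive"
    if days_since_last_access >= 30 and expected_retention_days >= 30:
        return "cool"
    return "hot"
```

      The retention check matters because data that will be deleted before the minimum storage period ends is usually a poor fit for the cooler tiers.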

      What is Azure File Sync?

      Azure Files has a service called Azure File Sync which enables an on-premises Windows Server to do cloud tiering to file storage in the cloud, not object storage. 

      Azure File Sync acts as a gateway that caches data locally and puts cold files in Azure Files cloud storage. When enabled, Azure File Sync stores hot files on the local Windows server while cool or cold files are split into namespace (file and folder structure) and file content. The namespace is stored locally, and the file content is stored in an Azure file share in the cloud. Azure will automatically tier cold data based on volume or age thresholds. See Microsoft Cloud Tiering overview.
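
      Tiering by volume or age thresholds can be modeled roughly as follows. This is a simplified sketch of threshold-based tiering in general, not the Azure File Sync implementation; the function name, the tuple layout, and the parameters are all assumptions.

```python
def files_to_tier(files, volume_bytes, free_target_pct, max_age_days):
    """Pick files to tier: anything older than the age limit, then oldest-first
    until the free-space target is met.

    `files` is a list of (name, size_bytes, days_since_last_access) tuples.
    """
    used = sum(size for _name, size, _age in files)
    target_used = volume_bytes * (100 - free_target_pct) / 100
    tiered = []
    # Coldest files are considered first.
    for name, size, age in sorted(files, key=lambda f: f[2], reverse=True):
        if age > max_age_days or used > target_used:
            tiered.append(name)
            used -= size
    return tiered
```

      In this model a file is tiered either because it is older than the age limit or because the volume is still above its usage target, whichever applies first.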

      Considerations for Microsoft Azure Cloud Tiering

      Cloud tiering can save organizations up to 70% on on-premises storage costs when done correctly. But there are several limitations of Azure Cloud Tiering that you need to consider:

      Azure File Sync only tiers to Azure Files and leads to higher cloud costs.

      Azure Files is a file service in Azure and it is almost double the cost of the Azure Blob Cool tier. Since file storage is not resilient, data on Azure Files most commonly needs replication, snapshots and backups – leading to higher data management costs. An ideal cloud tiering solution should tier files from your NAS to an object storage environment to maximize savings. Otherwise, you are paying for higher costs in the cloud.  

      Azure File Sync only tiers blocks of data to the cloud and leads to 75% higher cloud egress costs.

      This means you cannot directly access your files in Azure; you have to go through the on-premises Windows Server to get your data. This leads to 75% higher cloud egress costs, and it limits the use of your data in the cloud. To learn more about the differences between block tiering and file tiering, read our block-level tiering vs file-level tiering white paper. For an analysis of the cloud egress costs of solutions like Azure File Sync Cloud Tiering, read the Cloud Tiering white paper.

      Azure File Sync is only available on Windows Server environments.

      Most organizations today have multiple file server and NAS environments. Using a different tiering strategy for each environment is tedious, error prone, and difficult to manage. Consider an unstructured data management solution that works across your multiple storage vendor environments and transparently tiers and archives data.

      Komprise enables enterprise IT organizations to quickly analyze data and make smart decisions on where data should live based on age, usage and other requirements. Komprise works across your multi-vendor NAS and object environments and clouds via standard protocols such as NFS, SMB and object. By using Komprise for cloud tiering to Azure, you can save not only on your on-premises storage but also on your cloud costs since you do not have to tier to Azure Files, you can tier directly to Azure Blob. Users get transparent access to the files moved by Komprise from the original location, and with Komprise moving data in native format, you can give users direct, cloud-native access to data in Azure while eliminating egress costs and data rehydration hassles. 

      Learn more about your Cloud Tiering choices 

      Learn more about Komprise for Microsoft Azure

      Komprise Smart Data Migration for Azure. Smarter. Faster. Proven.

  • B
    • Backup

      Backup (also see Data Backup) is the process of creating copies of data to protect against loss or damage. It involves making duplicate copies of important files, databases, applications, or entire systems, which can be used to restore the data in the event of a disaster, hardware failure, human error, or other unforeseen circumstances.


      Key points about backups:

      • Data Protection: The primary purpose of backups is to safeguard data and ensure its availability even in the face of data loss incidents. Backups serve as a safety net, allowing organizations and individuals to recover lost or corrupted data and resume normal operations.
      • Backup Frequency: The frequency of backups depends on various factors, such as the criticality of the data, the rate of data change, and the desired recovery point objective (RPO). The RPO defines the maximum acceptable amount of data loss in the event of a failure. Organizations may choose to perform backups daily, weekly, or at more frequent intervals based on their needs.
      • Full and Incremental Backups: Different backup strategies can be employed, such as full and incremental backups. A full backup involves copying all data from the source to the backup storage. Incremental backups only copy the changes made since the last backup, resulting in smaller backup sizes and faster backups. A combination of full and incremental backups can provide a balance between data protection and storage efficiency.
      • Backup Storage: Backups are stored on separate storage devices or media from the original data. This ensures that if the primary storage fails or becomes inaccessible, the backups remain unaffected. Common backup storage options include external hard drives, network-attached storage (NAS), tape drives, cloud storage, or off-site backup facilities.
      • Data Recovery: When data loss occurs, backups are used to restore the lost or corrupted data. The recovery process involves retrieving the backup data and copying it back to the original or alternative locations. Depending on the backup strategy employed, recovery may involve restoring the latest full backup followed by incremental backups or directly restoring the most recent backup.
      • Testing and Verification: It is important to regularly test backups and verify their integrity to ensure they are usable when needed. Regular restore tests help identify any issues or discrepancies in the backup data or the recovery process. Verification involves performing integrity checks on the backup files to ensure they are not corrupted or damaged.
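
      The incremental backup and verification points above can be sketched with the Python standard library. This is a minimal illustration, assuming that "changed" means a modification time later than the previous backup; `incremental_backup` and `sha256_of` are hypothetical helpers, and passing `0` as the last backup time copies everything, which behaves like a full backup.

```python
import hashlib
import os
import shutil


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source_dir, backup_dir, last_backup_time):
    """Copy files modified after `last_backup_time` and verify each copy."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_dir):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.stat(src).st_mtime <= last_backup_time:
                continue  # unchanged since the last backup
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(backup_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            # Integrity check: the copy must hash to the same value as the source.
            if sha256_of(src) != sha256_of(dst):
                raise IOError(f"verification failed for {rel}")
            copied.append(rel)
    return sorted(copied)
```

      Comparing checksums at copy time catches corruption early, but it does not replace periodic restore tests of the backup set as a whole.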

      Backup practices will vary depending on the scale of data, business requirements, and compliance regulations. Be sure to follow best practices, including having multiple copies of backups, storing backups off-site or in the cloud for disaster recovery, and regularly reviewing and updating backup strategies to align with changing data needs and technologies.


      Many backup vendors talk about data management for the data they are backing up. Komprise is a data-agnostic unstructured data management solution. Komprise partners with backup vendors and allows customers to know first, move smart and take control of file and object data with an analytics-driven Intelligent Data Management platform as a service.

    • Block Storage

      What is Block Storage?

      Block storage is a type of data storage technology that stores data in blocks, each with a unique address. Each block can be accessed independently and typically has a fixed size, commonly from 512 bytes to a few megabytes depending on the specific storage system.

      Block storage is commonly used in enterprise IT environments for storing data that requires high performance and low latency, such as databases and virtual machine disk images. It provides direct access to storage volumes at the block level, allowing applications to read and write data with high throughput and low latency.
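
      Block-level access can be illustrated with a short sketch in which an in-memory buffer stands in for a block device. The 4,096-byte block size and the function names are assumptions made for this example.

```python
import io

BLOCK_SIZE = 4096  # bytes; an assumed block size for this sketch


def read_block(device, block_number):
    """Read one fixed-size block by seeking to its address."""
    device.seek(block_number * BLOCK_SIZE)
    return device.read(BLOCK_SIZE)


def write_block(device, block_number, data):
    """Overwrite one block in place without touching its neighbours."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block long")
    device.seek(block_number * BLOCK_SIZE)
    device.write(data)


# A 4-block in-memory "volume" standing in for a real block device.
volume = io.BytesIO(bytes(4 * BLOCK_SIZE))
write_block(volume, 2, b"\x01" * BLOCK_SIZE)
```

      Each block is reached by computing its offset from its number, which is why any block can be read or rewritten independently of the others.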

      One of the key advantages of block storage is its flexibility. It can be used with a variety of operating systems and applications, and it allows storage volumes to be resized and partitioned as needed. This makes it a popular choice for cloud-based storage solutions, where customers can purchase and provision storage volumes on-demand, and only pay for the storage they actually use.

      Examples of block storage solutions include: Amazon Elastic Block Store (EBS), Google Cloud Persistent Disk, and Microsoft Azure Managed Disks.

      Block vs. File Storage

      File storage is a storage system where data is organized into files and directories. File storage systems typically use protocols such as NFS and SMB to access and manage files. File storage is commonly used for storing and sharing files such as documents, images, videos, and audio files.

      The key difference between block storage and file storage is that block storage provides direct access to storage volumes at the block level, while file storage provides access to files and directories. File storage is well suited for applications that require shared access to files and directories, such as file servers, web servers, and content management systems.

      Block storage and file storage are both important storage technologies, but they are designed for different use cases. Block storage is optimized for high performance and low latency, while file storage is optimized for shared access to files and directories.

      Block-level Tiering vs File-Level Tiering

      Block-based tiering is typically used by storage vendors: storage tiering solutions (also called pools) move data at the block level. Only the operating system of the NAS knows exactly which blocks were moved, so you can only access the file through the original source. If you decide to end-of-life the device, you must rehydrate all of the archived data. Given that there will likely not be enough space on the device, this can be a painful, slow, iterative process.

      Secondary storage vendors are also starting to tier data to their devices, which moves the storage concerns from Tier 1 to Tier 2 storage. You are now tied to that secondary storage vendor and lose the same flexibility on secondary storage based on need, costs and the direction of your company’s infrastructure initiatives as you had on Tier 1 storage. Ultimately, unstructured data management is not something that should be left to storage devices. You should be able to freely move from one storage device to another.

      Komprise is an unstructured data management software solution that tiers and archives data at the file-level and fully preserves file fidelity and standards-based access to your data at each tier. Data-storage agnostic, Komprise enables you to freely move data across different vendor storage and clouds without lock-in to either the storage or to Komprise. The solution is analytics driven, so you can choose what you move, when, and how.

      Read the white paper: Block-Level vs. File-Level Tiering.


    • Block-level Tiering

      Block-level tiering moves blocks between storage tiers to increase performance: hot blocks and metadata are kept in the higher, faster, and more expensive data storage tiers, and cold data blocks are migrated to lower, less expensive ones. Lacking full file context, these moved blocks cannot be directly accessed from their new location. Komprise uses the more advanced file-level tiering. Read the white paper “Block-Level Tiering vs. File-Level Tiering.”


    • Bucket Sprawl

      Bucket sprawl refers to the problem of having a large number of data storage buckets (also known as object storage buckets), often in cloud data storage environments, that are created and then left unused or forgotten over time. This can happen when individuals or teams create buckets for specific projects or tasks, but fail to properly manage and delete them once they are no longer needed.

      What is a Cloud Bucket?

      A cloud bucket is a container for storing data objects in cloud storage services such as Amazon S3, Google Cloud Storage, or Microsoft Azure Storage. Cloud buckets can hold a variety of data types including images, videos, documents, and other files.

      Cloud buckets are typically accessed and managed through an API or web-based interface provided by the cloud storage provider. They offer a scalable and cost-effective way to store and retrieve large amounts of data, and can be used for a variety of applications including backup and disaster recovery, content delivery, and web hosting.

      Cloud buckets provide a number of benefits over traditional on-premises data storage solutions, including ease of use, cost-effectiveness, scalability, and availability. However, it is important to properly manage and secure cloud buckets to ensure that sensitive data is protected and costs are kept under control.

      The Problem with Cloud Bucket Sprawl

      Cloud bucket sprawl can lead to a number of issues, including increased data storage costs, decreased efficiency in accessing necessary data, and potential security risks if sensitive information is stored in forgotten or unsecured buckets. To avoid bucket sprawl, it is important to have a system in place for regularly reviewing and managing storage buckets, including identifying and deleting those that are no longer necessary.
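
      As a rough illustration of such a review, the sketch below flags buckets that have not been accessed within a chosen window. The inventory records, the 180-day threshold and the function name find_stale_buckets are all hypothetical; a real review would draw on the access reports of the cloud provider or a data management tool:

```python
from datetime import datetime, timedelta


def find_stale_buckets(buckets, now, max_idle_days=180):
    """Return the names of buckets not accessed within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(b["name"] for b in buckets if b["last_accessed"] < cutoff)


# Hypothetical inventory, for illustration only.
inventory = [
    {"name": "ml-experiments-2021", "last_accessed": datetime(2022, 1, 10)},
    {"name": "web-assets", "last_accessed": datetime(2024, 5, 30)},
    {"name": "tmp-migration", "last_accessed": datetime(2023, 3, 2)},
]
stale = find_stale_buckets(inventory, now=datetime(2024, 6, 1))
print(stale)  # ['ml-experiments-2021', 'tmp-migration']
```

      A stale bucket is a candidate for review, not automatic deletion: the owner still needs to confirm whether the data should be archived, moved to a colder storage class or removed.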

      Cloud Data Management for Bucket Sprawl

      In the blog post: Making Smarter Moves in a Multicloud World, Komprise CEO and cofounder Kumar Goswami introduced Komprise cloud data management capabilities this way:

      It gives customers a better way to manage their cloud data as it grows, (combat “bucket sprawl”), gives visibility into their cloud costs, and provides a simple way to manage data both on premises and in the cloud. Komprise now provides enterprises with actionable analytics to not only understand their cloud data costs but also optimize them with data lifecycle management.

      Learn more about Komprise cloud data management.

      Infographic: How to Maximize Cloud Cost Savings


  • C
    • Capacity Planning

      Capacity planning is the estimation of the space, hardware, software, and connection infrastructure resources that will be needed over a period of time. In the enterprise environment, there is a common concern over whether there will be enough resources in place to handle an increasing number of users or interactions. The purpose of capacity planning is to have enough resources available to meet the anticipated need, at the right time, without accumulating unused resources. The goal is to match resource availability to the forecasted need in the most cost-efficient manner, for maximum data storage cost savings.
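
      One simple way to put a number on "enough resources at the right time" is to project when current storage will fill up. The sketch below assumes compound growth at a constant monthly rate, which is a simplification; the function name and the figures are illustrative only:

```python
import math


def months_until_full(used_tb, capacity_tb, monthly_growth_rate):
    """Estimate the months until used capacity reaches total capacity,
    assuming compound growth at a constant monthly rate."""
    if used_tb <= 0 or monthly_growth_rate <= 0:
        raise ValueError("used_tb and monthly_growth_rate must be positive")
    if used_tb >= capacity_tb:
        return 0.0
    # Solve used_tb * (1 + rate) ** n = capacity_tb for n.
    return math.log(capacity_tb / used_tb) / math.log(1 + monthly_growth_rate)


# Hypothetical figures: 400 TB used of 800 TB, growing 3% per month.
remaining = months_until_full(400, 800, 0.03)
print(f"Capacity reached in about {remaining:.1f} months")
```

      With these assumed figures the storage doubles, and therefore fills, in a little under two years. Real growth is rarely this smooth, so a forecast like this is a starting point for planning rather than a purchase schedule.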

      True data capacity planning means being able to look into the future and estimate future IT needs and efficiently plan where data is stored and how it is managed based on the SLA of the data. Not only must you meet the future business needs of fast-growing unstructured data, you must also stay within the organization’s tight IT budgets. And, as organizations are looking to reduce operational costs with the cloud (see cloud cost optimization), deciding what data can migrate to the cloud, and how to leverage the cloud without disrupting existing file-based users and applications becomes critical.

      Data storage never shrinks; it just relentlessly gets bigger. Regardless of industry, organization size, or “software-defined” ecosystem, it is a constant stress-inducing challenge to stay ahead of the storage consumption rate. That challenge is not made any easier by the fact that organizations typically waste a staggering amount of data storage capacity, much of which can be attributed to improper capacity management.

      Are you making capacity planning decisions without insight?

      Komprise enables you to intelligently plan storage capacity, offset additional purchase of expensive storage, and extend the life of your existing data storage by providing visibility across your storage with key analytics on how data is growing and being used, and interactive what-if analysis on the ROI of using different data management objectives. Komprise moves data based on your objectives to secondary storage, object storage or cloud storage of your choice while providing a file gateway for users and applications to transparently access the data exactly as before.


      With an analytics-first approach, Komprise provides visibility into how data is growing and being used across storage silos. Storage administrators and IT leaders no longer have to make storage capacity planning decisions without insight. With Komprise Intelligent Data Management, you’ll understand how much more storage will be needed, when and how to streamline purchases during planning.


    • Carbon footprint

      Carbon footprint is the total amount of greenhouse gases (GHGs) emitted directly or indirectly by an individual, organization, product, or activity. It measures the impact of human activities on climate change by quantifying the amount of carbon dioxide (CO2) and other GHGs emitted into the atmosphere. Data centers consume significant amounts of energy for powering servers, cooling systems, networking equipment, and other infrastructure, which often leads to the generation of carbon dioxide (CO2) and other GHG emissions.

      Sustainable data management is increasingly part of an overall enterprise IT strategy to reduce the carbon footprint, with a new set of unstructured data management metrics being recommended. See File Metrics to Live By.

      Increasingly, unstructured data management solution providers are delivering dashboards across data storage silos that show metrics such as:

      Measuring the carbon footprint

      The carbon footprint is typically measured in metric tons of carbon dioxide equivalent (CO2e), which includes the warming potential of other GHGs such as methane (CH4) and nitrous oxide (N2O). These emissions arise from various sources, including energy consumption, transportation, industrial processes, agriculture, and waste management.

      Scope of Emissions

      Carbon footprints can be categorized into three scopes:

      • Scope 1: Direct emissions from sources that are owned or controlled by the entity, such as onsite fuel combustion or company-owned vehicles.
      • Scope 2: Indirect emissions from the generation of purchased electricity, heat, or steam consumed by the entity.
      • Scope 3: Indirect emissions from sources not owned or controlled by the entity but associated with its activities, such as supply chain emissions, business travel, and product use.

      Calculating the Carbon Footprint

      To determine the carbon footprint, emissions from various sources are measured or estimated and converted into CO2e using specific global warming potential factors. This data is then aggregated to provide a comprehensive assessment of the total emissions associated with the entity or activity.
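
      The conversion described above can be sketched as a weighted sum. The global warming potential factors below are illustrative 100-year values and are an assumption of this example; use the factors required by your reporting standard. The function name co2_equivalent and the emission figures are hypothetical:

```python
# Illustrative 100-year global warming potential factors (assumed values;
# substitute the factors your reporting standard specifies).
GWP = {"CO2": 1, "CH4": 28, "N2O": 265}


def co2_equivalent(emissions_tonnes, gwp=GWP):
    """Convert per-gas emissions (metric tons) into total metric tons of CO2e."""
    return sum(mass * gwp[gas] for gas, mass in emissions_tonnes.items())


# Hypothetical annual emissions for one entity, in metric tons per gas.
total = co2_equivalent({"CO2": 1000.0, "CH4": 2.0, "N2O": 0.5})
print(total)  # 1188.5
```

      Even small quantities of methane and nitrous oxide add noticeably to the total because of their higher warming potential per ton.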

      Carbon Footprint Reduction Strategies

      Once the carbon footprint is calculated, organizations and individuals can implement strategies to reduce their emissions. These may include energy efficiency improvements, transitioning to renewable energy sources, optimizing transportation systems, adopting sustainable practices in agriculture and manufacturing, and promoting waste reduction and recycling.

      Carbon offsetting involves investing in projects that help remove or reduce CO2e emissions from the atmosphere. Offsetting initiatives may include reforestation, renewable energy projects, methane capture from landfills, or investing in carbon credits. Offsetting can be used to balance or compensate for the remaining emissions that cannot be eliminated.

      According to climate science, understanding and reducing carbon footprints is crucial for mitigating climate change. In March 2023, the United Nations warned of catastrophic global warming due to climate change.

      By measuring and managing emissions, individuals and organizations can contribute to a more sustainable future, reduce energy costs, enhance reputation, and comply with regulatory requirements. It’s important to note that calculating carbon footprints can be complex due to the diverse sources and factors involved. Precise measurements and accurate data collection are essential for obtaining reliable results. Various tools and standards are available to assist organizations in calculating and managing their carbon footprints, such as the Greenhouse Gas Protocol and ISO 14064.

      Here are 105 ways to reduce your carbon footprint.

    • Carbon Usage Effectiveness

      Carbon Usage Effectiveness (CUE) is a metric used to evaluate the energy efficiency and environmental impact of data centers. It measures the amount of carbon emissions produced per unit of computing work performed in a data center. CUE is an extension of the Power Usage Effectiveness (PUE) metric, which measures the energy efficiency of a data center.

      The concept of CUE recognizes that not all energy sources used by data centers have the same carbon footprint. Some energy sources, such as fossil fuels, have a higher carbon intensity and contribute more to greenhouse gas emissions compared to cleaner sources like renewable energy.

      Calculating CUE

      CUE is calculated by dividing the total carbon emissions from all sources associated with a data center’s energy use (including the emissions from electricity generation) by the energy consumed by the data center’s IT equipment, which stands in for the computing work performed and is measured in kilowatt-hours (kWh):

      CUE = Carbon dioxide emission equivalents caused by data center energy use (CO2eq) ÷ IT equipment energy usage (kWh)
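
      As a worked example of this ratio, the sketch below uses hypothetical annual figures; the function name carbon_usage_effectiveness and the inputs are illustrative, with emissions expressed in kilograms of CO2eq:

```python
def carbon_usage_effectiveness(total_co2e_kg, it_energy_kwh):
    """CUE = CO2eq emissions from data center energy use / IT equipment energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_co2e_kg / it_energy_kwh


# Hypothetical year: 600,000 kg CO2eq emitted, 1,000,000 kWh used by IT equipment.
cue = carbon_usage_effectiveness(600_000, 1_000_000)
print(f"CUE = {cue:.2f} kg CO2eq per kWh")  # CUE = 0.60 kg CO2eq per kWh
```

      Halving the emissions for the same IT energy, for example by switching to a cleaner energy source, would halve the CUE.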

      A lower CUE value indicates a more energy-efficient and environmentally friendly data center, as it means fewer carbon emissions are produced per unit of computing work. Data center operators strive to reduce their CUE by adopting energy-efficient technologies, optimizing cooling systems, implementing renewable energy sources, and improving overall operational efficiency.

      It’s worth noting that while CUE is a useful metric for evaluating the environmental impact of data centers, it is just one aspect of sustainability. Other factors such as water usage, electronic waste management, and overall lifecycle assessment should also be considered to have a comprehensive understanding of a data center’s environmental footprint.

      Komprise has written about the opportunity for sustainable data management as part of an overall strategy for sustainability, data center emissions reduction, data center optimization and data center consolidation.

    • Chargeback

      What is Chargeback?

      Chargeback is a cost allocation strategy used by enterprise IT organizations to charge business units or departments for the IT resources / services they consume. This strategy allows organizations to assign costs to the departments that are responsible for them, which can help to improve accountability, cost management and cost optimization.

      Under a chargeback model, IT resources such as hardware, software, and services are assigned a cost and allocated to the business units or departments that use them. The costs may be based on factors such as usage, capacity, or complexity. The business units or departments are then billed for the IT resources they consume based on these costs.
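
      A minimal sketch of usage-based allocation, assuming a single shared cost split in proportion to consumption. The department names, the usage units and the function name allocate_costs are hypothetical:

```python
def allocate_costs(total_cost, usage_by_department):
    """Split a shared cost across departments in proportion to their usage."""
    total_usage = sum(usage_by_department.values())
    if total_usage <= 0:
        raise ValueError("total usage must be positive")
    return {
        dept: round(total_cost * usage / total_usage, 2)
        for dept, usage in usage_by_department.items()
    }


# Hypothetical month: $12,000 of storage cost, usage measured in TB.
bill = allocate_costs(12000.0, {"engineering": 60, "marketing": 25, "finance": 15})
print(bill)  # {'engineering': 7200.0, 'marketing': 3000.0, 'finance': 1800.0}
```

      The same calculation serves a showback model, where departments see these figures but are not actually billed for them.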

      The chargeback model can provide several benefits for organizations. It can help to promote transparency and accountability, as departments are charged for the IT resources they use. This can help to encourage departments to use IT resources more efficiently and reduce overall costs. Chargeback can also help to align IT spending with business goals, as departments are more likely to prioritize spending on IT resources that directly support their business objectives.

      Implementing an IT chargeback model requires careful planning and communication to ensure that it is implemented effectively. It is important to establish clear policies and guidelines for how IT resources are assigned costs and billed to business units or departments, and to provide regular reporting and analysis to help departments understand their IT costs and usage.

      Showback and Storage as a Service

      Many enterprises have adopted a Storage-as-a-Service (STaaS) approach to centralize IT’s efforts for each department. But convincing department heads to care about storage savings is a tough task without the right tools. Storage-agnostic data management, tiering and archiving are viewed by users as an extraneous hassle and potential disruption that fails to answer “What’s in it for me?”

      This white paper explains how to make STaaS successful by telling a compelling data story department heads can’t ignore. This coupled with transparent data tiering techniques that do not change the user experience are critical to successful systematic archiving and significant savings.

      Learn how using analytics-driven showback can help secure the buy-in needed to archive more data more often. Once they understand their data—how much is cold and how much they could be saving—the conversation quickly changes.

      Read the blog post: How Storage Teams Use Deep Analytics.

    • Checksum

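      A checksum is a value calculated from the contents of a file or data block and used, for example in NAS data analytics and data migrations, to verify the integrity of data: if the checksum computed at the destination matches the one computed at the source, the data was copied intact. One of the most commonly used checksums is MD5, which Komprise uses to manage chain of custody and integrity reporting per file.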
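
      A minimal sketch of an MD5 integrity check using Python's standard hashlib module. It illustrates the general technique only and is not a description of how Komprise implements it; the function names md5_checksum and copy_is_intact are illustrative:

```python
import hashlib


def md5_checksum(path, chunk_size=1024 * 1024):
    """Compute the MD5 checksum of a file, reading it in chunks
    so that large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(source_path, destination_path):
    """Return True if both files produce the same checksum."""
    return md5_checksum(source_path) == md5_checksum(destination_path)
```

      After a migration, calling copy_is_intact on a source file and its copy reports whether the two match. MD5 is suited to detecting accidental corruption; it is not considered resistant to deliberate tampering.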

      Learn more about Komprise Elastic Data Migration for smart, fast and proven file and object data migrations.

      Learn tips on a clean cloud data migration on the Komprise blog.


    • Cloud Cost Optimization

      Cloud cost optimization is a process to reduce operating costs in the cloud while maintaining or improving the quality of cloud services. It involves identifying and addressing areas to reduce the use of cloud resources, select more cost-effective cloud services, or deploy better management practices, including data management.

      The cloud is highly flexible and scalable, but it also involves ongoing and sometimes hidden costs, including usage fees, egress fees, storage costs, and network fees. If not managed properly, these costs can quickly become a significant burden for organizations.

      In one of our 2023 data management predictions posts, we noted:

      Managing the cost and complexity of cloud infrastructure will be Job No. 1 for enterprise IT in 2023. Cloud spending will continue, although at perhaps a more measured pace during uncertain economic times. What will be paramount is to have the best data possible on cloud assets to make sound decisions on where to move data and how to manage it for cost efficiency, performance, and analytics projects. Data insights will also be important for migration planning, spend management (FinOps), and to meet governance requirements for unstructured data management. These are the trends we’re tracking for cloud data management, which will give IT directors precise guidance to maximize data value and minimize cloud waste.

      Source: ITPro-Today

      Steps to Optimize Cloud Costs

      To optimize cloud costs, organizations can take several steps, including:

      • Right-sizing: Choose the correct size and configuration of cloud resources to meet the needs of the application, avoiding overprovisioning or underprovisioning.
      • Resource utilization: Monitor the use of cloud resources to reduce waste and improve cost efficiency.
      • Cost allocation: Implement cost allocation and tracking practices to better understand cloud costs and improve accountability.
      • Reserved instances: Use reserved instances to reduce costs by committing to a certain level of usage for a longer term.
      • Cost optimization tools: These tools identify areas for savings and help manage cloud expenses.
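
      To illustrate the reserved-instance trade-off in the list above, the sketch below compares pay-as-you-go cost with a flat monthly commitment. The hourly rate and the monthly fee are hypothetical and not real provider prices, and the function name cheaper_pricing_model is illustrative:

```python
def cheaper_pricing_model(hours_used, on_demand_rate, reserved_monthly_fee):
    """Compare on-demand cost with a flat reserved commitment for one month.

    Returns the cheaper model and its cost.
    """
    on_demand_cost = hours_used * on_demand_rate
    if on_demand_cost <= reserved_monthly_fee:
        return "on-demand", on_demand_cost
    return "reserved", reserved_monthly_fee


# Hypothetical prices: $0.25 per hour on demand, or $120 per month reserved.
print(cheaper_pricing_model(200, 0.25, 120.0))  # ('on-demand', 50.0)
print(cheaper_pricing_model(720, 0.25, 120.0))  # ('reserved', 120.0)
```

      Under these assumed prices the break-even point is 480 hours per month: a commitment only pays off for resources that run most of the time, which is why right-sizing and utilization monitoring come first.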

      The Challenge of Managing Cloud Data

      Managing cloud data costs takes significant manual effort, multiple tools, and constant monitoring. As a result, companies are using less than 20% of the cloud cost-saving options available to them. “Bucket sprawl” makes matters worse, as users easily create accounts and buckets and fill them with data, some of which is never accessed again.

      When trying to optimize cloud data, cloud administrators contend with poor visibility and complexity of data management:

      • How can you know your cloud data?
      • How fast is cloud data growing and who’s using it?
      • How much is active vs. how much is cold?
      • How can you dig deeper to optimize across object sizes and storage classes?

      How can you make managing data and costs manageable?

      • It’s hard to decipher complicated cost structures.
      • You need more information to manage data better, e.g., when an object was last accessed.
      • Factoring in multiple billable dimensions and costs is extremely complex: storage, access, retrievals, API, transitions, initial transfer, and minimum storage-time costs.
      • There are unexpected costs of moving data across different storage classes (e.g., Amazon S3 Standard to S3 Glacier). If access isn’t continually monitored, and data is not moved back up when it gets hot, you will face expensive retrieval fees.

      These issues are further compounded as enterprises move toward a multicloud approach and require a single set
      of tools, policies, and workflow to optimize and manage data residing within and across clouds.
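
      The retrieval-fee problem described above can be made concrete with a small cost model. The per-GB rates below are hypothetical and chosen only for illustration, and the model covers just two of the billable dimensions (storage and retrieval):

```python
def monthly_cost(stored_gb, retrieved_gb, storage_rate, retrieval_rate):
    """Monthly cost in dollars: storage charge plus retrieval charge."""
    return stored_gb * storage_rate + retrieved_gb * retrieval_rate


# Hypothetical per-GB rates, for illustration only (not real provider prices).
HOT = {"storage_rate": 0.020, "retrieval_rate": 0.00}
COLD = {"storage_rate": 0.004, "retrieval_rate": 0.03}

stored_gb = 10_000
for retrieved_gb in (0, 2_000, 8_000):
    hot = monthly_cost(stored_gb, retrieved_gb, **HOT)
    cold = monthly_cost(stored_gb, retrieved_gb, **COLD)
    cheaper = "cold" if cold < hot else "hot"
    print(f"retrieved {retrieved_gb:>5} GB: hot ${hot:.2f}, cold ${cold:.2f} -> {cheaper}")
```

      With these assumed rates the cold class is far cheaper while the data stays untouched, but once most of it is read back in a month the retrieval fees outweigh the storage savings. That is why tiering decisions need actual access patterns rather than age alone.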

      Komprise Cloud Data Management

      Reduce cloud storage costs by more than 50% with Komprise.

      Cloud providers offer a range of storage services. Generally, there are storage classes with higher performance and costs for hot and warm data, such as Amazon S3 Standard and S3 Standard-IA, and there are storage classes with much lower performance and costs that are appropriate for cold data, such as S3 Glacier and S3 Glacier Deep Archive. Data access fees and retrieval fees for the lower-cost storage classes are much higher than those of the higher-performance, higher-cost storage classes. To maximize savings, you need an automated unstructured data management solution that takes into account data access patterns to dynamically and cost-optimally move data across storage classes (e.g., Amazon S3 Standard to S3 Standard-IA or S3 Standard-IA to S3 Glacier) and across multi-vendor storage services (e.g., NetApp Cloud Volumes ONTAP to Amazon S3 Standard to S3 Standard-IA to S3 Glacier to S3 Glacier Deep Archive). While some limited manual data movement through Object Lifecycle Management policies based on modified times or intelligent tiering is available from the cloud providers, these approaches offer limited savings and involve hidden costs.

      Komprise automates full lifecycle management across multi-vendor cloud storage classes using intelligence from data usage patterns to maximize your savings without heavy lifting. Read the white paper to see how you can save more than 50% on cloud storage costs.

      Watch the video: How to save costs and manage your multi-cloud storage

    • Cloud Costs

      Cloud costs, or cloud computing costs, will vary based on cloud service provider, the specific cloud services and cloud resources used, usage patterns, and pricing models. See Cloud Cost Optimization.

      Gartner forecast that cloud spend would be nearly $600B in 2023. In an increasingly hybrid enterprise IT infrastructure, cloud repatriation is also making headlines; see Cloud repatriation and the death of cloud only.

      Why are my cloud costs so high?

      A number of factors can influence your cloud costs. Examples include:

      • Compute Resources: Cloud providers offer various compute options, such as virtual machines (VMs), containers, or serverless functions. The cost of compute resources depends on factors like the instance type, CPU and memory specifications, duration of usage, and the pricing model (e.g., on-demand, reserved instances, or spot instances).
      • Cloud Storage: Cloud storage costs can vary based on the type of storage used, such as object storage, block storage, or file storage. The factors affecting storage costs include the amount of data stored, data transfer in and out of the storage, storage duration, and any additional features like data replication or redundancy. See the white paper: Block-level versus file-level tiering.
      • Networking: Cloud providers charge for network egress and data transfer between different regions, availability zones, or across cloud services. The cloud cost can depend on the volume of data transferred, the distance between data centers, and the bandwidth used.
      • Database Services: Cloud databases, such as relational databases (RDS), NoSQL databases (DynamoDB, Firestore), or managed database services, have their own pricing models. The cost can be based on factors like database size, read/write operations, storage capacity, and backup and replication requirements.
      • Data Transfer and CDN: Cloud providers typically charge for data transfer between their services and the internet, as well as for content delivery network (CDN) services that accelerate content delivery. Costs can vary based on data volume, data center locations, and regional traffic patterns.
      • Cloud Services: Cloud providers offer a range of additional cloud services, such as analytics, AI/ML, monitoring, logging, security, and management tools. The cost of these services is usually based on usage, the number of requests, data processed, or specific feature tiers.
      • Pricing Models: Cloud providers offer different pricing models, including on-demand (pay-as-you-go), reserved instances (pre-purchased capacity for longer-term usage), spot instances (bid-based pricing for unused capacity), or savings plans (commitments for discounted rates). Choosing the appropriate pricing model can impact overall cloud costs.

      To estimate and manage cloud costs effectively, enterprise IT, engineering and all consumers of cloud services need to monitor resource usage, optimize resource allocation, leverage cost management tools provided by the cloud provider and independent solution providers, and regularly review and adjust resource utilization based on actual requirements. Each cloud provider has detailed pricing documentation and cost calculators on their websites that can help estimate costs based on specific usage patterns and service selections. In an increasingly hybrid, multi-cloud environment, looking to technologies that can analyze and manage cloud costs independent from cloud service providers is gaining popularity.

    • Cloud Data Analytics

      Cloud data analytics refers to the use of cloud computing resources to process, analyze, and extract insights from large amounts of data. These solutions can include data warehousing, big data processing, machine learning, and business intelligence and can ingest a wide range of data, including structured, semi-structured, and unstructured data.

      Cloud data analytics can deliver an agile and lower-cost method to analyze large amounts of data quickly for a variety of business outcomes including operational improvements, customer behavior analysis, competitive analysis, R&D and more.

      Some of the leading cloud data analytics providers include Amazon Web Services, Google Cloud, Microsoft Azure, IBM and many early-stage venture-backed startups. One of the first cloud analytics vendors was LucidEra. These companies offer a range of cloud data analytics services and tools, including data warehousing, big data processing, machine learning, and business intelligence.

      Komprise Smart Data Workflows can be created to search and find the right unstructured data and automate the delivery of data to cloud analytics infrastructure.

    • Cloud Data Growth Analytics

      70% of data in most enterprise organizations is cold data that has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data.

      50% of the 175 zettabytes of data worldwide in 2025 will be stored in public cloud environments. (IDC)

      80% of businesses will overspend their cloud infrastructure budgets due to a lack of cloud cost optimization. (Gartner)

      Komprise provides the visibility and analytics into cloud data that lets organizations understand data growth across their clouds and helps move cold data to optimize costs.


    • Cloud Data Management


      What is Cloud Data Management?

      Cloud data management is a way to manage data across cloud platforms, either with or instead of on-premises storage. It is a popular form of data storage management whose goal is to curb rising cloud data storage costs, but it can be quite a complicated pursuit, which is why most businesses employ an external company offering cloud data management services, with cloud cost optimization as the primary goal.

      Cloud data management is emerging as an alternative to data management using traditional on-premises software. The benefit of employing a top cloud data management company is that instead of buying on-premises data storage resources and managing them, resources are bought on-demand in the cloud. This cloud data management services model for cloud data storage allows organizations to receive dedicated data management resources on an as-needed basis. Cloud data management also involves finding the right data from on-premises storage and moving this data through data archiving, data tiering, data replication and data protection, or data migration to the cloud.

      Advantages of Cloud Data Management

      How to manage cloud storage? According to two 2023 surveys, 94% of respondents say they’re wasting money in the cloud, 69% say that data storage accounts for over one quarter of their company’s cloud costs and 94% say that cloud storage costs are rising. Optimal unstructured data management in the cloud provides four key capabilities that help with managing cloud storage and reducing your cloud data storage costs:

      1. Gain Accurate Visibility Across Cloud Accounts into Actual Usage
      2. Forecast Savings and Plan Data Management Strategies for Cloud Cost Optimization
      3. Cloud Tiering and Archiving Based on Actual Data Usage to Avoid Surprises
        • For example, using last-accessed time vs. last modified provides a more predictable decision on the objects that will be accessed in the future, which avoids costly archiving errors.
      4. Radically Simplify Cloud Migrations
        • Easily pick your source and destination
        • Run dozens or hundreds of migrations in parallel
        • Reduce the babysitting
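
      The point made under capability 3, about last-accessed versus last-modified time, can be sketched as follows. The object records, the one-year threshold and the function name select_cold are hypothetical:

```python
from datetime import datetime, timedelta


def select_cold(objects, now, time_field, min_age_days=365):
    """Return keys of objects whose time_field is older than min_age_days."""
    cutoff = now - timedelta(days=min_age_days)
    return [o["key"] for o in objects if o[time_field] < cutoff]


# Hypothetical objects: the first is old but still read regularly.
objects = [
    {"key": "reports/2019-summary.pdf",
     "last_modified": datetime(2019, 12, 1),
     "last_accessed": datetime(2024, 5, 20)},
    {"key": "scans/raw-0001.tif",
     "last_modified": datetime(2020, 3, 15),
     "last_accessed": datetime(2020, 4, 1)},
]
now = datetime(2024, 6, 1)
by_modified = select_cold(objects, now, "last_modified")
by_accessed = select_cold(objects, now, "last_accessed")
print(by_modified)  # both objects look cold
print(by_accessed)  # only the object nobody reads is cold
```

      A policy based on modified time would archive the report that is still being read and incur retrieval fees, while a policy based on accessed time archives only the scan that is no longer in use.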


      The many benefits of cloud data management services include speeding up technology deployment and reducing system maintenance costs; it can also provide increased flexibility to help meet changing business requirements.

      Challenges Faced with Enterprise Cloud Data Management

      But, like other cloud computing technologies, enterprise cloud data management services can introduce challenges, such as data security concerns related to sending sensitive business data outside the corporate firewall for storage. Another challenge is the disruption to existing users and applications that may be using file-based applications on premises, since the cloud is predominantly object based.

      Cloud data management service solutions should provide you with options to eliminate this disruption by transparently moving and managing data across common formats such as file and object.

      Komprise Intelligent Data Management

      Features of a Cloud Data Management Services Platform

      Some common features and capabilities cloud data management solutions should deliver:

      • Data Analytics: Can you get a view of all your cloud data, how it’s being used, and how much it’s costing you? Can you get visibility into on-premises data that you wish to migrate to the cloud? Can you understand where your costs are so you know what to do about them?
      • Planning and Forecasting: Can you set policies for how data should get moved, either from one cloud storage class to another or from on-premises storage to the cloud? Can you project your savings? Does this account for hidden fees like retrieval and egress costs?
      • Policy based data archiving, data replication, and data management: How much babysitting do you have to do to move and manage data? Do you have to tell the system every time something needs to be moved or does it have policy based intelligent automation?
      • Fast Reliable Cloud Data Migration: Does the system support migrating on-premises data to the cloud? Does it handle going over a Wide Area Network? Does it handle your permissions and access controls and preserve security of data both while it’s moving the data and in the cloud?
      • Intelligent Cloud Archiving, Intelligent Tiering and Data Lifecycle Management: Does the solution enable you to manage the ongoing data lifecycle in the cloud? Does it support the different cloud storage classes (e.g., high-performance options like file and cloud NAS and cost-efficient options like Amazon S3 and Glacier)?
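
      The policy-based automation question above can be made concrete with a minimal sketch. It assumes a hypothetical rule ("tier anything not accessed for N days") and hypothetical names (`FileInfo`, `TieringPolicy`, `select_for_tiering`); it is an illustration only, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record of one file's path, size and timestamps."""
    path: str
    size_bytes: int
    last_accessed: datetime
    last_modified: datetime


@dataclass(frozen=True)
class TieringPolicy:
    """Hypothetical policy: a file is cold if not accessed for `cold_after_days`."""
    cold_after_days: int
    min_size_bytes: int = 0

    def is_cold(self, info: FileInfo, now: datetime) -> bool:
        idle = now - info.last_accessed  # decision uses last access, not last modification
        return (idle >= timedelta(days=self.cold_after_days)
                and info.size_bytes >= self.min_size_bytes)


def select_for_tiering(files, policy, now):
    """Return the files the policy would move to a colder storage class."""
    return [f for f in files if policy.is_cold(f, now)]


now = datetime(2024, 1, 1)
files = [
    # Both files were last modified 900 days ago; only the first is also unread.
    FileInfo("/proj/a.bam", 5_000_000, now - timedelta(days=400), now - timedelta(days=900)),
    FileInfo("/proj/b.csv", 20_000, now - timedelta(days=3), now - timedelta(days=900)),
]
policy = TieringPolicy(cold_after_days=365)
cold = select_for_tiering(files, policy, now)
print([f.path for f in cold])  # ['/proj/a.bam']
```

      A rule based on last-modified time would have selected both files, including one that was read three days ago, which is the kind of decision that leads to retrieval fees.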

      In practice, the design and architecture of a cloud vary among cloud providers. Service Level Agreements (SLAs) are the contracts that capture the agreed-upon guarantees between a service provider and its customers.

      It is important to consider that cloud administrators are responsible for factoring:

      • Multiple billable dimensions and costs: storage, access, retrievals, API, transitions, initial transfer, and minimal storage-time costs
      • Unexpected costs of moving data across different storage classes. Unless access is continually monitored and data is moved back up when it gets hot, you’ll face expensive retrieval fees.
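
      The effect of retrieval fees can be shown with a little arithmetic. The sketch below uses two of the billable dimensions listed above (storage and retrievals); every price is a hypothetical round number chosen to keep the sums easy, not a quote from any provider.

```python
# All prices are hypothetical, in dollars, chosen only to keep the arithmetic easy.
HOT_STORAGE_PER_TB = 20      # per TB-month on a hot storage class
COLD_STORAGE_PER_TB = 5      # per TB-month on a cold storage class
COLD_RETRIEVAL_PER_TB = 50   # per TB read back from the cold class


def hot_cost(stored_tb):
    """Monthly cost of keeping the data on the hot class (no retrieval fee)."""
    return stored_tb * HOT_STORAGE_PER_TB


def cold_cost(stored_tb, retrieved_tb):
    """Monthly cost on the cold class: storage plus retrieval charges."""
    return stored_tb * COLD_STORAGE_PER_TB + retrieved_tb * COLD_RETRIEVAL_PER_TB


def break_even_retrieval_tb(stored_tb):
    """Monthly retrieval volume at which the cold class stops being cheaper."""
    saving = hot_cost(stored_tb) - cold_cost(stored_tb, 0)
    return saving / COLD_RETRIEVAL_PER_TB


stored = 10
print(hot_cost(stored))                 # 200
print(cold_cost(stored, 0))             # 50
print(cold_cost(stored, 4))             # 250
print(break_even_retrieval_tb(stored))  # 3.0
```

      With these assumed prices, 10 TB on the cold class saves money only while less than 3 TB a month is read back; at 4 TB it already costs more than leaving the data hot.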

      This complexity is the reason why only 20% of organizations are leveraging the cost-saving options available to them in the cloud.

      How do Cloud Data Management Services Tools work?

      As more enterprise data runs on public cloud infrastructure, many different types of tools and approaches to cloud data management have emerged. The initial focus has been on migrating and managing structured data in the cloud. Cloud data integration, ETL (extraction, transformation and loading), and iPaaS (integration platform as a service) tools are designed to move and manage enterprise applications and databases in the cloud. These tools typically move and manage bulk or batch data or real time data.

      Cloud-based analytics and cloud data warehousing have emerged for analyzing and managing hybrid and multi-cloud structured and semi-structured data, such as Snowflake and Databricks.

      In the world of unstructured data storage and backup technologies, cloud data management has been driven by the need for cost visibility, cost reduction, cloud cost optimization and optimizing cloud data. As file-level tiering has emerged as a critical component of an intelligent data management strategy and more file data is migrating to the cloud, cloud data management is evolving from cost management to automation and orchestration, governance and compliance, performance monitoring, and security. Even so, spend management continues to be a top priority for any enterprise IT organization migrating application and data workloads to the cloud.

      What are the challenges faced with Cloud Data Management security?

      Most cloud data management security concerns are related to the general cloud computing security questions organizations face. It’s important to evaluate the strengths and security certifications of your cloud data management vendor as part of your overall cloud strategy.

      Is adoption of Cloud Data Management services growing?

      As enterprise IT organizations are increasingly running hybrid, multi-cloud, and edge computing infrastructure, cloud data management services have emerged as a critical requirement. Look for solutions that are open, cross-platform, and ensure you always have native access to your data. Visibility across silos has become a critical need in the enterprise, but it’s equally important to ensure data does not get locked into a proprietary solution that will disrupt users, applications, and customers. The need for cloud native data access and data mobility should not be underestimated. In addition to visibility and access, cloud data management services must enable organizations to take the right action in order to move data to the right place at the right time. The right cloud data management solution will reduce storage, backup and cloud costs as well as ensure a maximum return on the potential value from all enterprise data.

      How is Enterprise Cloud Data Management Different from Consumer Systems?

      While consumers need to manage cloud storage, it is usually a matter of capacity across personal storage and devices. Enterprise cloud data management involves IT organizations working closely with departments to build strategies and plans that will ensure unstructured data growth is managed and data is accessible and available to the right people at the right time.

      Enterprise IT organizations are increasingly adopting cloud data management solutions to understand how cloud (typically multi-cloud) data is growing and manage its lifecycle efficiently across all of their cloud file and object storage options.

      Analyzing and Managing Cloud Storage with Komprise

      • Get accurate analytics across clouds with a single view across all your users’ cloud accounts and buckets and save on storage costs with an analytics-driven approach.
      • Forecast cloud cost optimization by setting different data lifecycle policies based on your own cloud costs.
      • Establish policy-based multi-cloud lifecycle management by continuously moving objects by policy across storage classes transparently (e.g., Amazon Standard, Standard-IA, Glacier, Glacier Deep Archive).
      • Accelerate cloud data migrations with fast, efficient data migrations across clouds (e.g., AWS, Azure, Google and Wasabi) and even on-premises (ECS, IBM COS, Pure FlashBlade).
      • Deliver powerful cloud-to-cloud data replication by running, monitoring, and managing hundreds of migrations faster than ever at a fraction of the cost with Elastic Data Migration.
      • Keep your users happy with no retrieval fee surprises and no disruption to users and applications from making poor data movement decisions based on when the data was created.

      A cloud data management platform like Komprise, named a Gartner Peer Insights Awards leader, that is analytics-driven, can help you save 50% or more on your cloud storage costs.


      Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.

      What is Cloud Data Management?

      Cloud data management is a way to analyze, manage, secure, monitor and move data across public clouds. It works either with or instead of on-premises applications, databases, and data storage and typically offers a run-anywhere platform.

      Cloud Data Management Services

      Cloud data management is typically overseen by a vendor that specializes in data integration, database, data warehouse or data storage technologies. Ideally the cloud data management solution is data agnostic, meaning it is independent from the data sources and targets it is monitoring, managing and moving. Benefits of an enterprise cloud data management solution include ensuring security, large savings, backup and disaster recovery, data quality, automated updates and a strategic approach to analyzing, managing and migrating data.

      Cloud Data Management platform

      Cloud data management platforms are cloud-based hubs that analyze and offer visibility and insights into an enterprise’s data, whether the data is structured, semi-structured or unstructured.


    • Cloud Data Migration

      What is Cloud Data Migration?

      Cloud data migration is the process of relocating either all or a part of an enterprise’s data to a cloud infrastructure. Cloud data migration is often the most difficult and time-consuming part of an overall cloud migration project. Other elements of cloud migration involve application migration and workflow migration. A “smart data migration” strategy for enterprise file data takes an analytics-first approach, ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. Komprise Elastic Data Migration makes cloud data migrations simple, fast and reliable with continuous data visibility and optimization.

      The Komprise Smart Data Migration Strategy

      Learn more about Komprise Smart Data Migration for file and object data.

      Read the blog post: Smart Data Migration for File and Object Data Workloads

      Cost, Complexity and Time:
      Why Cloud Data Migrations are Difficult

      Cloud data migrations are usually the most laborious and time-consuming part of a cloud migration initiative. Why? Data is heavy – data footprints are often in hundreds of terabytes to petabytes and can involve billions of files and objects. Some key reasons why cloud data migrations fail include:

      • Lack of Proper Planning: Often cloud data migrations are done in an ad-hoc fashion without proper analytics on the data set and without planning.
      • Improper Choice of Cloud Storage Destination: Most public clouds offer many different classes and tiers of storage – each with their own costs and performance metrics. Also, many of the cloud storage classes have retrieval and egress costs, so picking the right cloud storage class for a data migration involves not just finding the right performance and price to store the data but also the right access costs. Intelligent tiering and Intelligent archiving techniques that span both cloud file and object storage classes are important to ensure the right data is in the right place at the right time.
      • Ensuring Data Integrity: Data migrations involve migrating the data along with migrating metadata. For a cloud data migration to succeed, not only should all the data be moved over with full fidelity, but all the access controls, permissions, and metadata should also move over. Often, this is not just about moving data but mapping these from one storage environment to another.
      • Downtime Impact: Cloud data migrations can often take weeks to months to complete. Clearly, you don’t want users to be unable to access the data they need for this entire time. Minimizing downtime, even during a cutover, is very important to reduce productivity impact.
      • Slow Networks, Failures: Often cloud data migrations are done over a Wide Area Network (WAN), which can have other data moving on it and hence deliver intermittent performance. Plus, there may be times when the network is down or the storage at either end is unavailable. Handling all these edge conditions is extremely important – you don’t want to be halfway through a month-long cloud data migration only to encounter a network failure and have to start all over again.
      • Time Consuming: Since cloud data migrations involve moving large amounts of data, they can often involve a lot of manual effort in managing the migrations. This is laborious, tedious and time consuming.
      • Sunk Costs: Cloud data migrations are often time-bound projects – once the data is migrated, the project is complete. So, if you invest in tools to address cloud data migrations, you may have sunk costs once the cloud data migration is complete.
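
      The "start all over again" problem described under Slow Networks, Failures is usually avoided by recording progress as the migration runs. The following is a minimal sketch of that idea, assuming local directories stand in for the source and destination storage and a hypothetical JSON checkpoint file; a production tool would batch checkpoint writes and handle retries.

```python
import json
import shutil
import tempfile
from pathlib import Path


def migrate_with_checkpoint(src: Path, dst: Path, checkpoint: Path) -> int:
    """Copy files from src to dst, skipping any already recorded as done.

    Returns the number of files copied in this run.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    copied = 0
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src).as_posix()
        if rel in done:
            continue  # finished in an earlier run, so do not copy it again
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also carries over timestamps and mode bits
        done.add(rel)
        checkpoint.write_text(json.dumps(sorted(done)))  # persist progress after each file
        copied += 1
    return copied


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src, dst, ckpt = root / "src", root / "dst", root / "checkpoint.json"
    (src / "sub").mkdir(parents=True)
    (src / "a.txt").write_text("alpha")
    (src / "sub" / "b.txt").write_text("beta")
    first_run = migrate_with_checkpoint(src, dst, ckpt)
    second_run = migrate_with_checkpoint(src, dst, ckpt)  # simulates a restart
    print(first_run, second_run)  # 2 0
```

      If the process is interrupted, the next run reads the checkpoint and resumes with the files that are still outstanding.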


      Cloud data migrations can involve Network Attached Storage (NAS) or file data, object data, or block data. Of these, cloud data migrations of file data and of object data are particularly difficult and time-consuming because file and object data are much larger in volume.

      • To learn more about the seven reasons why cloud data migrations are dreaded, watch the webinar.
      • Learn more about why Komprise is the fast, no lock-in approach to unstructured cloud data migrations: Path to the cloud.

      Cloud Data Migration Strategies

      Different cloud data migration strategies are used depending on whether file data or object data need to be migrated. Common methods for moving these two types of data through cloud migration solutions are described in further detail below.

      Cloud Data Migration for File Data aka NAS Cloud Data Migrations


      File data is often stored on Network Attached Storage. File data is typically accessed over NFS and SMB protocols. File data can be particularly difficult to migrate because of its size, volume, and richness. File data often involves a mix of large and small files – data migration techniques often do better when migrating large files but fail when migrating small files. Data migration solutions need to address a mix of large and small files and handle both efficiently. File data is also voluminous – often involving billions of files. Reliable cloud data migration solutions for file data need to be able to handle such large volumes of data efficiently. File data is also very rich and has metadata, access control permissions and hierarchies. A good file data migration solution should preserve all the metadata, access controls and directory structures. Often, migrating file data involves mapping this information from one file storage format to another. Sometimes, file data may need to be migrated to an object store. In these situations, the file metadata needs to be preserved in the object store so the data can be restored as files at a later date. Techniques such as MD5 checksums are important to ensure the data integrity of file data migrations to the cloud.
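
      One way to picture preserving file metadata in an object store is to capture it as string key/value pairs that travel with the object and can be reapplied on restore. The sketch below assumes a POSIX file system and hypothetical helper names (`capture_metadata`, `restore_metadata`); it restores mode and modification time only, because changing ownership generally needs elevated privileges.

```python
import os
import stat
import tempfile


def capture_metadata(path: str) -> dict:
    """Collect POSIX attributes as strings, the form object metadata usually takes."""
    st = os.stat(path)
    return {
        "mode": format(stat.S_IMODE(st.st_mode), "o"),  # permission bits, octal text
        "uid": str(st.st_uid),
        "gid": str(st.st_gid),
        "mtime": str(int(st.st_mtime)),
    }


def restore_metadata(path: str, meta: dict) -> None:
    """Reapply mode and modification time to a file restored from an object."""
    os.chmod(path, int(meta["mode"], 8))
    mtime = int(meta["mtime"])
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "report.txt")
    restored = os.path.join(tmp, "restored.txt")
    for p in (original, restored):
        with open(p, "w") as fh:
            fh.write("hello")
    os.chmod(original, 0o640)
    os.utime(original, (1_600_000_000, 1_600_000_000))

    meta = capture_metadata(original)      # would be stored alongside the object
    restore_metadata(restored, meta)       # applied when the object becomes a file again
    roundtrip = capture_metadata(restored)
    print(meta["mode"], meta == roundtrip)
```

      Access control lists and directory hierarchy need the same treatment in practice; they are left out here to keep the example short.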

      Cloud Data Migration for Object Data (S3 Data Migrations or Object-to-Cloud Data Migrations or Cloud-to-Cloud Data Migrations)

      Cloud data migrations of object data are relatively new but quickly gaining momentum as the majority of enterprises are moving to a multi-cloud architecture. The Amazon Simple Storage Service (S3) protocol has become a de facto standard for object stores and public cloud providers, so most cloud data migrations of object data involve S3-based data migrations.

      3 common use cases for cloud object data migrations:
      • Data migrations from an on-premises object store to the public cloud: Many enterprises have adopted on-premises object storage. Most of these object storage solutions follow the S3 protocol. Customers are now looking to analyze data on their on-premises object storage and migrate some or all of that data to a public cloud storage option such as Amazon S3 or Microsoft Azure Blob.
      • Cloud-to-cloud data migrations and cloud-to-cloud data replications: Enterprises looking to switch public cloud providers need to migrate data from one cloud to another. Sometimes, it may also be cost-effective to replicate across clouds as opposed to replicating within a cloud. This also improves data resiliency and provides enterprises with a multi-cloud strategy. Cloud-to-cloud data replication differs from cloud data migration because it is ongoing – as data changes on one cloud, it is copied or replicated to the second cloud.
      • S3 data migrations: This is a generic term that refers to any object or cloud data migration done using the S3 protocol. The Amazon Simple Storage Service (S3) protocol has become a de facto standard. Any Object-to-Cloud, Cloud-to-Cloud or Cloud-to-Object migration can typically be classified as an S3 Data Migration.


      Secure Cloud Data Migration Tools

      Cloud data migrations can be performed by using free tools that require extensive manual involvement or commercial data migration solutions. Sometimes Cloud Storage Gateways are used to move data to the cloud, but these require heavy hardware and infrastructure setup. Cloud data management solutions offer a streamlined, cost-effective, software-based approach to manage cloud data migrations without requiring expensive hardware infrastructure and without creating data lock-in. Look for elastic data migration solutions that can dynamically scale to handle data migration workloads and adjust to your demands.

      7 Tips for a Clean Cloud Data Migration:
      1. Define Sources and Targets
      2. Know the Rules & Regulations
      3. Proper Data Discovery
      4. Define Your Path
      5. Test, Test, Test
      6. Free Tools vs. Enterprise
      7. Establish a Communication Plan

      Watch the webinar: Preparing for a Cloud File Data Migration

      What is a Smart Data Migration?

      Know your cloud data migration choices for file and object data migration.



    • Cloud Data Storage

      Cloud data storage is a service for individuals or organizations to store data through a cloud computing provider such as AWS, Azure, Google Cloud, IBM or Wasabi. Storing data in a cloud service eliminates the need to purchase and maintain data storage infrastructure, since infrastructure resides within the data centers of the cloud IaaS provider and is owned/managed by the provider. Many organizations are increasing data storage investments in the cloud for a variety of purposes including: backup, data replication and data protection, data tiering and archiving, data lakes for artificial intelligence (AI) and business intelligence (BI) projects, and to reduce their physical data center footprint. As with on-premises storage, you have different levels of data storage available in the cloud. You can segment data based on access tiers: for instance, hot and cold data storage.


      Types of Cloud Data Storage

      Cloud data storage can be designed either for personal data and collaboration or for enterprise data storage in the cloud. Examples of personal data cloud storage are Google Drive, Box and Dropbox.

      Increasingly, corporate data storage in the cloud is gaining prominence – particularly around taking enterprise file data that was traditionally stored on Network Attached Storage (NAS) and moving that to the cloud.

      Cloud file storage and object storage are gaining adoption as they can store petabytes of unstructured data for enterprises cost-effectively.

      Enterprise Cloud Data Storage for Unstructured Data

      (Cloud File Data Storage and Cloud Object Data Storage)

      Enterprise unstructured data growth is exploding – whether it’s genomics data, video and media content, log files or IoT data. Unstructured data can be stored as files on file data storage or as objects on cost-efficient object storage. Cloud storage providers are now offering a variety of file and object storage classes at different price points to accommodate unstructured data. Amazon EFS, Amazon FSx and Azure Files are examples of cloud data storage for enterprise file data, and Amazon S3, Azure Blob and Amazon Glacier are examples of object storage.

      Advantages of Cloud Data Storage

      There are many benefits of investing in cloud data storage, particularly for unstructured data in the enterprise. Organizations gain access to virtually unlimited resources, so they can scale data volumes as needed and decommission instances at the end of a project or when data is deleted or moved to another storage resource. Enterprise IT teams can also reduce dependence on hardware and have a more predictable storage budget. However, without proper cloud data management, cloud egress costs and other cloud costs are often cited as challenges.

      In summary, cloud data storage allows you to:
      • Reduce capital expenses (CAPEX) for data center hardware, along with energy, facility space and the staff hours spent installing and maintaining hardware.
      • Deliver vastly improved agility and scalability to support rapidly changing business needs and initiatives.
      • Develop an enterprise-wide data lake strategy that would otherwise be unaffordable.
      • Lower the risks of storing important data on aging physical hardware.
      • Leverage cheaper cloud storage for archiving and tiering purposes, which can also reduce backup costs.

      Challenges and Considerations

      • Cloud data storage can be costly if you need to frequently access the data for use outside of the cloud, due to egress fees charged by cloud storage providers.
      • Using cloud tiering methodologies from on-premises storage vendors may result in unexpected costs, due to the need to restore data back to the storage appliance prior to use. Read the white paper: Cloud Tiering: Storage-Based vs. Gateways vs. File-Based.
      • Moving data between clouds is often difficult, because of data translation and data mobility issues with file objects. Each cloud provider uses different standards and formats for data storage.
      • Security can be a concern, especially in some highly regulated sectors such as healthcare, financial services and e-commerce. IT organizations will need to fully understand the risks and methods of storing and protecting data in the cloud.
      • The cloud creates another data silo for enterprise IT. When adding cloud storage to an organization’s storage ecosystem, IT will need to determine how to attain a central, holistic view of all storage and data assets.

      For these reasons, cloud optimization and cloud data management are essential components of an enterprise cloud data storage and overall data storage cost savings strategy. Komprise has strategic alliance partnerships with hybrid and cloud data storage technology leaders.

      Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.


    • Cloud File Storage

      What is Cloud File Storage?

      Cloud file storage, also known as cloud NAS, is a method for storing data in the cloud that provides servers and applications access to data through file system protocols such as NFS and SMB. Cloud file storage allows customers to move file-based workloads to the cloud without code changes.

      Popular choices for cloud file storage are AWS FSx for Windows, AWS FSx ONTAP, AWS FSx ZFS, Microsoft Azure Files, Google Filestore, and Qumulo.

      In late 2021, Komprise COO Krishna Subramanian predicted that cloud file storage will accelerate.

      She wrote:

      First, it was cloud-native applications, then block workloads, but now it’s time for file workloads to move to the cloud. Explosive growth in unstructured file data has led to data centers bursting at the seams. Covid-19 has accelerated the shift to cloud for file workloads.

      Data management solutions are also enabling smart file migrations so that hot data is placed in cloud file storage and cold data is transparently and efficiently tiered at the file level to object storage. This means that customers can use data from both the file and object tiers. Another approach many vendors are taking is to provide cloud-like economics and pricing while the infrastructure remains on-premises — HPE Greenlake and Pure as a Service are examples of this trend.



    • Cloud Migration


      Cloud migration refers to the movement of data, processes, and applications from on-premises data storage or legacy infrastructure to cloud-based infrastructure for storage, application processing, data archiving and ongoing data lifecycle management. Komprise offers an analytics-driven cloud migration software solution, Elastic Data Migration, that integrates with most leading cloud service providers, such as AWS, Microsoft Azure, Google Cloud, Wasabi, IBM Cloud and more.

      Benefits of Cloud Migration

      Migrating to the cloud can offer many advantages – lower operational costs, greater elasticity, and flexibility. Migrating data to the cloud in a native format also ensures you can leverage the computational capabilities of the cloud and not just use it as a cheap storage tier. When migrating to the cloud, you need to consider both the application as well as its data. While application footprints are generally small and relatively easier to migrate, cloud file data migrations need careful planning and execution as data footprints can be large. Cloud migration of file data workloads with Komprise allows you to:

      • Plan a data migration strategy using analytics before migration. A pre-migration analysis helps you identify which files need to be migrated and plan how to organize the data to maximize the efficiency of the migration process. It’s important to know how data is used and to determine how large and how old files are throughout the storage system. Since data footprints often reach billions of files, planning a migration is critical.
      • Improve scalability with Elastic Data Migration. Data migrations can be time consuming as they involve moving hundreds of terabytes to petabytes of data. Since the storage that data is migrating from is usually still in use during the migration, the data migration solution needs to move data as fast as possible without slowing down user access to the source storage. This requires a scalable architecture that can leverage the inherent parallelism of the data sets to migrate multiple data streams in parallel without overburdening any single source storage. Komprise uses a patented elastic data migration architecture that maximizes parallelism while throttling back as needed to preserve source data storage performance.
      • Shrink cloud migration time. When compared to generic tools used across heterogeneous cloud and physical storage, Komprise cloud data migration is nearly 30x faster. Performance is maximized at every level with the auto parallelize feature, minimizing network usage and making migration over WAN more efficient.


      • Reduce ongoing cloud data storage costs with smart migration, intelligent tiering and data lifecycle management in the cloud. Migrating to the cloud can reduce the amount spent on IT needs, storage maintenance, and hardware upgrades as these are typically handled by the cloud provider. Most clouds provide multiple storage classes at different price points – Komprise intelligently moves data to the right storage class in the cloud based on your policy and performs ongoing data lifecycle management in the cloud to reduce storage cost.  For example, for AWS, unlike cloud intelligent tiering classes, Komprise tiers across both S3 and Glacier storage classes so you get the best cost savings.
      • Simplify storage management. With a Komprise cloud migration, you can use a single solution across your multivendor storage and multicloud architectures. All you have to do is connect via open standards – pick the SMB, NFS, and S3 sources along with the appropriate destinations and Komprise handles the rest. You also get a dashboard to monitor and manage all of your migrations from one place. No more sunk costs of point migration tools because Komprise provides ongoing data lifecycle management beyond the data migration.
      • Greater resource availability. Moving your data to the cloud allows it to be accessed from wherever users may be, making it easier for international businesses to store and access their data from around the world. Komprise delivers native data access so you can directly access objects and files in the cloud without getting locked in to your NAS vendor—or even to Komprise.
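
      The general idea of moving many streams in parallel without overburdening any single source can be sketched with a worker pool plus a per-source concurrency cap. This is a generic illustration using Python's standard library, not a description of the Komprise architecture; `SourceThrottle`, `migrate_all` and `fake_transfer` are hypothetical names, and the transfer function is a placeholder for a real copy.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SourceThrottle:
    """Cap concurrent transfers per source so no single share is overloaded."""

    def __init__(self, max_per_source: int):
        self._max = max_per_source
        self._lock = threading.Lock()
        self._sems = {}

    def semaphore(self, source: str) -> threading.Semaphore:
        with self._lock:
            if source not in self._sems:
                self._sems[source] = threading.Semaphore(self._max)
            return self._sems[source]


def migrate_all(tasks, transfer, max_workers=8, max_per_source=2):
    """Run transfer(source, item) for every (source, item) task in parallel."""
    throttle = SourceThrottle(max_per_source)

    def run(task):
        source, item = task
        with throttle.semaphore(source):  # waits if this source is already busy
            return transfer(source, item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, tasks))  # results come back in task order


def fake_transfer(source, item):
    # Placeholder for a real copy operation; returns a label for the demo.
    return f"{source}:{item}"


tasks = [("nas1", "a"), ("nas1", "b"), ("nas2", "c")]
results = migrate_all(tasks, fake_transfer)
print(results)  # ['nas1:a', 'nas1:b', 'nas2:c']
```

      Raising `max_workers` increases overall parallelism, while `max_per_source` limits how hard any one source system is hit while users are still working on it.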

      Cloud Migration Process

      The cloud data migration process can differ widely based on a company’s storage needs, business model, environment of current storage, and goals for the new cloud-based system. Below are the main steps involved in migrating to the cloud.

      Step 1 – Analyze Current Storage Environment and Create Migration Strategy

      A smooth migration to the cloud requires proper planning to ensure that all bases are covered before the migration begins. It’s important to understand why the move is beneficial and how to get the most out of the new cloud-based features before the process continues.

      Step 2 – Choose Your Cloud Deployment Environment

      After taking a thorough look at the current resource requirements across your storage system, you can choose who will be your cloud storage provider(s). At this stage, it’s decided which type of hardware the system will use, whether it’s used in a single or multi-cloud solution, and if the cloud solution will be public or private.

      Step 3 – Migrate Data and Applications to the Cloud

      Application workload migration to the cloud can be done through generic tools. However, since data migration involves moving petabytes of data and billions of files, you need a data management software solution that can migrate data efficiently in a number of ways, including over a public internet connection or a private network connection (LAN or WAN).

      Step 4 – Validate Data After Migration

      Once the migration is complete, the data within the cloud can be validated and production access to the storage system can be swapped from on-premises to the cloud. Data validation often requires an MD5 checksum on every file to ensure the integrity of the data is intact after migration.
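
      A per-file MD5 comparison of this kind can be sketched as follows. The example assumes both the source and the migrated copy are reachable as local directory trees, and `md5_of` and `find_mismatches` are hypothetical helper names; with object storage, the destination checksum would typically come from the store rather than from re-reading the data.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(src: Path, dst: Path) -> list:
    """Return relative paths that are missing or differ at the destination."""
    bad = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = path.relative_to(src)
        target = dst / rel
        if not target.is_file() or md5_of(path) != md5_of(target):
            bad.append(rel.as_posix())
    return bad


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "src"), Path(tmp, "dst")
    src.mkdir()
    dst.mkdir()
    (src / "ok.txt").write_text("same")
    (dst / "ok.txt").write_text("same")
    (src / "bad.txt").write_text("original")
    (dst / "bad.txt").write_text("corrupted")
    (src / "missing.txt").write_text("never copied")
    mismatches = find_mismatches(src, dst)
    print(mismatches)  # ['bad.txt', 'missing.txt']
```

      An empty result is the signal that production access can be cut over; any listed path should be re-copied and checked again first.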

      Komprise Cloud Data Migration

      With Elastic Data Migration from Komprise, you can affordably run and manage hundreds of migrations across many different platforms simultaneously. Gain access to a full suite of high-speed cloud migration tools from a single dashboard that takes on the heavy lifting of migrations, and moves your data nearly 30x faster than traditional available services—all without any access disruption to users or apps.

      Our team of cloud migration professionals with over two decades of experience developing efficient IT solutions have helped businesses around the world provide faster and smoother data migrations with total confidence and none of the headaches. Contact us to learn more about our cloud data migration solution or sign up for a free trial to see the benefits beyond data migration with our analytics-driven Intelligent Data Management solution.

      Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.



    • Cloud NAS


      What is Cloud NAS?

      Cloud NAS is a relatively new term – it refers to a cloud-based storage solution to store and manage files. Cloud NAS or cloud file storage is gaining prominence and several vendors have now released cloud NAS offerings.

      What is NAS?

      Network Attached Storage (NAS) refers to data storage that can be accessed from different devices over a network. NAS environments have gained prominence for file-based workloads because they provide a hierarchical structure of directories and folders that makes it easier to organize and find files. Many enterprise applications today are file-based, and use files stored in a NAS as their data repositories.

      Access Protocols

      Cloud NAS storage is accessed via the Server Message Block (SMB) and Network File System (NFS) protocols. On-premises NAS environments are also accessed via SMB and NFS.

      Why is Cloud NAS gaining in importance?

      While the cloud was initially used by DevOps teams for new cloud-native applications that were largely object-based, the cloud is now seen as a major destination for core enterprise applications. These enterprise workloads are largely file-based, and so moving them to the cloud without rewriting the application means file-based workloads need to be able to run in the cloud.

      To address this need, both cloud vendors and third-party storage providers are now creating cloud-based NAS offerings. Here are some examples of cloud NAS offerings:

      Cloud NAS Tiers

      Cloud NAS storage is often designed for high-performance file workloads, and its high-performance flash tier can be very expensive.

      Many Cloud NAS offerings such as AWS EFS and NetApp CloudVolumes ONTAP do offer some less expensive file tiers – but putting data in these lower tiers requires some data management solution. As an example, the standard tier of AWS EFS is 10 times more expensive than the standard tier of AWS S3. Furthermore, when you use a Cloud NAS, you may also have to replicate and backup the data, which can often make it three times more expensive. As this data becomes inactive and cold data, it is very important to manage data lifecycle on Cloud NAS to ensure you are only paying for what you use and not for dormant cold data on expensive tiers.
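
      To make the ratios above concrete, here is a minimal sketch of the arithmetic. It works in relative units rather than real prices: the object storage cost is set to 1, and the 10x and 3x multipliers are taken from the text. The function and parameter names are hypothetical, and actual pricing varies by provider, region and tier.

```python
def relative_cloud_nas_cost(object_unit_cost=1.0,
                            nas_multiplier=10.0,
                            protection_multiplier=3.0):
    """Return relative per-TB costs using the ratios quoted in the text.

    object_unit_cost      -- cost of object storage, used as the baseline unit
    nas_multiplier        -- how much more the cloud NAS standard tier costs
    protection_multiplier -- extra factor for replicating and backing up NAS data
    """
    cloud_nas = object_unit_cost * nas_multiplier
    cloud_nas_protected = cloud_nas * protection_multiplier
    return {
        "object": object_unit_cost,
        "cloud_nas": cloud_nas,
        "cloud_nas_protected": cloud_nas_protected,
    }


costs = relative_cloud_nas_cost()
print(costs)
```

      Under these assumptions, cold data left on a replicated and backed-up cloud NAS tier costs roughly 30 times what it would cost in object storage, which is why lifecycle management matters.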

      Intelligent Data Archiving and Intelligent Data Tiering for Cloud NAS

      An analytics-driven unstructured data management solution can help you get the right data onto your cloud NAS and keep your cloud NAS costs low by managing the data lifecycle with intelligent archiving and intelligent tiering.

      As an example, Komprise Intelligent Data Management for multi-cloud does the following:

      • Analyzes your on-premises NAS data so you can pick the data sets you want to migrate to the cloud
      • Migrates on-premises NAS data to your cloud NAS with speed, reliability and efficiency
      • Analyzes data on your cloud NAS to show you how data is getting cold and inactive
      • Enables policy-based automation so you can decide when data should be archived and tiered from expensive Cloud NAS tiers to lower cost file or object classes
      • Monitors ongoing costs to ensure you avoid expensive retrieval fees when cold data becomes hot again
      • Eliminates expensive backup and DR costs of cold data on cloud NAS

      Cloud NAS Migration


      There are many potential advantages to migrating your NAS device to the cloud, but the right approach to cloud data migration is essential. Some of the common cloud NAS migration challenges are outlined in this post: Eliminating the Roadblocks of Cloud Data Migrations for File and NAS Data. Avoid unstructured data migration challenges and pitfalls with an analytics-first approach to cloud data migration and unstructured data management. With Komprise Elastic Data Migration you will:

      • Know before you migrate – analytics drive the most cost-effective plans
      • Preserve data integrity – maintain metadata, run MD5 checksums
      • Save time and costs – multi-level parallelism provides elastic scaling
      • Be worry-free – built for petabyte-scale that ensures reliability
      • Migrate NFS data 27X faster and SMB data 25X faster – forget slow, free tools that need babysitting
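
      The integrity check mentioned in the list above (running MD5 checksums) can be sketched in a few lines. This is a generic illustration using Python's standard library, not Komprise's implementation. The file names and the helper functions `md5_of_file` and `copy_and_verify` are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, target):
    """Copy a file, then confirm source and target have identical checksums."""
    shutil.copy2(source, target)  # copy2 also attempts to preserve timestamps
    return md5_of_file(source) == md5_of_file(target)


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "report.txt")
    dst = os.path.join(workdir, "report_migrated.txt")
    with open(src, "wb") as handle:
        handle.write(b"quarterly results\n" * 1000)
    verified = copy_and_verify(src, dst)
    print("checksums match:", verified)
```

      A real migration would also need to compare permissions, ownership and other metadata, which this sketch does not attempt.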

      Get the fast, no lock-in path to the cloud with a unified platform for unstructured data migration.




    • Cloud Object Storage

      What is Cloud Object Storage?

      Cloud object storage is a type of cloud data storage that is designed to store and manage large amounts of unstructured data in the cloud. Unlike file-based storage systems, cloud object storage services are based on a simple key-value model that allows data to be stored and retrieved based on unique identifiers (or keys) that are associated with each piece of data.
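
      The key-value model can be illustrated with a toy in-memory store. This is only a sketch: the class `InMemoryObjectStore` and its methods are hypothetical and do not correspond to any vendor's API, and real services add durability, access control and metadata on top of this basic idea.

```python
from typing import Dict, List


class InMemoryObjectStore:
    """Toy object store: every object is addressed by a unique string key."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        """Store (or overwrite) the object under the given key."""
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        """Retrieve an object by key; raises KeyError if it does not exist."""
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        """Return keys that start with a prefix, mimicking folder-like listings."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = InMemoryObjectStore()
store.put("videos/2023/launch.mp4", b"...binary data...")
store.put("docs/contract.pdf", b"%PDF-")
print(store.list_keys("videos/"))
```

      Note that the slashes in the keys are just characters. There is no directory hierarchy, only a flat namespace that can be filtered by prefix.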

      Also see Object Storage.

      Cloud object storage is ideal for storing documents, images, videos, and other unstructured data types that don’t fit neatly into a structured (relational) database. Cloud object storage systems are designed to be highly scalable and can store large data sets, making them well-suited for big data applications and use cases such as backup and archiving, content distribution, and data analytics.

      Examples of Cloud Object Storage

      Some examples of cloud object storage include Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, and IBM Cloud Object Storage Services. These cloud object storage services offer a range of features such as data durability and availability, built-in encryption, and flexible data access controls, as well as APIs and integrations for developers to easily incorporate object storage into their applications.

      Komprise TMT: Cloud File and Object Duality

      One of the core components of the Komprise Intelligent Data Management Platform is the patented Transparent Move Technology. When Komprise tiers files to a new target, typically object storage like AWS S3 or Azure Blob, moved files remain in native form, which means when a file becomes an object, a user sees it as a file. In addition to no end user disruption, preserving duality of file and object data across silos enables native cloud services on the data and ensures your data is not locked into a proprietary storage vendor format. This approach also ensures that hot data at the original source is handled by that storage vendor for optimal performance.

      In an interview, CEO and co-founder Kumar Goswami put it this way:

      Without using any agents, you can tier the data to the cloud and still access it from the original source as if it had never moved AND access it as a native object in the cloud to leverage cloud services like AI/ML cloud applications. This file to object duality, without agents, without getting in front of hot, mission-critical data is something no one else can tout.

      Komprise partners with cloud object storage vendors to deliver data-storage agnostic unstructured data management as a service.


    • Cloud Storage Gateway

      A cloud storage gateway is a hardware or software appliance that serves as a bridge between local applications and remote cloud-based storage.

      A cloud storage gateway provides basic protocol translation and simple connectivity to allow incompatible technologies to communicate. The gateway may be hardware or a virtual machine (VM) image.

      The requirement for a gateway between cloud storage and enterprise applications became necessary because of the incompatibility between protocols used for public cloud technologies and legacy storage systems. Most public cloud providers rely on Internet protocols, usually a RESTful API over HTTP, rather than conventional storage area network (SAN) or network-attached storage (NAS) protocols.
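
      As a rough sketch of what "protocol translation" means here, the toy gateway below accepts file-style calls (a path) and turns them into object-style calls (a bucket and a key). Everything in it is hypothetical: `FakeObjectApi` merely stands in for a provider's HTTP API, and a real gateway would also handle caching, locking and metadata.

```python
from typing import Dict


class FakeObjectApi:
    """Stand-in for a cloud provider's object API (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self.objects[f"{bucket}/{key}"] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self.objects[f"{bucket}/{key}"]


class ToyGateway:
    """Translates file-path operations into object put/get operations."""

    def __init__(self, api: FakeObjectApi, bucket: str) -> None:
        self.api = api
        self.bucket = bucket

    def _key_for(self, path: str) -> str:
        # Normalize Windows-style separators and drop the leading slash.
        return path.replace("\\", "/").lstrip("/")

    def write_file(self, path: str, data: bytes) -> None:
        self.api.put_object(self.bucket, self._key_for(path), data)

    def read_file(self, path: str) -> bytes:
        return self.api.get_object(self.bucket, self._key_for(path))


api = FakeObjectApi()
gateway = ToyGateway(api, bucket="archive-bucket")
gateway.write_file("/projects/plan.txt", b"v1")
print(gateway.read_file("/projects/plan.txt"))
```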

      Gateways can also be used for archiving in the cloud. This pairs with automated storage tiering, in which data can be replicated between fast, local disk and cheaper cloud storage to balance space, cost, and data archiving requirements.

      The challenge with traditional cloud gateways, which front the cloud with on-premises hardware and use the cloud like another storage silo, is that the cloud is very expensive for hot data: frequently accessed data results in high retrieval costs. Read the blog post: Are Cloud Storage Gateways a Good Choice for Cloud Data Migrations?


      Cloud Storage Gateway versus File-Level Cloud Tiering

      Cloud storage gateways create a new appliance (virtual or physical) that acts as your storage at each site to cache data locally and put a golden copy in the cloud. They are useful when you are doing active file collaboration across multiple sites and do not have NAS at branch sites or do not want to use your existing NAS. However, they do not leverage existing data storage investments, and they require data to be moved to the gateway, which creates additional infrastructure costs. Cloud storage gateways store data in the cloud in their proprietary format. Similar to storage-based cloud tiering, cloud storage gateways create proprietary lock-in and unnecessary cloud gateway costs in perpetuity. They also typically create additional on-premises costs.

      Cloud Storage Gateways: Additional On-Premises Infrastructure

      Cloud storage gateways are typically hardware-based since they have to serve hot data from the cache. Many vendors also offer virtual appliance options for smaller deployments.

      Duplication of Data in the Cloud

      Cloud storage gateways typically put all the data in the cloud and then cache some data locally. So, if you are using a cloud storage gateway for 100TB, then all 100TB of data is in the cloud and a subset of it (maybe 20TB or 30TB) is also cached locally. This means you may need 130TB of infrastructure to house 100TB of data. Depending on the size of the local cache, this may be larger.
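
      The footprint arithmetic in this example is simple enough to sketch directly. The function name `gateway_footprint_tb` is hypothetical; it just adds the full cloud copy to whatever is cached locally.

```python
def gateway_footprint_tb(total_data_tb, local_cache_tb):
    """Total infrastructure needed: the full data set in the cloud plus the local cache."""
    return total_data_tb + local_cache_tb


print(gateway_footprint_tb(100, 30))  # prints 130
print(gateway_footprint_tb(100, 20))  # prints 120
```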

      Cloud Storage Gateways: A New Storage Silo

      A cloud storage gateway is a new storage infrastructure silo that caches some data locally and keeps all of the data in the cloud. It replaces your existing NAS. It does not work with it. It is a rip-and-replace approach.

      Cloud Storage Gateway Licensing Charges to Access Data in the Cloud

      Cloud storage gateways lock data in the cloud with their proprietary format. This means you cannot directly access your data in the cloud—data access needs to be through the gateway software in the cloud. Many customers are surprised to learn they have to pay gateway licensing costs even to access data in the cloud, and this cost continues as long as you need your data. This lock-in limits flexibility and creates unnecessary cloud expenses. It also limits your use of the cloud as you cannot natively access your data without the gateway software.

      Assuming $700/TB/yr. of cloud storage gateway licensing costs, cloud storage gateways have 287% higher annual costs than using a file-level data management solution with the cloud. This is a recurring cost that you pay for over the lifetime of your data!

      This table summarizes the common cloud data migration requirements and the differences between Komprise Elastic Data Migration and Cloud Storage Gateways.


    • Cloud Tiering

      What is Cloud Tiering?

      Cloud tiering definition: Cloud tiering is increasingly becoming a critical capability in managing enterprise file workloads across the hybrid cloud. Cloud tiering (also referred to as cloud archiving or archive to the cloud) is a set of techniques that offload less frequently used data, also known as cold data, from expensive on-premises file storage or Network Attached Storage (NAS) to cheaper levels of storage in the cloud, typically object storage classes such as Amazon S3. Cloud tiering is a variant of data tiering. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded now to mean tiering or archiving data from a storage system to other clouds and storage systems.

      Cloud Tiering Transparently Extends Enterprise File Storage to the Cloud

      Enterprises today are increasingly trying to move core file workloads to the cloud. Since file data can be voluminous, involving billions of files, migrating file data to the cloud can take months and create disruption.

      A simple solution to this is to gradually offload files to the cloud (cloud tiering) without changing the end user experience. Cloud tiering (or archiving to specific cloud tiers) enables this by moving infrequently used cold data to a cheaper cloud storage tier, while the data continues to remain accessible from the original location. This enables users to transparently extend on-premises capacity with the cloud.

      Cloud Tiering Can Yield Significant Savings If Done Correctly

      Cloud object storage is cost-efficient if used correctly. Most cloud providers charge not only for the storage, but also to retrieve data, and they charge egress fees if the data has to leave the cloud. Cloud retrieval fees are usually in the form of charges for “get” and “put” API calls and cloud egress costs are charged by the amount of data that is read from anywhere outside the cloud. So, to keep enterprise storage costs low, infrequently accessed data such as snapshots, logs, backups and cold data are best suited for tiering to the cloud.
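
      The three cost components described above (storage, API requests and egress) can be combined in a simple estimate. This is a minimal sketch: the class `CloudRates`, the function `monthly_cost` and every price in the example are placeholder assumptions, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass
class CloudRates:
    """Placeholder prices; real rates vary by provider, region and storage class."""
    storage_per_gb_month: float
    per_1000_get_requests: float
    per_1000_put_requests: float
    egress_per_gb: float


def monthly_cost(rates, stored_gb, get_requests, put_requests, egress_gb):
    """Sum storage, request and egress charges for one month."""
    storage = stored_gb * rates.storage_per_gb_month
    requests = ((get_requests / 1000) * rates.per_1000_get_requests
                + (put_requests / 1000) * rates.per_1000_put_requests)
    egress = egress_gb * rates.egress_per_gb
    return storage + requests + egress


# Hypothetical rates chosen only to show the shape of the calculation.
example_rates = CloudRates(0.01, 0.001, 0.01, 0.05)

# Same amount of stored data, very different access patterns.
cold = monthly_cost(example_rates, stored_gb=10_000, get_requests=1_000,
                    put_requests=1_000, egress_gb=10)
hot = monthly_cost(example_rates, stored_gb=10_000, get_requests=50_000_000,
                   put_requests=1_000_000, egress_gb=20_000)
print(f"cold workload: {cold:.2f}, hot workload: {hot:.2f}")
```

      With identical capacity, the frequently accessed workload costs far more because of request and egress charges, which is the reason cold data is the better fit for cloud tiering.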

      By tiering cold data to the cloud, the on-premises storage array needs to only keep hot data and the most recent logs and snapshots. Across Komprise customers, we have found that typically 60% to 80% of their actual data has not been accessed in over a year. By cloud tiering the cold data as well as older log files and snapshots, the capacity of the storage array, mirrored storage array (if mirroring/replication is being used) and backup storage is reduced dramatically. This is why tiering cold data can reduce the overall storage cost by as much as 70% to 80%.
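
      To see why tiering shrinks more than just the primary array, consider a rough capacity model. The sketch below assumes one mirror copy and three backup copies of the primary data, and assumes tiered cold data is kept as a single copy in the cloud. Those numbers and the function names are illustrative assumptions, and the model counts capacity only, not price.

```python
def protected_footprint_tb(primary_tb, mirror_copies, backup_copies):
    """Capacity consumed by primary data plus its mirror and backup copies."""
    return primary_tb * (1 + mirror_copies + backup_copies)


def footprint_after_tiering(primary_tb, cold_fraction, mirror_copies, backup_copies):
    """Split data into hot (kept and protected on-premises) and cold (tiered to cloud)."""
    hot_tb = primary_tb * (1 - cold_fraction)
    cold_tb = primary_tb * cold_fraction
    return {
        "on_prem_tb": protected_footprint_tb(hot_tb, mirror_copies, backup_copies),
        "cloud_tb": cold_tb,
    }


# 100 TB of primary data, 75% of it cold (within the 60% to 80% range cited above).
before = protected_footprint_tb(100, mirror_copies=1, backup_copies=3)
after = footprint_after_tiering(100, 0.75, mirror_copies=1, backup_copies=3)
print(before, after)
```

      In this model the on-premises footprint falls from 500 TB to 125 TB, because every terabyte of cold data removed from primary storage also disappears from the mirror and from each backup copy.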

      The many advantages of cloud tiering of cold data include:

      • Reduce storage acquisition costs. Flash storage, used for fast access to hot data, is expensive. By tiering off infrequently used data you can purchase a much smaller amount of flash storage, thereby reducing acquisition costs.
      • Cut backup footprint and costs. By continuously tiering off cold data that is not being accessed you can reduce your backup footprint, backup license costs, and backup storage costs if the cold data is placed in robust storage (such as that provided by the major CSPs).
      • Increase disaster recovery speeds and lower disaster recovery (DR) costs. As with backup, by tiering off the cold data, the amount of data mirrored/replicated is dramatically reduced as well.
      • Improve storage performance. By running storage at a lower capacity and by moving access to cold data to another storage device or service, you can increase the performance of your storage array.
      • Leverage the cloud to run AI, ML, compliance checks and other applications on cold data. With cold data in the cloud, you can access, search and process your cold data without putting any load on your storage array. The cold data that is tiered off has value. Being able to process and feed your cold data into your AI/ML/BI engines is critical to staying competitive. By tiering you can extract value from your cold data without burdening your storage array. This also helps to extend the life of your storage array.

      Clearly, if cloud tiering is implemented correctly at the file level it will provide all of the above benefits whereas block tiering to the cloud will not. But not all cloud tiering choices are the same.

      To learn more about the differences between cloud tiering at the file level vs the block level, and why so-called cloud pools such as NetApp FabricPool or Dell EMC Isilon CloudPools are not the right approach for cloud tiering, read “What you need to know before jumping into the cloud tiering pool”.

      Also download the white paper: Cloud Tiering: Storage-Based vs Gateways vs. File-Based.



    • CloudPools

      What are CloudPools?

      Dell EMC Isilon CloudPools software provides policy-based automated tiering that allows for an additional storage tier for the Isilon cluster at your data center. CloudPools supports tiering data from Isilon to public, private or hybrid cloud options. This technology is a form of storage pools, which are collections of storage volumes exported to a shared storage environment.

      Read more about storage pools.

      Smart, fast, proven Isilon migration.

      Read the blog post: What you need to know before jumping into the cloud tiering pool


      Download the white paper: Cloud Tiering: Storage-Based vs Gateways vs File-Based: Which is Better and Why?


    • Cold Data Storage


      What is cold data?

      Cold data refers to data that is infrequently accessed, as compared to hot data that is frequently accessed. As unstructured data grows at unprecedented rates, organizations are realizing the advantages of utilizing cold data storage devices instead of high-performance primary storage as they are much more economical, simple to set up & use, and are less prone to suffering from drive failure.

      For many organizations, the real difficulty with cold data is figuring out when data should be considered hot and kept on primary storage, and when it can be labeled as cold and moved off to a secondary storage device. For this reason, it’s important to understand the difference between data types in order to develop the most cost-effective solution for managing cold data in your organization.

      Types of Data That Cold Storage is Typically Used For

      Examples of data types for which cold storage may be suitable include information a business is required to keep for regulatory compliance, video, photographs, and data that is saved for backup, archival, big-data analytics or disaster recovery purposes. As this data ages and is less frequently accessed, it can generally be moved to cold storage. A policy-based data management approach allows organizations to optimize storage resources and reduce data storage costs by moving inactive data to more economical cold data storage.
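
      A policy-based approach can be as simple as a rule on last-access age. The sketch below labels data as cold when it has not been accessed for more than a set number of days. The one-year default, the function name and the sample file names are assumptions for illustration, and real policies usually combine several criteria.

```python
from datetime import datetime, timedelta


def classify_by_last_access(last_accessed, now, cold_after_days=365):
    """Label data 'cold' if it has not been accessed within the threshold, else 'hot'."""
    age = now - last_accessed
    return "cold" if age > timedelta(days=cold_after_days) else "hot"


# A fixed "now" keeps the example deterministic.
now = datetime(2024, 1, 1)
files = {
    "budget.xlsx": datetime(2023, 12, 15),
    "scan_2019.tif": datetime(2019, 6, 1),
}
labels = {name: classify_by_last_access(ts, now) for name, ts in files.items()}
print(labels)  # {'budget.xlsx': 'hot', 'scan_2019.tif': 'cold'}
```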

      Advantages of Developing a Cold Data Storage Solution

      1. Prevent primary storage solutions from becoming overburdened with unused data
      2. Reduce overall resource costs of data storage
      3. Simplify the data storage solution and optimize data management
      4. Efficiently meet governance and compliance requirements
      5. Make use of more affordable & reliable mechanical storage drives for lesser used data

      Reduce Strain on Primary Storage by Moving Cold Data to Secondary Storage

      Affordable Costs of Cold Storage

      When comparing costs for enterprise-level storage drives, the mechanical drives used in many cold data storage systems cost, on average, just over 20% of the price of high-end solid-state drives (SSDs). For SSDs at the top tier of performance, storage still costs close to 10 cents per gigabyte, whereas NAS-level mechanical drives cost only around 2 cents per gigabyte on average.

      Simplify Your Data Storage Solution

      A well-optimized cold data storage system can make your local storage infrastructure much less cluttered & easier to maintain. As the storage tools which help us automatically determine which data is hot and cold continue to improve, managing the movement of data between solutions or tiers is becoming easier every year. Some cold data storage solutions are even starting to automate the entirety of the unstructured data management process based on rules that the business establishes.

      Meet Regulatory or Compliance Requirements

      Many organizations in the healthcare industry are required to hold onto their data for extended periods of time, if not forever. With the possibility of facing litigation somewhere down the line based on having this data intact, corporations are opting to use a cold data storage solution which can effectively store critically important, unused data under conditions in which it cannot be tampered with or altered.

      Increase Data Durability with Cold Data Storage

      Reliability is one of the most important factors when choosing a data storage solution to house data for extended periods of time or indefinitely. Mechanical drives can be somewhat slower than SSDs in providing file access, but they still retrieve files quickly and leave much more budget room for creating additional backup or parity within your storage system.

      When considering storage hardware for cold data solutions, look for low-cost, high-capacity options with a high degree of data durability so your data can remain intact for as long as it needs to be stored.

      Learn more about your options when it comes to migrating file workloads to the cloud.

      How Pfizer Saved Millions with a Cold Data Management Strategy

      Pfizer needed to change the way it was managing petabytes of unstructured data to cut data storage costs and reinvest in areas with patients at the center. Read the blog.



    • Compression

      Compression is the process of reducing the size of a file or data set to occupy less storage space or transmit more efficiently. It involves encoding data in a more compact representation, which can be restored to its original form when needed. Compression techniques are widely used in data storage, data transmission, and multimedia applications.

      All about compression:

      • Lossless Compression: Lossless compression algorithms reduce the file size without losing any data. The compressed file can be fully restored to its original form. This is commonly used for text files, databases, and other data where data integrity is crucial.
      • Lossy Compression: Lossy compression algorithms achieve higher compression ratios by selectively discarding some data that is considered less perceptually important. This results in some loss of information, which may not be noticeable in certain types of data, such as images, audio, or video. Lossy compression is often used in multimedia applications to reduce file sizes while maintaining acceptable quality.
      • Compression Algorithms: Various compression algorithms and techniques are employed, each with its own advantages and limitations. Some well-known compression algorithms and formats include ZIP, GZIP, Lempel-Ziv-Welch (LZW), Huffman coding, and MPEG for video compression.
      • Application-Specific Compression: Different types of data may benefit from specialized compression techniques tailored to their characteristics. For example, images can be compressed using techniques like JPEG, while audio can use formats like MP3 or AAC. Each format optimizes the compression based on the unique properties of the data.
      • Compression Ratio: The compression ratio represents the reduction in file size achieved by the compression process. It is calculated by dividing the original file size by the compressed file size. Higher compression ratios indicate more efficient compression techniques.
      • Decompression: Decompression is the reverse process of compression, where the compressed file is restored to its original form. Decompression algorithms reconstruct the compressed data based on the compression method used.
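
      Lossless compression, decompression and the compression ratio described in the list above can be demonstrated with Python's standard `zlib` module. The sample data is deliberately repetitive so that it compresses well; the ratio you get on real data depends heavily on its content.

```python
import zlib

# Highly repetitive sample data; real-world ratios vary widely.
original = b"cold data, cold data, cold data. " * 200

compressed = zlib.compress(original, 9)   # 9 is the highest compression level
restored = zlib.decompress(compressed)    # lossless: must match the original exactly

# Compression ratio = original size / compressed size.
ratio = len(original) / len(compressed)

print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio {ratio:.1f}:1")
print("restored matches original:", restored == original)
```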

      Compression Performance Considerations

      Compression and decompression processes require computational resources, including processing power and memory. The performance impact depends on the complexity of the compression algorithm and the size of the data being compressed or decompressed.

      Compression is widely used to optimize storage space, reduce data transfer times, and improve bandwidth utilization. It enables efficient data storage, faster data transmission over networks, and better utilization of resources in various applications, ranging from file compression on personal computers to multimedia streaming and archival data compression.


  • D
    • Dark Data

      What is Dark Data?

      Dark data is the term used to describe the vast amount of data (primarily unstructured data) that organizations collect, generate, and store but do not actively use, analyze, or leverage for decision-making, business intelligence, analytics, AI or other purposes. This data typically remains untapped or unexplored due to various reasons, such as lack of awareness, inadequate data management processes, or technical challenges.

      Gartner defines Dark Data as:

      The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing). Similar to dark matter in physics, dark data often comprises most organizations’ universe of information assets. Thus, organizations often retain dark data for compliance purposes only. Storing and securing data typically incurs more expense (and sometimes greater risk) than value.

      In the article: 5 Steps for Minimizing Dark Data Risk, the first step to protecting dark data is visibility. (See Komprise Analysis.)

      Examples of Dark Data

      • Unstructured data: This includes text documents, images, videos, audio files, and other forms of data that are not organized in a structured format like databases.
      • Log files: Many systems generate log files to record events, errors, and other activities, but these logs may not be regularly reviewed or analyzed.
      • Historical data: Older datasets that were collected for specific projects or purposes might no longer be actively used or considered valuable.
      • Redundant or duplicated data: Copies of data that were created for backup or replication purposes but are not actively used. (Sometimes known as Redundant, Outdated, Trivial or ROT data.)
      • Siloed data: Data that is isolated in different departments or systems, making it challenging to access and integrate with other data sources.
      • IoT-generated data: With the proliferation of Internet of Things (IoT) devices, there’s an increasing amount of data being generated, but not all of it is fully utilized.
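
      One of the categories above, redundant or duplicated data, is straightforward to detect by content hashing. The sketch below groups items whose SHA-256 digests match. The function name and sample data are hypothetical, and the in-memory dictionary stands in for files you would normally read from storage.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def find_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group names whose contents hash to the same SHA-256 digest."""
    by_digest = defaultdict(list)
    for name, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]


sample = {
    "a/report.docx": b"Q3 results",
    "b/report_copy.docx": b"Q3 results",
    "c/notes.txt": b"misc",
}
duplicates = find_duplicates(sample)
print(duplicates)  # [['a/report.docx', 'b/report_copy.docx']]
```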

      Dark Data Challenges

      Some of the known challenges that arise from the accumulation of so-called dark data include:

      • Data storage costs: Storing large amounts of unused data can be costly, both in terms of hardware and cloud storage expenses.
      • Security and privacy risks: Dark data may contain sensitive information that isn’t adequately protected, increasing the risk of data breaches.
      • Missed insights: Valuable insights and opportunities for improvement may be hidden within the dark data, preventing organizations from making data-driven decisions.
      • Compliance and legal challenges: Regulatory requirements may demand proper data management and disposal practices, which dark data may violate.

      To address dark data challenges, organizations need to implement better data governance practices, invest in data management tools and infrastructure, particularly unstructured data management, and establish processes to identify, classify, and leverage relevant data both efficiently and effectively. By doing so, they can ensure strong data protection is established while unlocking the potential hidden within their dark data and turn it into valuable insights for better decision-making, strategic planning and the growing opportunity presented by artificial intelligence in the enterprise.


    • Data Analytics

      Data analytics refers to the process used to enhance productivity and business improvement by extracting and categorizing data to identify and analyze behavioral patterns. Techniques vary according to organizational requirements.

      The primary goal of data analytics is to help organizations make more informed business decisions by enabling analytics professionals to evaluate large volumes of transactional and other forms of data. Data analytics can be pulled from anything from Web server logs to social media comments.

      Potential issues with data analytics initiatives include a lack of analytics professionals and the cost of hiring qualified candidates. The volume and variety of the data involved can also cause problems, including issues with data quality and consistency. In addition, integrating technologies and data warehouses can be a challenge, although various vendors offer data integration tools with big data capabilities.

      Big data has drastically changed the requirements for extracting data analytics from business data. With relational databases, administrators can easily generate reports for business use, but they lack the broader intelligence data warehouses can provide. However, the challenge for data analytics from data warehouses is the costs associated.

      Unstructured Data Analytics

      There is also the challenge of pulling the relevant data sets to enable data analytics from cold data. This requires intelligent data management solutions that track what unstructured data is kept and where, and enable you to easily search and find relevant data sets for big-data analytics.

      Deliver the right data to the right place at the right time with Komprise and bring unstructured data to your analytics projects.

      Learn more about Komprise unstructured data analysis and insight.


    • Data Archiving

      What is Data Archiving?

      Data Archiving, often referred to as Data Tiering, protects older data that is not needed for everyday operations of an organization. A data archiving strategy reduces primary storage and allows an organization to maintain data that may be required for regulatory or other needs.

      Benefits of a Data Archiving Solution

      Data archiving protects older information that is not needed for everyday operations but which users may occasionally access. Data archiving tools deliver the most value by reducing primary storage costs, rather than acting as a data recovery tool. Unstructured data archive tools are in high demand because they can drastically reduce overall storage costs; most data is unstructured and resides on expensive, high-performance storage devices. Archive data storage, meanwhile, is typically on a low-performance, low-cost, high-capacity data storage medium.

      Types of Data Archiving

      Some data archiving products only allow read-only access to protect data from modification, while other data tiering and archiving products allow users to make changes.

      Data archiving takes a few different forms:

      • Options include online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.
      • Another archival system uses offline data storage where data archiving software writes the data to tape or other removable media. Tape consumes less power than disk systems, translating to lower costs.
      • A third option is using cloud data storage, offered by Amazon, Azure and other cloud providers. Cloud object storage is a smart choice for cloud tiering and data archiving because of its low-cost, immutable nature. This is inexpensive but requires ongoing investment.

      New requirements for secure data archiving have resulted from more sophisticated cybersecurity and ransomware threats. Encryption of sensitive archives, multi-factor authentication for access, and object lock storage (such as AWS S3) are a few ways to protect archival data from modification, corruption and theft.

      The data archiving process is typically automated: software moves cold data according to policies set by an administrator. A popular approach is to make the archive “transparent” so that users and applications can access archived data from the same location as if it had never moved. (See Native Access)

      Learn more about Komprise Transparent Move Technology (TMT).


    • Data Backup

      Why Data Backup?

      Data loss can occur from a variety of causes, including computer viruses, hardware failure, file corruption, fire, flood, or theft. Data loss may involve critical financial, customer, and company data, so a solid data backup plan is critical for every organization.

      Data backup plan considerations:
      • What data (files and folders) to backup
      • How often to run your backups
      • Where to store the backup data
      • What compression method to use
      • What type of backups to run
      • What kind of media to store the backups on

      In general, you should back up any data that can’t be replaced easily. Some examples are structured data like databases, and unstructured data such as word processing documents, spreadsheets, photos, videos, emails, etc. Typically, programs or system folders are not part of a data backup program. Installation discs, operating system discs, and registration information should be stored in a safe place.

      Data backup frequency depends on how often your organizational data changes.

      • Frequently changing data may need daily or hourly backups
      • Data that changes every few days might require a weekly or even monthly backup
      • For some data, a backup may need to be created each time it changes


      The challenge with unstructured data is that backing it up is not only time-consuming but also very complex. With millions to billions of files of various sizes and types, growing at an astronomical rate, enterprises struggle with long backup windows, overlapping backup cycles, backup footprint sprawl, spiraling costs and, above all, vulnerability in the event of a disaster.

      Read the white paper: Rein in Storage and Backup Costs.

      Read the post: 5 Ways to Get to the Cloud Smarter and Faster

      Backing Up Unstructured Data First (Before Analysis) is Backwards

      Don’t back up data first. Know your data first to make smarter, cost-saving decisions. Start with the Komprise TCO calculator.

      Learn more about Komprise Analysis.


    • Data Center Consolidation

      Data center consolidation is the process of merging or reducing the number of data centers that an organization operates. The consolidation is typically done in order to reduce costs, increase efficiency, and simplify management of the data center infrastructure.

      There are several steps involved in data center consolidation, including:

      • Assessing the current state of the data center environment, including the number and locations of data centers, the types of systems and applications being used, and the costs associated with operating and maintaining the infrastructure.
      • Developing a consolidation plan that outlines the goals, timelines, and resources needed for the project. This plan should include an analysis of the potential benefits and risks of consolidation, as well as a detailed roadmap for migrating applications and data to the new infrastructure.
      • Migrating applications and data to the consolidated data center(s). This may involve re-architecting applications to run in a virtualized environment or on cloud infrastructure.
      • Decommissioning or repurposing the legacy data center(s), including disposing of any equipment that is no longer needed.
      • Continuously monitoring and optimizing the consolidated data center infrastructure to ensure it remains efficient and cost-effective.

      Overall, data center consolidation can be a complex process that requires careful planning and execution. However, the benefits of consolidation can be significant, including lower costs, improved performance, and increased agility and flexibility for the organization.

      In 2023, Komprise summarized the following customer trends in unstructured data management and storage:

      • Simplifying infrastructure, retiring legacy apps and software, and consolidating data centers to support business growth and IT modernization.
      • Reducing IT spending by pivoting to more of an OPEX environment and by deleting data that is no longer needed to reduce data storage costs and complexity.
      • Managing research workflows and the full lifecycle of data. One example, from a major university: enabling users to share data between labs and send some data to the cloud for processing, then bring it back on-premises.
      • Using industry standards to move data easily between platforms.
      • Externalizing (tiering) data off NAS: IT and storage managers want to tier cold data from across the business to cheaper, secondary storage to save money and free up primary storage capacity.

      Learn more about Komprise Elastic Data Migration and read 5 Industry Case Studies.


    • Data Center Emissions

      Data center emissions are the greenhouse gas (GHG) emissions produced by data centers during their operations. Data centers consume significant amounts of energy to power and cool their IT infrastructure, and this energy consumption often leads to the generation of carbon dioxide (CO2) and other GHG emissions. (See Data Center Consolidation and Data Storage Costs.)

      What contributes to data center emissions?

      Traditionally the factors contributing to data center emissions have focused on the IT operations management of the physical location(s) such as:

      • Electricity Consumption: The primary source of emissions in data centers is the electricity consumed to power the IT equipment, cooling systems, lighting, and other supporting infrastructure. The majority of data centers rely on electricity generated from fossil fuel sources such as coal, natural gas, or oil, which results in the release of CO2 and other GHGs.
      • Cooling Systems: Data centers require cooling systems to maintain optimal operating temperatures for their IT equipment. Traditional cooling methods, such as air conditioning and refrigeration, consume significant amounts of energy, contributing to emissions. However, more energy-efficient cooling technologies, such as free cooling or liquid cooling, can help reduce emissions associated with cooling.
      • Backup Power: Data centers often rely on backup power systems, such as diesel generators, to ensure continuous operations in case of a power outage. The use of backup power systems can contribute to emissions, especially if they run on fossil fuels.
      • Infrastructure Efficiency: The energy efficiency of data center infrastructure plays a crucial role in emissions. Inefficient equipment, power distribution systems, and cooling mechanisms result in higher energy consumption and emissions. Implementing energy-efficient technologies and optimizing infrastructure can help reduce emissions.

      Strategies to mitigate data center emissions

      Again, with the focus on the physical operations of the data center, traditional strategies to reduce data center emissions include:

      • Energy Efficiency: Improving energy efficiency within data centers can significantly reduce emissions. This includes using energy-efficient IT equipment, optimizing cooling systems, implementing advanced power management techniques, and adopting server virtualization to maximize resource utilization.
      • Renewable Energy: Transitioning to renewable energy sources, such as solar, wind, or hydroelectric power, can help reduce the carbon footprint of data centers. Many organizations are investing in renewable energy projects or purchasing renewable energy credits to offset their electricity consumption.
      • Data Center Design: Implementing energy-efficient data center designs, including proper airflow management, efficient equipment layout, and insulation, can optimize energy usage and reduce emissions.
      • Lifecycle Management: Proper lifecycle management of IT equipment, including responsible disposal and recycling, can help minimize the environmental impact and emissions associated with data center operations.
      • Carbon Offsetting: Some organizations choose to offset their emissions by investing in carbon offset projects. These projects aim to reduce or remove CO2 from the atmosphere, such as through reforestation or renewable energy projects.

      Despite the shift to the cloud and increasingly hybrid and consolidated data centers, emissions continue to be a major concern as the demand for lower-cost data storage in the face of massive unstructured data growth, as well as the demand for processing power and high performance, continues to grow. And while the industry continues to focus on sustainability by adopting energy-efficient practices, leveraging renewable energy sources, and seeking new ways to reduce emissions, the environmental impact of data center emissions cannot be denied.

      Data Management and Data Center Emissions

      It is only recently that enterprise IT organizations have begun to focus on unstructured data management, as opposed to storage management, as a means to reduce data center emissions. A post on the Azure Storage blog addresses the true cost of traditional file data, pointing out that storage is only about 25% of file data costs:

      When looking at the storage cost of file data, you need to consider that the cost of file data is at least three to four times higher than the cost of the file storage itself. The reason is that beyond storage, IT teams must also protect it with backups and replicate it for disaster recovery.

      The point is that with better data management practices, data growth can be managed and emissions (and costs) reduced.
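
      The relationship between the "three to four times" multiplier and the roughly 25% figure is simple arithmetic, sketched below. The function name `storage_share` is introduced here purely for illustration; the only inputs taken from the quoted post are the multipliers 3 and 4.

```python
def storage_share(total_cost_multiplier: float) -> float:
    """Fraction of total file data cost that is primary storage.

    Assumes total cost = multiplier x storage cost, so the storage share
    is simply the reciprocal of the multiplier.
    """
    if total_cost_multiplier < 1:
        raise ValueError("total cost cannot be lower than the storage cost itself")
    return 1.0 / total_cost_multiplier


for multiplier in (3, 4):
    share = storage_share(multiplier)
    print(f"{multiplier}x total cost: storage is {share:.0%}, "
          f"backup and replication are {1 - share:.0%}")
```

      At a multiplier of four, storage accounts for a quarter of the total and the remaining three quarters go to protection copies; at a multiplier of three, storage accounts for about a third.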

      Read the eBook: 8 Ways to Reduce File Storage and Backup Costs.

    • Data Classification

      Data classification is the process of organizing data into tiers of information for organizational purposes.

      Data classification is essential to make data easy to find and retrieve so that your organization can optimize risk management, compliance, and legal requirements. Written guidelines are essential in order to define the categories and criteria to classify your organization’s data. It is also important to define the roles and responsibilities of employees in the data organization structure.

      When data classification procedures are established, security standards should also be established to address data lifecycle requirements. Classification should be simple so employees can easily comply with the standard.

      Examples of types of data classifications:

      • 1st Classification: Data that is free to share with the public
      • 2nd Classification: Internal data not intended for the public
      • 3rd Classification: Sensitive internal data that would negatively impact the organization if disclosed
      • 4th Classification: Highly sensitive data that could put an organization at risk

      Data classification is a complex process, but automated systems can help streamline it. The enterprise must create the criteria for classification, outline the roles and responsibilities of employees to maintain the protocols, and implement proper security standards. Properly executed, data classification will provide a framework for the storage, transmission and retrieval of data.

      Automation simplifies data classification by enabling you to dynamically set different filters and classification criteria when viewing data across your storage. For instance, if you wanted to classify all data belonging to users who are no longer at the company as “zombie data,” the Komprise Intelligent Data Management solution will aggregate files that fit into the zombie data criterion to help you quickly classify your data.
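
      As a rough illustration of criteria-based classification, the "zombie data" example above can be expressed as a filter over file metadata. This is a minimal sketch under stated assumptions: the `FileRecord` type, the `find_zombie_data` function and the sample users are hypothetical and are not part of any Komprise interface.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileRecord:
    """Minimal file metadata used for classification (hypothetical schema)."""
    path: str
    owner: str
    size_bytes: int


def find_zombie_data(files: Iterable[FileRecord],
                     active_users: Iterable[str]) -> List[FileRecord]:
    """Return files whose owner is no longer an active user."""
    active = set(active_users)
    return [record for record in files if record.owner not in active]


files = [
    FileRecord("/home/alice/report.docx", "alice", 120_000),
    FileRecord("/home/bob/old_model.bin", "bob", 9_000_000),
    FileRecord("/home/carol/notes.txt", "carol", 2_000),
]
zombies = find_zombie_data(files, active_users=["alice", "carol"])
print([record.path for record in zombies])
print(sum(record.size_bytes for record in zombies), "bytes of zombie data")
```

      Other criteria (file type, age, size, project tag) would follow the same pattern: a predicate over metadata that yields a named subset of the data.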

      Data Classification and Komprise Deep Analytics

      Komprise Deep Analytics gives data storage administrators and line of business users granular, flexible search capabilities and indexes data, creating a Global File Index across file, object and cloud data storage spanning petabytes of unstructured data. Komprise Deep Analytics Actions uses these virtual datasets (see virtual data lake) for systematic, policy-driven data management actions that can feed your data pipelines.


    • Data Governance

      What is data governance?

      Data governance refers to the management of the availability, security, usability, and integrity of data used in an enterprise. Data governance in an organization typically includes a governing council, a defined set of procedures, and a plan to execute those procedures.

      Data governance is not about allowing access to a few privileged users; instead, it should allow broad groups of users access with appropriate controls. Business and IT users have different needs; business users need secure access to shared data and IT needs to set policies around security and business practices. When done right, data governance allows any user access to data anytime, so the organization can run more efficiently, and users can manage their workload in a self-service manner.

      3 things to consider when developing a data governance strategy:

      Selecting a Data Governance Team
      • Balance IT and business leaders to get a broad view of the data and service needs
      • Start small – choose a small group to review existing data analytics
      Data Quality Strategy
      • Audit existing data to discover data types and how they are used
      • Define a process for new data sources to ensure quality and availability standards are met
      Data Security
      • Make sure data is classified so data requiring protection for legal or regulatory reasons meets those requirements
      • Implement policies that allow for different levels of access based on user privileges
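
      The last two points, classifying data and granting access by privilege level, can be combined in a short sketch. It reuses the four-level example from the Data Classification entry; the role names and the `ROLE_CLEARANCE` mapping are hypothetical and would be defined by each organization's own governance team.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 1            # free to share with the public
    INTERNAL = 2          # internal data not intended for the public
    SENSITIVE = 3         # sensitive internal data
    HIGHLY_SENSITIVE = 4  # highly sensitive data


# Hypothetical roles and the highest classification each one may read.
ROLE_CLEARANCE = {
    "guest": Classification.PUBLIC,
    "employee": Classification.INTERNAL,
    "manager": Classification.SENSITIVE,
    "compliance_officer": Classification.HIGHLY_SENSITIVE,
}


def can_access(role: str, data_level: Classification) -> bool:
    """Allow access only when the role's clearance covers the data's level."""
    clearance = ROLE_CLEARANCE.get(role)
    if clearance is None:
        return False  # unknown roles are denied by default
    return data_level <= clearance
```

      For example, `can_access("employee", Classification.INTERNAL)` is allowed while `can_access("employee", Classification.SENSITIVE)` is not. Denying unknown roles by default is a design choice made for this sketch.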

      Komprise is not a data governance solution but we are part of an overall governance strategy as it relates to unstructured data management. With the Deep Analytics user profile, you can provide secure data access to specific users to search and tag file and object data so that it can then be incorporated into smart data migration and data mobility use cases, including Smart Data Workflows.

    • Data Hoarding

      What is Data Hoarding?

      Data hoarding is now being recognized as a growing challenge in the technology world. Many IT teams are caught in an endless cycle of buying more data storage. Unstructured data is growing at record rates and this data is increasingly being stored across hybrid cloud infrastructure. This massive data growth and increased data mobility have only created more disconnected data silos. Just as hoarding has been recognized as a real problem in the real world (see reality TV shows like Hoarders and Storage Wars), data hoarding refers to the practice of retaining, for extended periods of time, large amounts of data that are no longer needed or are rarely used. This is a common problem in many organizations, where employees tend to save data out of habit, fear of losing it, or simply because they don’t know what to do with it.

      What is the impact of data hoarding?

      The impact of data hoarding is more significant than most people and organizations realize. It includes:

      • Increased costs: Storing large amounts of unnecessary data can be expensive, especially if the organization is using expensive storage solutions, such as high-end disk arrays or tape libraries.
      • Reduced efficiency: Hoarded data can slow down systems and applications, as well as increase the time required to complete backups and other data management tasks.
      • Compliance risks: Hoarded data can pose a risk to organizations in terms of compliance, as they may contain sensitive information that is subject to data privacy regulations.
      • Cybersecurity risks: Hoarded data can also pose a security risk, as it may contain sensitive information that could be targeted by cybercriminals or hackers.

      Stop Treating All Data the Same

      Sound familiar?

      • Cold data sits on expensive storage.
      • Everything gets replicated.
      • Everything gets backed up and backup windows are getting longer.
      • Costs are spiraling out of control.

      The IDC report, How to Manage Your Data Growth Smarter with Data Literacy, noted:

      • 60% of the storage budget is not really spent on storage. It’s spent on secondary copies of data for data protection – backups, backup software licenses, replication, and disaster recovery.
      • 1/3 of IT organizations are spending most of their IT storage budget on secondary data.

      And with ransomware attacks on the rise, and increasingly targeting unstructured data, it is more important than ever to find ways to manage, tier, migrate and replicate file data within tight IT budgets. Read the blog post: How to Protect File Data from Ransomware at 80% Lower Cost.

      Dealing with Data Hoarding

      To address the data hoarding challenge and establish an Intelligent Data Management strategy, IDC recommends the following:

      1. Focus less on finding alternatives to store data better/faster and focus more on finding intelligent alternatives to unstructured data management.
      2. Use modern, next-generation cloud data management technologies that are lightweight and non-intrusive, and that demonstrate powerful return on investment.
      3. Aim to deliver continuous insights as a service to business and achieve speed of intelligence for a competitive edge.


      Establish a Cold Data Storage Strategy

      One obvious strategy to deal with data hoarding is to define a cold data storage strategy and establish unstructured data management policies.

      Read this post to learn how to quantify the business value impact of Komprise Intelligent Data Management.


    • Data Lake

      A data lake is data stored in its natural state. The term typically refers to unstructured data that is sitting on different storage environments and clouds. The data lake supports data of all types – for example, you may have videos, blogs, log files, seismic files and genomics data in a single data lake. You can think of each of your Network Attached Storage (NAS) devices as a data lake.

      One big challenge with data lakes is to comb through them and find the relevant data you need. With unstructured data, you may have billions of files strewn across different data lakes, and finding data that fits specific criteria can be like finding a needle in a haystack.

      A virtual data lake is a collection of data that fits certain criteria – and as the name implies, it is virtual because the data is not moved. The data continues to reside in its original location, but the virtual data lake gives a discrete handle to manipulate that entire data set. The Komprise Global File Index can be considered to be a virtual data lake for file and object metadata.

      Some key aspects of data lakes – both physical and virtual:

      • Data Lakes Support a Variety of Data Formats: Data lakes are not restricted to data of any particular type.
      • Data Lakes Retain All Data: Even if you do a search and find some data that does not fit your criteria, the data is not deleted from the data lake. A virtual data lake provides a discrete handle to the subset of data across different storage silos that fits specific criteria, but nothing is moved or deleted.
      • Virtual Data Lakes Do Not Physically Move Data: Virtual data lakes do not physically move the data, but provide a virtual aggregation of all data that fits certain criteria. Deep Analytics can be used to specify criteria.
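
      The idea of a virtual data lake as a "discrete handle" over data that stays in place can be sketched as a saved query over a metadata index. This is an illustrative model only: `IndexEntry`, `VirtualDataLake` and the sample silo names are hypothetical and do not describe the Komprise Global File Index implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class IndexEntry:
    """One metadata record in a cross-silo index (hypothetical schema)."""
    silo: str
    path: str
    extension: str
    size_bytes: int


class VirtualDataLake:
    """A named handle on every index entry that matches some criteria.

    Only metadata is consulted; the underlying files are never moved,
    copied or deleted.
    """

    def __init__(self, name: str, index: List[IndexEntry],
                 criteria: Callable[[IndexEntry], bool]):
        self.name = name
        self._index = index
        self._criteria = criteria

    def entries(self) -> List[IndexEntry]:
        return [entry for entry in self._index if self._criteria(entry)]

    def total_bytes(self) -> int:
        return sum(entry.size_bytes for entry in self.entries())


index = [
    IndexEntry("nas-01", "/labs/run1/sample.bam", ".bam", 4_000_000_000),
    IndexEntry("nas-02", "/video/intro.mp4", ".mp4", 750_000_000),
    IndexEntry("s3-archive", "genomics/run0/sample.bam", ".bam", 3_500_000_000),
]
genomics = VirtualDataLake("genomics", index, lambda e: e.extension == ".bam")
print([f"{e.silo}:{e.path}" for e in genomics.entries()])
print(genomics.total_bytes(), "bytes across", len(genomics.entries()), "files")
```

      Entries that do not match the criteria remain in the index untouched, which mirrors the point above that a search never removes data from the lake.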


    • Data Lakehouse

      The term “data lake” was first coined by James Dixon, the co-founder and then CTO of Pentaho. The term “lakehouse” followed later: while both Amazon and Snowflake had already started using it, it wasn’t until Databricks really endorsed it in a January 30, 2020 blog post entitled “What is a Data Lakehouse?” that it received more mainstream attention (amongst data practitioners at least).

      You’ve heard of a Data Lake. You’ve heard of a Data Warehouse. Enter the Data Lakehouse.

      A data lakehouse is a modern data architecture that combines the benefits of data lakes and data warehouses. A data lake is a centralized repository that stores vast amounts of raw, unstructured, and semi-structured data, making it ideal for big data analytics and machine learning. A data warehouse, on the other hand, is designed to store structured data that has been organized for querying and analysis.

      A data lakehouse builds on key elements of these two approaches by providing a centralized platform for storing and processing large volumes of structured and unstructured data, while supporting real-time data analytics. It allows organizations to store all of their data in one place and perform interactive and ad-hoc analysis at scale, making it easier to derive insights from complex data sets. A data lakehouse typically uses modern (and often open source) technologies such as Apache Spark and Apache Arrow to provide high-performance, scalable data processing.

      Who are the data lakehouse vendors?

      There are several vendors that offer data lakehouse solutions, including:

      • Amazon Web Services (AWS) with AWS Lake Formation
      • Microsoft with Azure Synapse Analytics
      • Google with Google BigQuery Omni
      • Snowflake
      • Databricks
      • Cloudera with Cloudera Data Platform
      • Oracle with Oracle Autonomous Data Warehouse Cloud
      • IBM with IBM Cloud Pak for Data

      These vendors provide a range of services, from cloud-based data lakehouse solutions to on-premises solutions that can be deployed in an organization’s own data center. The choice of vendor will depend on the specific needs and requirements of the organization, such as the size of the data sets, the required performance and scalability, the level of security and compliance needed, and the overall budget.

      Komprise Smart Data Workflows is an automated process for all the steps required to find the right unstructured data across your data storage assets, tag and enrich the data, and send it to external tools such as a data lakehouse for analysis. Komprise makes it easier and more streamlined to find and prepare the right file and object data for analytics, AI and ML projects.

    • Data Lifecycle Management

      Data Lifecycle Management (DLM) is the process of managing data throughout its entire lifecycle – from creation or acquisition to its deletion or archiving. As the name suggests, Data Lifecycle Management involves various stages and activities to ensure that data is effectively and securely managed throughout its existence. With unprecedented data growth in the enterprise, particularly of unstructured data, data hoarding has become a significant challenge to address. The right approach to unstructured data management and the recognition that all data cannot be treated the same has led to an increased focus on data governance and data lifecycle management, which typically includes:

      • Data Creation/Acquisition: This is the initial stage where data is generated or acquired by an organization through various sources such as data entry, sensor devices, APIs, data feeds, or third-party vendors.
      • Data Storage: After data is created or acquired, it needs to be stored in appropriate data repositories, such as databases, data warehouses, data lakes, or cloud storage systems. The storage infrastructure must be designed to accommodate the volume, velocity, and variety of the data being managed.
      • Data Processing and Analysis: Once the data is stored, it can be processed, transformed, and analyzed to derive insights and valuable information. This stage involves data cleansing, data integration, aggregation, and applying analytical techniques to extract meaningful patterns and trends. (Related areas: Data science, data lakes, data preparation, data warehousing.)
      • Data Usage and Presentation: After the data has been analyzed, it is utilized to make informed decisions, generate reports, create dashboards, or feed into applications for various business purposes. Feeding AI and ML is an increasingly common use case here.
      • Data Archiving: As data ages or becomes less frequently used, it may be moved from active storage to long-term archival storage for compliance purposes or to free up resources on primary storage systems. (See hot data, cold data.)
      • Data Retention and Deletion: Organizations need to establish data retention policies that dictate how long data should be kept based on regulatory requirements or business needs. At the end of its useful life, data should be securely and permanently deleted to avoid any data privacy or security risks. (See Data Hoarding)
      • Data Security: Throughout the entire data lifecycle, data security measures must be implemented to protect data from unauthorized access, breaches, or other cybersecurity threats. (See Data Protection.)
      • Data Governance and Compliance: Data governance policies and procedures are put in place to ensure data quality, integrity, and compliance with relevant regulations and standards.
      • Data Backup and Disaster Recovery: Regular data backups and disaster recovery plans are essential to safeguard against data loss due to hardware failures, natural disasters, or cyber incidents.
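
      The archiving and retention stages in the list above amount to a policy decision per data set. The following minimal sketch shows one way to express it; the `LifecyclePolicy` fields, the action names and the thresholds are assumptions chosen for illustration, and real policies would be driven by an organization's regulatory and business requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical thresholds for a single class of data."""
    archive_after_idle_days: int  # move to archival storage once idle this long
    delete_after_age_days: int    # retention period, counted from creation


def next_action(age_days: int, idle_days: int, policy: LifecyclePolicy) -> str:
    """Return 'delete', 'archive' or 'keep_active' for one data set."""
    if age_days >= policy.delete_after_age_days:
        return "delete"       # retention period has expired
    if idle_days >= policy.archive_after_idle_days:
        return "archive"      # cold data: move off primary storage
    return "keep_active"      # hot data stays on active storage


# Example policy: archive after one idle year, delete after seven years.
policy = LifecyclePolicy(archive_after_idle_days=365, delete_after_age_days=7 * 365)
for age, idle in [(30, 2), (800, 500), (3000, 2900)]:
    print(age, idle, next_action(age, idle, policy))
```

      Checking the retention limit before the archive threshold ensures that expired data is deleted rather than moved to yet another storage tier.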

      The right data lifecycle management (see also Information Lifecycle Management) strategy can help organizations maximize the value of their data, reduce data storage costs, ensure data integrity, comply with regulations, and maintain good data hygiene practices. It is particularly crucial in the context of artificial intelligence (AI), big data, data privacy, and data protection considerations.


    • Data Literacy

      Data literacy is the ability to derive meaningful information from data. Komprise Data Analytics provides data literacy by showing how much data there is, what kind it is, who is using it and how often, across all storage silos.

      Read the IDC InfoBrief: How to Manage Your Data Growth Smarter with Data Literacy.


    • Data Management

      Data management, as officially defined by DAMA International, the professional organization for data management professionals, is:

      “Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.”

      Data management is the process of developing policies and procedures in order to effectively manage the information lifecycle needs of an enterprise. This includes identifying how data is acquired, validated, stored, protected, and processed. Data management policies should cover the entire lifecycle of the data, from creation to deletion.

      Due to the sheer volume of unstructured data, an unstructured data management plan is necessary for every organization. The numbers are staggering – for example, more data has been created in the past two years than in the entire previous history of the human race. Cloud data management is also a growing area of investment in the enterprise.

      Unstructured Data Management Report

    • Data Management for AI

      Data Management for AI (artificial intelligence) is the process of gathering and storing data in a way that can be used by AI and machine learning models to generate insights, make predictions and drive research and innovation initiatives. AI models require significant amounts of data to train and improve their accuracy, most of which is unstructured data. However, this data is not simple rows and columns. It is files, objects, semi-structured and structured data, all of which can be messy and difficult to manage.

      In late 2022, Komprise cofounder and CEO Kumar Goswami noted:

      “Enterprises need to be ready for this wave of change and it starts by getting unstructured data prepped, as this data is the critical ingredient for AI/ML.”

      He published this post in early 2023: The AI/ML Revolution: Data Management Needs to Evolve, making the following recommendations:

      • Get full visibility so you can optimize and leverage your data
      • If you aren’t indexing your data today, that’s a problem
      • Make new uses of data while still being cost-efficient
      • Collaborate with departments on data needs

      SPOG: Data Management Requirements for AI

      With so much discussion about ChatGPT, generative AI, AI regulations and the opportunities and threats posed by rapid AI innovation, Komprise cofounder and COO Krishna Subramanian tied the discussion back to data management for AI, summarizing the need for strategies and policies focused on data security, data privacy, data ownership, data lineage and data governance.


      AI needs unstructured data

    • Data Management Policy

      What is a Data Management Policy?

      A data management policy addresses the operating policy that focuses on the management and governance of data assets, and is a cornerstone of governing enterprise data assets. This policy should be managed by a team within the organization that identifies how the policy is accessed and used, who enforces the data management policy, and how it is communicated to employees.

      It is recommended that an effective data management policy team be led by top executives so that governance and accountability can be enforced. In many organizations, the Chief Information Officer (CIO) and other senior management can demonstrate their understanding of the importance of data management by either authoring or supporting directives that will be used to govern and enforce data standards.

      Considerations for a data management policy

      • Enterprise data is not owned by any individual or business unit, but is owned by the enterprise
      • Enterprise data must be safe
      • Enterprise data must be accessible to individuals within the organization
      • Metadata should be developed and utilized for all structured and unstructured data
      • Data owners should be accountable for enterprise data
      • Users should not have to worry about where data lives
      • Data should be accessible to users no matter where it resides

      Ultimately, a data management policy should guide your organization’s philosophy toward managing data as a valued enterprise asset. Watch the video: Intelligent Data Management: Policy-Based Automation

      Developing an unstructured data management policy

      It is important to develop enterprise-wide data management policies using a flexible governance framework that can adapt to unique business scenarios and requirements. Identify the right technologies following a proof of concept approach that supports specific risk management and compliance use cases. Tool proliferation is always a problem so look to consolidate and set standards that address end-to-end scenarios. Unstructured data management policies must address data storage, data migration, data tiering, data replication, data archiving and data lifecycle management of unstructured data (block, file, and object data stores) in addition to the semi-structured and structured data lakes, data warehouses and other so-called big-data repositories.


      Read the VentureBeat article: How to create data management policies for unstructured data.

      The data management policy should contain all the guidelines and information necessary for governing enterprise data assets and should address the management of structured, semi-structured and unstructured data.

      What does a Data Management Policy contain?

      A comprehensive Data Management Policy should contain the following:

      • An inventory of the organization’s data assets
      • A strategy for the effective management of the organization’s data assets
      • An appropriate level of security and protection for the data, including details of which roles can access which data elements
      • Categorization of the different sensitivity and confidentiality levels of the data
      • The objectives for measuring expectations and success
      • Details of the laws and regulations that must be adhered to regarding the data program

      Data Management policy and procedures

      First, the business must select who should be part of the policy-making process. This should include legal, compliance and risk executives, security and IT leaders, business unit heads and the chief data officer or relevant alternative. Once the committee is selected, its members should identify the risks associated with the organization’s data and create a data management policy.

    • Data Migration

      Data migration means many different things and there are many types of data migrations in the enterprise world. At its core, it is the process of selecting and moving data from one location to another. For this Glossary, we’re focused on Unstructured Data Migration, specifically file and object data. IT organizations use data migration tools to move data across different data storage systems and across different formats and protocols (SMB, NFS, S3, etc.).

      Data migrations often occur in the context of retiring a system and moving to a new system, or in the context of a cloud migration, or in the context of a modernization or upgrade strategy.

      When it comes to unstructured data migrations and migrating enterprise file data workloads to the cloud, data migrations can be laborious, error prone, manual, and time consuming. Migrating data may involve finding and moving billions of files (large and small), and these transfers can succumb to storage and network slowdowns or outages. Also, different file systems often do not preserve metadata in exactly the same way, so migrating data to a cloud environment without loss of fidelity and integrity can be a challenge.

      Two Data Migration Approaches


      Many organizations start here, thinking they’ll just migrate entire file shares and directories to the cloud. If this is your data migration plan, it’s important to use analytics to plan and migrate to reduce errors, ensure alignment and multi-storage visibility while minimizing cutover. With Komprise Elastic Data Migration, you can readily migrate from one primary vendor to another without rehydrating all the archived data, so migrations are cheaper and faster.

      Cloud Data Tiering as a First Step: Smart Data Migration

      Since a large percentage of file data is cold and has not been used in a year or more, tiering and archiving cold data is a smart first step – especially if you use Transparent Move Technology so users can access the files exactly as before. You can follow this up by migrating the remaining hot data to a performance cloud tier.
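
      As a rough illustration of identifying cold data before tiering, the sketch below splits the files under a directory into hot and cold sets by last-access time. It is a minimal example, not how any product does it: the function name `split_hot_cold` and the 365-day threshold are assumptions, and last-access times can be unreliable on file systems mounted with access-time updates disabled.

```python
import time
from pathlib import Path

# Assumption: "cold" means not accessed in a year or more.
COLD_AFTER_DAYS = 365


def split_hot_cold(root, cold_after_days=COLD_AFTER_DAYS, now=None):
    """Return (hot, cold) lists of file paths under root, by last access time."""
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * 86400  # seconds in a day
    hot, cold = [], []
    for path in Path(root).rglob("*"):
        if path.is_file():
            # st_atime is the last-access timestamp reported by the file system.
            if path.stat().st_atime < cutoff:
                cold.append(path)
            else:
                hot.append(path)
    return hot, cold
```

      The cold list would be the candidate set for tiering or archiving; the hot list is what remains to migrate to a performance tier.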

      Data Migration Questions

      Here are some questions that will help you determine the best file and object data migration strategy:

      • What data storage do we have and where?​ (primary storage, secondary storage)
      • What data sets are accessed most frequently (hot) and less frequently (cold)?​
      • What types of data and files do we have and which are taking up the most storage (image files, video, audio files, sensor data, etc.)?​
      • What is the cost of storing these different file types today? How does this align with the budget and projected growth?​
      • Which types of files should be stored at a higher security level? (PII or IP data? Mission-critical projects?)​
      • Are we complying with regulations and internal policies with our unstructured data management practices?
      • What constraints do my network and environment pose and how do I avoid surprises during migrations?
      • Do we have the best possible strategy in place for WAN acceleration, such as Komprise Hypertransfer for Elastic Data Migration?
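
      One of the questions above, which file types take up the most storage, can be answered approximately with a short script. The sketch below totals bytes per file extension under a directory; the function name `bytes_by_extension` is hypothetical, and a real assessment would typically span many shares and storage systems rather than a single path.

```python
from collections import Counter
from pathlib import Path


def bytes_by_extension(root):
    """Return [(extension, total_bytes), ...] sorted from largest to smallest."""
    totals = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Normalize case so ".TXT" and ".txt" are counted together.
            ext = path.suffix.lower() or "(none)"
            totals[ext] += path.stat().st_size
    return totals.most_common()
```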


    • Data Protection

      Data protection is used to describe both data backup and disaster recovery. A quality data protection strategy should automate the movement of critical data to online and offline storage and include a comprehensive strategy for valuing, classifying, and protecting data so as to protect these assets from user errors, malware and viruses, machine failure, or facility outages/disruptions.

      Data protection storage technologies include tape backup, which copies data to a physical tape cartridge; cloud backup, which copies data to the cloud; and mirroring, which replicates a website or files to a secondary location. These processes can be automated and policies assigned to the data, allowing for accurate, faster data recovery.

      Data protection should always be applied to all forms of data within an organization in order to protect the integrity of the data, guard against corruption or errors, and ensure the privacy of the data. When classifying data, policies should be established to identify different levels of security, from least secure (data that anyone can see) to most secure (data that, if released, would put the organization at risk).


    • Data Retention

      Data retention is the term used for storing and keeping data for a specific period of time based on legal, regulatory, business, or operational requirements. While for many organizations there is overlap with the term data hoarding, data retention involves defining policies and procedures to determine how long different types of data (the majority of which is unstructured data) should be retained, as well as ensuring compliance with applicable laws and regulations regarding data storage and privacy.

      Key points about data retention:

      • Legal and Regulatory Requirements: Many industries and jurisdictions have specific regulations or laws that dictate how long certain types of data must be retained. These requirements aim to ensure compliance, support legal obligations, facilitate audits, or provide evidence in case of disputes or investigations. Examples include financial records, healthcare data, customer information, and communication records.
      • Business and Operational Needs: Organizations establish data retention policies to address their internal needs, such as operational efficiency, historical analysis, reporting, or knowledge management. Retaining data for a certain period allows organizations to reference past information, track trends, support decision-making, or fulfill business requirements.
      • Retention Periods: The duration for which data should be retained varies depending on factors such as data type, industry regulations, legal requirements, business practices, and risk considerations. Some data may only need to be retained for a short period, while other data, especially for compliance-related purposes, may need to be retained for several years or even indefinitely.
      • Data Lifecycle: Data retention is part of the broader data lifecycle management process. It involves stages such as data creation, storage, usage, archival, and ultimately disposal. Retention policies define how long data should be kept at each stage and provide guidelines for when and how data should be archived or deleted.
      • Data Security and Privacy: During the retention period, it is essential to ensure the security and privacy of the stored data. Adequate security measures, access controls, and data protection mechanisms should be in place to protect the data from unauthorized access, loss, or breach.
      • Disposal and Data Destruction: At the end of the retention period, data should be disposed of properly. Secure data disposal methods, including data destruction techniques like shredding or data wiping, should be employed to ensure that sensitive or confidential information cannot be recovered or accessed.
      • Legal Holds and Exceptions: In some cases, legal holds or litigation may require data retention beyond the initially defined periods. Legal holds suspend the regular data disposal practices to preserve relevant data for legal proceedings or investigations. Learn more about Smart Data Workflow use cases, including legal hold.
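
      The interaction between retention periods and legal holds described above can be sketched in a few lines. This is an illustrative model only: the data types, the retention periods in `RETENTION_DAYS`, and the names `Record` and `disposition` are all assumptions, and real retention periods must come from legal and compliance review.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods in days; None means retain indefinitely.
RETENTION_DAYS = {
    "financial_record": 7 * 365,
    "system_log": 90,
    "contract": None,
}


@dataclass
class Record:
    name: str
    data_type: str
    created: date
    legal_hold: bool = False


def disposition(record: Record, today: date) -> str:
    """Return 'retain' or 'dispose' for a record under the policy above."""
    if record.legal_hold:
        return "retain"  # a legal hold suspends normal disposal
    days = RETENTION_DAYS[record.data_type]
    if days is None:
        return "retain"  # indefinite retention
    expires = record.created + timedelta(days=days)
    return "dispose" if today >= expires else "retain"
```

      Checking the legal hold first means a hold always overrides an expired retention period, which matches the exception described in the last bullet.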


      It is crucial for organizations to establish clear data retention policies, regularly review and update them to align with changing requirements, and ensure compliance with applicable laws and regulations. Consulting legal and compliance professionals can help organizations determine the appropriate retention periods and develop robust data retention practices. Policy-based unstructured data management and mobility should be a core component of your enterprise data retention strategy.

    • Data Retrieval

      Data retrieval refers to the process of accessing and retrieving data from a database or data storage system. Data retrieval is possible using various techniques and tools, such as database querying, data mining, and data warehousing. The specific techniques and tools used will depend on the type of data being retrieved, along with the requirements and goals of the organization.

      Some benefits of effective data retrieval include:

      • Improved data access: By providing quick and easy access to data, organizations can improve their overall data management processes and make better use of their existing data.
      • Better decision making: By providing access to up-to-date and accurate information, data retrieval can help organizations to make better decisions and improve their overall performance.
      • Better customer insights: By retrieving and analyzing customer data, organizations can gain valuable insights into customer behavior and preferences, so they can improve customer relationships and drive business growth.

      Cloud Data Retrieval

      There are several challenges associated with retrieving data from the cloud, including:

      • Network Latency: Retrieving data from a remote server can result in significant latency, especially if the data is large or the network is congested.
      • Bandwidth Limitations: Bandwidth limitations can limit the speed at which data can be retrieved from the cloud.
      • Data Security: Ensuring the security and privacy of data stored in the cloud can be challenging, especially for sensitive data.
      • Data Compliance: Organizations must ensure that their data retrieval practices comply with relevant regulations and standards, such as data privacy laws and industry standards.
      • Data Availability: In some cases, cloud data may not be available due to network outages, server downtime, or other technical issues.
      • Cloud Costs: Retrieving large amounts of data from the cloud can be expensive, especially if the data is stored in a high-performance tier.
      • Complexity: Interacting with cloud data storage systems can be complex and requires a certain level of technical expertise.

      Cloud Data Retrieval and Egress Costs

      Egress fees refer to the costs associated with transferring data from a cloud storage service to an external location or to another cloud provider. Many cloud service providers charge fees for data egress, as transferring large amounts of data can put a strain on their network and infrastructure. The cost of egress is usually based on the amount of data transferred and the destination of the transfer (for example, another region, another provider, or the public internet).

      It is important for organizations to understand their cloud service provider’s data egress policies and fees, as well as their data transfer needs, to avoid unexpected costs. Organizations can minimize egress costs by compressing data, reducing the amount of data transferred, or storing data in the same geographic region as their computing resources.
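
      To make the effect of volume and compression on egress fees concrete, here is a minimal arithmetic sketch. It assumes a flat per-GB price and an optional free allowance; the function name `egress_cost`, its parameters, and the price used in the usage note are hypothetical and do not reflect any provider's actual rates.

```python
def egress_cost(gb, price_per_gb, free_gb=0.0, compression_ratio=1.0):
    """Estimate an egress charge for moving `gb` gigabytes out of a cloud.

    compression_ratio is compressed size divided by original size,
    so 0.4 means the data shrinks to 40% of its original size.
    """
    if not 0 < compression_ratio <= 1:
        raise ValueError("compression_ratio must be in (0, 1]")
    transferred = gb * compression_ratio
    billable = max(transferred - free_gb, 0.0)  # free allowance is never negative
    return billable * price_per_gb
```

      For example, with a hypothetical price of 0.09 per GB, calling `egress_cost(10_000, 0.09)` and `egress_cost(10_000, 0.09, compression_ratio=0.4)` shows how compressing data before transfer lowers the bill proportionally.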

      The Benefits of Smart File Data Migration

      A smart data migration strategy for enterprise file data means an analytics-first approach ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. With Komprise, you always have native data access, which not only removes end-user disruption, but also reduces egress costs and the need for rehydration and accelerates innovation in the cloud.


    • Data Services

      Data services describes a range of services typically provided by enterprise IT operations teams or shared services teams, such as: data processing, data integration, data security, data reduction, data protection, data storage, and unstructured data management. Data services is a broad term that can overlap with analytics services, cloud services or professional services, but is tied to financial operations (FinOps) goals. Data services are essential for data-heavy enterprise organizations that need to manage, process, and analyze large amounts of (mostly unstructured) data to gain insights and make better business decisions.

      Examples of data services include:

      • Data storage services: The storage of data in various forms, including files, databases, and cloud storage. The shift to Storage as a Service (STaaS) is part of a data services strategy. Read the white paper: Getting Departments to Care About Storage Savings.
      • Data management services: The management of data throughout its lifecycle, including data quality management, data governance and data classification, is critical to lower costs and grow data value. Data management services include analysis and line-of-business reporting on data storage usage and costs for showback, along with data migration, data tiering, data replication and deletion.
      • Data processing services: This entails the processing of data through various algorithms and techniques, including data analytics, machine learning, and artificial intelligence.
      • Data integration services: This is the integration of data from multiple sources (ETL/ELT) to create a single, unified view of the data (usually for analytics) as well as real-time application of data between systems (EAI, ESB, streaming).

      From Storage Services to Data Services

      In the VMblog predictions post Unstructured Data Management Predictions for 2023: Data Insights and Automation take Center Stage, Komprise cofounder and COO Krishna Subramanian noted that enterprises are moving away from managing storage to managing data services:

      “Storage teams have traditionally measured infrastructure metrics for capacity and performance such as latency, I/O operations per second (IOPS) and throughput. But given the massive  growth of unstructured data, data-centric metrics are becoming paramount as enterprises move away from managing storage to managing data services in hybrid cloud infrastructure. New data management metrics look at usage indicators such as top data owners, percentage of “cold” files which haven’t been accessed in over a year, most common file size and type, and financial operations metrics such as storage costs per department, storage costs per vendor per TB, percentage of backups reduced, rate of data growth, chargeback metrics and more.”

      In the same post she highlighted the changing role of storage administrators:

      The storage architect/engineer will evolve to incorporate data services

      “We’ll see more experienced individuals in these roles move on to cloud architect and other engineering roles while IT generalists/junior cloud engineers inherit their responsibilities. This is a challenging time for IT organizations in a hybrid model as there is still significant NAS expertise needed. Either way, the IT employees managing the storage function will need new skills beyond managing the storage hardware. These individuals must understand the concept of data services-including facilitating secure, reliable governance and access to data and making data searchable and available to business stakeholders for applications such as cloud-based machine learning and data lakes. The new storage architect will frequently analyze and interpret data characteristics, developing data management plans which factor in cost savings strategies and business demands to create new value from data. This individual will interact regularly with departments to create and execute ongoing data management processes and plans.”

      In a Solutions Review post: 2023 Expert Data Management Best Practices & Predictions, Komprise cofounder and CEO Kumar Goswami noted:

      “IT organizations must better understand data to improve migrations and gain maximum ROI from cloud, meet compliance requirements, deliver data services to departments, and to facilitate new value generation from data.”

      He went on to say:

      “To keep up with ever-changing data services demands from the business, IT will implement collaborative processes with stakeholders across many different departments such as finance, marketing, legal, research, HR. Data workflow automation will support a variety of use cases from governance and compliance to cost savings to big data analytics.”

      In the 2022 Strategic Roadmap for Storage, Gartner noted (subscription required):

      I&O leaders must implement intelligent data services infrastructure powered by software-defined storage and hybrid cloud IT operations….Integration of data services to the hybrid cloud platform is among the top enterprise challenges to address the need for seamless data services across the edge, the core data center and public clouds.

      Read the article: Unstructured Data Growth and AI Give Rise to Data Services

    • Data Sprawl

      What is Data Sprawl?

      Data sprawl describes the staggering amount of unstructured data produced by enterprises worldwide every day. As new devices and enterprise and mobile applications are added to networks, data sprawl is estimated to grow 40% year over year into the next decade.

      Given this growth in data sprawl, data security is imperative, as unsecured data can lead to enormous problems for organizations, as well as their employees and customers. In today’s fast-paced world, organizations must carefully consider how best to manage the precious information they hold.

      Organizations experiencing unstructured data sprawl need to secure all of their endpoints. Security is critical. Addressing data security as well as remote physical devices helps ensure organizations are in compliance with internal and external regulations.

      As the number of security threats mounts, it is critical that data sprawl is addressed. Taking the right steps to ensure data sprawl is controlled, via policies and procedures within an organization, means safeguarding not only internal data, but also critical customer data.

      Organizations should develop solid practices that may have been dismissed in the past. Left unchecked, unstructured data sprawl will continue to manifest itself in hidden costs and limited options. With a little evaluation and planning, it is an aspect of your network that can be improved significantly and will pay off long term.

      Analyzing and Managing Unstructured Data: Getting Sprawl (and Costs) Under Control

      According to this Geekwire article, Gartner estimates that unstructured data represents an astounding 80 to 90% of all new enterprise data, and it’s growing 3X faster than structured data. Komprise Intelligent Data Management rapidly analyzes file and object unstructured data in-place across multi-vendor storage to provide aggregate analytics (e.g., how much data, how much is hot, how much is cold, what types, top users, etc.) as well as a Global File Index across cloud and on-prem environments. The Komprise Global File Index is highly efficient and scales to handle billions of files and exabytes of data without the scalability issues of a central database or other centralized architectures. Customers can build queries using Komprise Deep Analytics to find the precise subset of data they need through any combination of metadata and tags, and then move, copy and tier that data using Deep Analytics Actions. Komprise combines in-place analytics with data movement and ongoing data management to provide a closed-loop system that is intelligent and adapts to a customer’s unique needs. The functionality is also available via API.

      Tackling Data Sprawl with Komprise Analysis


      Komprise Analysis provides consistent unified insights into unstructured data across many vendors’ storage and cloud platforms. Key metrics include data volume, data growth rates, where data is stored, top owners, top file types/sizes and time of last access. Komprise can create cost models based on different storage targets and tiering plan that will show.

    • Data Storage

      What is Data Storage?

      Data storage refers both to the methods of transferring digital information from the source (users, applications, sensors) via protocols or APIs and to the destination: physical storage media such as magnetic or solid-state disks, tape, or optical. Data storage is pervasive: it is implemented in enterprise data centers, cloud providers, and consumer technology such as laptops and phones.

      From genomics and medical imaging to streaming video, electric cars, IoT at the edge and user-generated data, unstructured data growth is exploding. Enterprise IT organizations are looking to new cloud and hybrid cloud strategies to manage costs, and are investing in unstructured data management, cloud data migration and cloud data management technologies and strategies to reduce data storage costs while maximizing data value.


      What are the different types of data storage protocols?

      File Data Storage: File storage records data to files that are organized in folders, and the folders are organized under a hierarchy of directories and subdirectories. For example, a text file stored to your home directory on your laptop. File data is typically used for collaboration and shared access.

      Examples of File Storage

      Network Attached Storage (NAS), Network File System (NFS), and Server Message Block (SMB)

      File Storage Vendor Solutions

      NetApp ONTAP, Dell/EMC PowerScale (Isilon), Qumulo, Microsoft Windows Server, Pure FlashBlade, Amazon FSx, Azure Files

      Block Data Storage

      Typically used in servers and workstations where data is being written directly to physical media (HDD or SSD) in chunks or blocks. In contrast to file, block data is typically dedicated for access by a single application. Block storage is often used for the most performance intensive applications.

      • Examples of Block Data Storage: Direct Attached Storage (DAS), Storage Area Network (SAN), iSCSI, NVMe
      • Block Storage Vendor Solutions: Pure FlashArray, Dell/EMC VMAX, NetApp ONTAP and E-series, HDS
      Object Storage

      Object storage, also known as object-based storage or cloud storage, is a way of addressing and manipulating data storage as objects. In contrast to file storage, object data is stored in a flat namespace. Object storage was designed for use in massive repositories and is accessed over HTTP through a REST API.

      • Examples of Object Storage: AWS S3, Azure Blob, Google Cloud Storage, Cloud Data Management Interface (CDMI)
      • Object Storage Vendors: AWS, Azure, Google, Wasabi, Cloudian, NetApp, Dell/EMC, Scality
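
      The flat namespace that distinguishes object storage from file storage can be illustrated with a toy in-memory model. The class `FlatObjectStore` below is hypothetical and is not any vendor's API; it only shows that keys containing "/" are plain strings, and that "folders" are emulated by listing keys that share a prefix.

```python
class FlatObjectStore:
    """Toy in-memory object store with a single flat key space."""

    def __init__(self):
        self._objects = {}  # key (str) -> data (bytes); no directory tree

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list(self, prefix=""):
        # "Directories" are only a naming convention: filter keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

      Storing keys such as `logs/2023/a.txt` and `logs/2024/b.txt` and then calling `list("logs/")` returns both keys, even though no `logs` directory was ever created.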
      NDMP (Network Data Management Protocol)

      Storage protocol that allows file servers and backup applications to communicate directly to a network-attached tape device for backup or recovery operations.

      What are types of physical storage media?

      • Hard Disk Drive (HDD): Disk-based storage, used for high-density data storage. Data is written to a magnetic layer on a spinning disk.
      • Solid State Drive (SSD): Also known as flash. Silicon replaces the spinning disk component of HDD to achieve higher performance and a smaller form factor.
      • Tape: Data is written to a ribbon of magnetic material in a cartridge. Used strictly for backup and archive, tape’s slow performance is offset by low cost, high levels of density, and the ability to be stored offline.
      • Optical Storage: In contrast to magnetic storage, data is recorded optically to media such as CD and DVD disks. Optical storage is used for durable, long-term, offline, archival storage.

      What is Primary Storage?

      Primary storage is used for active read and write data sets where high performance is critical. SSD or flash media with the highest level of performance is the ideal storage media for primary storage. While less typical, HDD is also used as primary storage where lower cost and storage density are the key factors.

      What is Secondary Storage?

      Also referred to as active archive, secondary storage is used for less frequently accessed data sets. While any protocol and media can be used for secondary storage, HDD with NAS and Object are the most common choices. Use cases for secondary storage are data tiering and backup / data protection applications.

      Read the white paper: Block-Level vs. File Level Tiering

      What is Block Level Data Storage?

      Mainly used in servers and workstations where data is being written directly to physical media (HDD or SSD) in chunks or blocks. As opposed to file level data storage, block level data storage is mostly dedicated for access by a single application. Block storage uses either direct attached storage (DAS) or the data transfer protocols Fibre Channel (FC) or iSCSI (Internet Small Computer Systems Interface) via a storage area network (SAN).

      What is Data Lake Storage in Azure?

      Data Lake Storage in Azure from Microsoft is a fully managed scalable system based on a secure cloud platform that provides industry-standard, cost-effective storage for big data analytics.

    • Data Storage Costs

      Data storage costs are the expenses associated with storing and maintaining data in various forms of storage media, such as hard drives, solid-state drives (SSDs), cloud storage, and tape storage. These costs can be influenced by a variety of factors, including the size of the data, the type of storage media used, the frequency of data access, and the level of redundancy required. As the amount of unstructured data generated continues to grow, the cost of storing it remains a significant consideration for many organizations. In fact, according to the Komprise 2022 State of Unstructured Data Management Report, the majority of enterprise IT organizations are spending over 30% of their budget on data storage, backups and disaster recovery—similar to 2021. This is why shifting from storage management to storage-agnostic data management continues to be a topic of conversation for enterprise IT leaders.

      Cloud Data Storage Costs

      Cloud data storage costs refer to the expenses incurred for storing data on cloud storage platforms provided by companies like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). In addition to the factors above (amount of data stored and frequency of data access), the level of durability and availability required also affects cloud storage costs. Cloud data storage providers typically charge based on the amount of data stored per unit of time, and additional fees may be incurred for data retrieval, data transfer, and data processing. Many cloud storage providers offer different storage tiers with varying levels of performance and cost, allowing customers to choose the option that best fits their budget and performance needs. With the right cloud data management strategy, cloud storage can be more cost-effective than traditional hardware-centric on-premises storage, especially for organizations with large amounts of data and high storage needs.
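
      The trade-off between storage price and retrieval fees across tiers can be sketched with simple arithmetic. The model below is an assumption-laden illustration: the `Tier` fields, the `monthly_cost` function, and the prices in the usage note are hypothetical, and real pricing usually includes further charges such as request fees and minimum storage durations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_per_gb_month: float  # price to keep 1 GB for one month
    retrieval_per_gb: float      # price to read 1 GB back


def monthly_cost(tier, stored_gb, retrieved_gb):
    """Storage charge plus retrieval charge for one month on a given tier."""
    return (stored_gb * tier.storage_per_gb_month
            + retrieved_gb * tier.retrieval_per_gb)
```

      With hypothetical tiers `Tier("hot", 0.023, 0.0)` and `Tier("cold", 0.004, 0.02)`, the cold tier is cheaper for 1,000 GB that is rarely read, but becomes more expensive than the hot tier once the same data is retrieved about twice a month. This is why access frequency matters when choosing a tier.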

      Managing Data Storage Costs

      Managing data storage costs involves making informed decisions (and the right investment strategies) about how to store, access, and use data in a cost-effective manner. Here are some strategies for managing data storage costs:

      • Data archiving: Archiving infrequently accessed data to lower cost storage options, such as object storage or tape, can help reduce storage costs.
      • Data tiering: Using different storage tiers for different types of data based on their access frequency and importance can help optimize costs.
      • Compression and deduplication: Compressing data and removing redundant copies are well-known techniques that reduce the amount of storage needed and lower costs.
      • Cloud file storage: Using cloud storage can be more cost-effective than traditional on-premises storage, especially for organizations with large amounts of data and high storage needs.
      • Data lifecycle management (aka Information Lifecycle Management): Regularly reviewing and purging unneeded data can help control storage costs over time.
      • Cost monitoring and optimization (see cloud cost optimization): Regularly monitoring and analyzing data storage costs and usage patterns can help identify opportunities for cost optimization.
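
      As one concrete instance of the deduplication strategy listed above, the sketch below finds files with identical content by hashing them. It is a simple file-level approach using SHA-256 from the Python standard library; the function name `find_duplicates` is hypothetical, and storage systems commonly deduplicate at the block or chunk level instead.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root, chunk_size=1 << 20):
    """Return groups (lists of paths) of files under root with identical content."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256()
            with path.open("rb") as f:
                # Read in chunks so large files do not need to fit in memory.
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            by_digest[digest.hexdigest()].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

      Each returned group is a set of candidates where all but one copy could be removed or replaced with a reference, subject to retention and ownership policies.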

      By using a combination of these strategies, organizations can effectively manage their data storage costs and ensure that they are using their data storage resources efficiently. Additionally, organizations can negotiate with data storage providers to secure better pricing and take advantage of cost-saving opportunities like bulk purchasing or long-term contracts.

      Stop Overspending on Data Storage with Komprise

      The blog post How Storage Teams Use Komprise Deep Analytics summarizes a number of ways storage teams use Komprise Intelligent Data Management to deliver greater data storage cost savings and unstructured data value to the business, including:

      • Business unit metrics with interactive dashboards
      • Business-unit data tiering, retention and deletion
      • Identifying and deleting duplicates
      • Mobilizing specific data sets for third-party tools
      • Using data tags from on-premises sources in the cloud

      In the blog post Quantifying the Business Value of Komprise Intelligent Data Management, we review a storage cost savings analysis that saves customers an average of 57% of overall data storage costs and over $2.6M annually. In addition to cost savings, benefits include:

      Plan Future Data Storage Purchases with Visibility and Insight

      With an analytics-first approach, Komprise delivers visibility into how data is growing and being used across a customer’s data storage silos – on-premises and in the cloud. Data storage administrators no longer have to make critical storage capacity planning decisions in the dark and now can understand how much more storage will be needed, when and how to streamline purchases during planning.

      Optimize Data Storage, Backup, and DR Footprint

      Komprise reduces the amount of data stored on Tier 1 NAS, as well as the amount of actively managed data—so customers can shrink backups, reduce backup licensing costs, and reduce DR costs.

      Faster Cloud Data Migrations

      Auto parallelize at every level to maximize performance, minimize network usage to migrate efficiently over WANs, and migrate more than 25 times faster than generic tools across heterogeneous cloud and storage with Elastic Data Migration.


      Reduced Datacenter Footprint

      Komprise moves and copies data to secondary storage to help reduce on-premises data center costs, based on customizable data management policies.

      Risk Mitigation

      Since Komprise works across storage vendors and technologies to provide native access without lock-in, organizations reduce the risk of reliance on any one storage vendor.



    • Data Tagging

      What is data tagging?

      Data tagging is the process of adding metadata to your file data in the form of key-value pairs. These values give context to your data, so that others can easily find it in search and execute actions on it, such as moving it to confinement or to a cloud-based data lake. Data tagging is valuable for research queries and analytics projects, and for complying with regulations and policies.

      How does Komprise data tagging work?

      Users, such as data owners, can apply tags to groups of files and tags can also be applied programmatically by analytics applications via API. In the Komprise Deep Analytics interface, users can query the Global File Index and find the data for tagging. This is done by creating a Komprise Plan that will invoke the text search function to inspect and tag the selected files. The ability to use Komprise Intelligent Data Management to search, find, apply tags and then take action makes it possible for customers to get faster value from enriched data sets.

      Tagging and Smart Data Workflows


      Komprise Smart Data Workflows automate unstructured data discovery, data mobility and the delivery of data services.

      • Define a custom query to find a specific data set
      • Analyze and tag data sets with additional metadata
      • Move only the tagged data for analytics, AI/ML, etc.
      • Move to a lower-cost data storage tier after analysis
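
      The workflow steps above can be sketched in a few lines of Python. This is an illustration only and not the Komprise API: the FileRecord class, the helper functions, the paths and the tag names are all hypothetical. It shows tags as key-value pairs attached to file records, and how only the tagged records are then selected for a later move.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical index entry: a path plus its key-value tags."""
    path: str
    size_bytes: int
    tags: dict[str, str] = field(default_factory=dict)


def find(index, predicate):
    """Step 1: run a custom query over the index."""
    return [rec for rec in index if predicate(rec)]


def apply_tags(records, new_tags):
    """Step 2: enrich the selected records with extra key-value metadata."""
    for rec in records:
        rec.tags.update(new_tags)


def select_tagged(index, key, value):
    """Step 3: pick only the records that carry a given tag."""
    return [rec for rec in index if rec.tags.get(key) == value]


# Made-up sample data for illustration.
index = [
    FileRecord("/labs/seq/run1.bam", 4_000_000_000),
    FileRecord("/labs/seq/run2.bam", 3_500_000_000),
    FileRecord("/finance/q1.xlsx", 120_000),
]

matches = find(index, lambda rec: rec.path.endswith(".bam"))
apply_tags(matches, {"project": "genomics-2024", "pii": "no"})
to_move = select_tagged(index, "project", "genomics-2024")
print([rec.path for rec in to_move])
```

      Because the tags travel with the record, a later step (for example, moving the data to a lower-cost tier after analysis) can select on the same key-value pair without re-running the original query.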




    • Data Tiering

      Data Tiering refers to a technique of moving less frequently used data, also known as cold data, to cheaper levels of storage or tiers. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded now to mean tiering or archiving data from a storage system to other clouds and storage systems. See also cloud tiering and choices for cloud data tiering.


      Data Tiering Cuts Costs Because 70%+ of Data is Cold

      As data grows, storage costs are escalating. It is easy to think the solution is more efficient storage. But the real cause of storage costs is poor data management. Over 70% of data is cold and has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data. As a result, data storage costs are rising, backups are slow, recovery is unreliable, and the sheer bulk of this data makes it difficult to leverage new options like Flash and Cloud.

      Data Tiering Was Initially Used within a Storage Array

      Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.

      Typical storage tiers within a storage array include:
      • Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
      • SAS Disks: Usually the workhorse of a storage system, they are moderately good at performance but more expensive than SATA disks.
      • SATA Disks: Usually the lowest price-point for disks but not as performant as SAS disks.
      • Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.
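
      As a rough sketch of how an age-based tiering policy over these tiers might be expressed, the Python below maps a file's idle time to a tier. The thresholds of 30, 90 and 365 days are assumptions chosen for illustration (only the "year or more" example comes from the text above), and the function name is hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical policy: (minimum idle days, tier name), checked coldest first.
TIER_POLICY = [
    (365, "object/secondary storage"),
    (90, "SATA disk"),
    (30, "SAS disk"),
    (0, "flash/SSD"),
]


def choose_tier(last_access: datetime, now: datetime) -> str:
    """Return the first tier whose idle-day threshold the file meets."""
    idle_days = (now - last_access).days
    for min_idle_days, tier in TIER_POLICY:
        if idle_days >= min_idle_days:
            return tier
    return TIER_POLICY[-1][1]  # Future timestamps fall back to the hot tier.


now = datetime(2024, 1, 1)
for idle in (2, 45, 200, 800):
    print(idle, "days idle ->", choose_tier(now - timedelta(days=idle), now))
```

      Real policies usually weigh more than last-access time (file type, owner, size and so on); idle time alone is used here to keep the sketch short.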


      Cloud Data Tiering is now Popular

      Increasingly, customers are looking at another option – tiering or archiving data to a public cloud.

      • Public Cloud Storage: Public clouds currently have a mix of object and file storage options. The object storage classes such as Amazon S3 and Azure Blob (Azure Storage) provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.

      Tiering and archiving less frequently used data or cold data to public cloud storage classes is now more popular. This is because customers can leverage the lower-cost storage classes within the cloud to keep the cold data and promote it to the higher-cost storage classes when needed. For example, data can be archived or tiered from on-premises NAS to Amazon S3 Infrequent Access or Amazon Glacier for low ongoing costs, and then promoted to Amazon EFS or FSx when you want to operate on it and need performance.

      But in order to get this level of flexibility, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires file-tiering, not block-tiering.

      Block Tiering Creates Unnecessary Costs and Lock-In

      Block-level tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SAS disks as well as cheaper SATA disks.

      Block tiering breaks a file into various blocks – metadata blocks that contain information about the file, and data blocks that are chunks of the original file. Block-tiering or Block-level tiering moves less used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained in the higher, faster, and more expensive storage tiers.

      Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.

      But since block tiering (often called CloudPools; examples are NetApp FabricPool and Dell EMC Isilon CloudPools) is done inside the storage operating system as a proprietary solution, it has several limitations when it comes to efficiency of reuse and efficiency of storage savings. First, with block tiering, the proprietary storage filesystem must be involved in all data access, since it retains the metadata and holds the “map” for putting the file together from the various blocks. This also means that the cold blocks that are moved to a lower tier or the cloud cannot be directly accessed from the new location without involving the proprietary filesystem, because the cloud has neither the metadata map nor the other data blocks, file context and attributes needed to put the file together. So, block tiering is a proprietary approach that often results in unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.

      The only way to access block-tiered data in the cloud is to run the proprietary storage filesystem in the cloud, which adds to costs. Also, many third-party applications, such as backup software, that operate at the file level require the cold blocks to be brought back or rehydrated, which defeats the purpose of tiering to lower-cost storage and erodes the potential savings. For more details, read the white paper: Block vs. File-Level Tiering and Archiving.

      Know Your Cloud Tiering Choices


      File Tiering Maximizes Savings and Eliminates Lock-In

      File-tiering is an advanced modern technology that uses standard protocols to move the entire file along with its metadata in a non-proprietary fashion to the secondary tier or cloud. File tiering is harder to build but better for customers because it eliminates vendor lock-in and maximizes savings. Whether files have POSIX-based Access Control Lists (ACLs) or NTFS extended attributes, all this metadata along with the file itself is fully tiered or archived to the secondary tier and stored in a non-proprietary format. This ensures that the entire file can be brought back when needed. File tiering does not just move the file; it also moves the attributes, security permissions and ACLs along with the file and maintains full file fidelity even when you are moving a file to a different storage architecture such as object storage or cloud. This ensures that applications and users can use the moved file from the original location, and they can directly open the file natively in the secondary location or cloud without requiring any third-party software or storage operating system.

      Since file tiering maintains full file fidelity and native access based on standards at every tier, it also means that third-party applications can access the moved data without requiring any agents or proprietary software. This ensures that savings are maximized since backup software and other third-party applications can access moved data without rehydrating or bringing the file back to the original location. It also ensures that the cloud can be used to run valuable applications such as compliance search or big data analytics on the trove of tiered and archived data without requiring any third-party software or additional costs.

      File-tiering is an advanced technique for archiving and cloud tiering that maximizes savings and breaks vendor lock-in.

      Data Tiering Can Cut 70%+ Storage and Backup Costs When Done Right

      In summary, data tiering is an efficient solution to cut storage and backup costs because it tiers or archives cold, unused files to a lower-cost storage class, either on-premises or in the cloud. However, to maximize the savings, data tiering needs to be done at the file level, not block level. Block-level tiering creates lock-in and erodes much of the cost savings because it requires unnecessary rehydration of the data. File tiering maximizes savings and preserves flexibility by enabling data to be used directly in the cloud without lock-in.

      Why Komprise is the easy, fast, no lock-in path to the cloud for file and object data.



    • Data Transfer

      Data transfer is the term used to describe the movement of data from one location or system to another. It involves transmitting data over a network or transferring it from one data storage device to another. Data transfer can occur within a local network, between different networks, or across the internet. See Komprise Hypertransfer for an example of high-speed file migration transfer.

      Common Data Transfer Methods

      • Local Data Transfer: This involves transferring data within a local network or between devices connected to the same network. Local data transfer can be accomplished through wired connections like Ethernet or USB cables, or wirelessly using technologies like Wi-Fi or Bluetooth.
      • File Transfer Protocol (FTP): FTP is a standard network protocol used for transferring files between a client and a server on a computer network. It enables the exchange of files over the internet using dedicated FTP clients. Many web browsers once offered built-in FTP support, but most major browsers have since removed it.
      • Cloud Data Transfer: Cloud data transfer refers to the movement of data to and from cloud storage services like Amazon S3, Google Cloud Storage, or Microsoft Azure. It involves uploading data from local storage to the cloud or downloading data from the cloud to local storage. Cloud providers offer various methods, such as APIs, SDKs, command-line tools, and web interfaces, to facilitate data transfer to and from their platforms.
      • Data Replication and Synchronization: Data replication involves creating and maintaining duplicate copies of data across multiple systems or storage locations. It ensures data redundancy and availability. Synchronization, on the other hand, involves keeping data consistent and up-to-date across different devices or storage locations by transferring only the changed or modified portions of the data.
      • Data Transfer over the Internet: Transferring data over the internet involves transmitting data packets between devices or networks using standard internet protocols like TCP/IP. This can include methods like email attachments, cloud storage services, peer-to-peer file sharing, or direct data transfers between client and server applications.
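
      To make the synchronization idea above concrete, here is a minimal Python sketch that compares a source and a target and transfers only what differs. It is an illustration under simplifying assumptions: both sides are modeled as in-memory {path: content} dictionaries, changes are detected per file with SHA-256 digests, and the function names are hypothetical. Real synchronization tools may go further and send only the changed blocks within a file.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content fingerprint used to detect changes."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(source: dict[str, bytes], target: dict[str, bytes]):
    """Return (paths to copy, paths to delete) to make target match source."""
    to_copy = sorted(
        path for path, data in source.items()
        if path not in target or digest(target[path]) != digest(data)
    )
    to_delete = sorted(path for path in target if path not in source)
    return to_copy, to_delete


def apply_sync(source: dict[str, bytes], target: dict[str, bytes]):
    """Apply the plan in place and report what was transferred."""
    to_copy, to_delete = plan_sync(source, target)
    for path in to_copy:
        target[path] = source[path]
    for path in to_delete:
        del target[path]
    return to_copy, to_delete


# Made-up sample data: one changed file, one unchanged, one new, one stale.
source = {"a.txt": b"v2", "b.txt": b"same", "c.txt": b"new"}
target = {"a.txt": b"v1", "b.txt": b"same", "old.txt": b"x"}
copied, deleted = apply_sync(source, target)
print(copied, deleted)
```

      The unchanged file is never transferred, which is the property that distinguishes synchronization from a full copy.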


      In our Tips for a Clean Cloud File Migration series of webinars we discussed the importance of performance tuning, topology and network to a successful cloud migration initiative.

      When transferring data, factors such as the size of the data, network bandwidth, latency, security considerations, and the transfer method or protocol being used can impact the speed and efficiency of the transfer. Security measures, such as encrypting sensitive data during transfer to prevent unauthorized access and verifying the integrity of transferred data to detect corruption, are also important considerations.
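
      Two of these factors are easy to illustrate. The Python sketch below estimates a best-case transfer time from data size and bandwidth, and verifies integrity by comparing SHA-256 checksums of the sent and received bytes. The function names and the efficiency parameter (a stand-in for protocol overhead and latency effects) are assumptions made for this example; it does not model any particular product or protocol.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Checksum used to compare the two ends of a transfer."""
    return hashlib.sha256(data).hexdigest()


def verify_transfer(sent: bytes, received: bytes) -> bool:
    """True when the received bytes match what was sent."""
    return sha256_of(sent) == sha256_of(received)


def estimated_seconds(size_bytes: int, bandwidth_mbps: float,
                      efficiency: float = 1.0) -> float:
    """Transfer time for size_bytes over a link of bandwidth_mbps megabits/s.

    efficiency (0 < efficiency <= 1) is a hypothetical fudge factor for
    protocol overhead, latency and contention.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    return bits / (bandwidth_mbps * 1_000_000 * efficiency)


one_gb = 1_000_000_000
print(estimated_seconds(one_gb, 100))        # ideal link
print(estimated_seconds(one_gb, 100, 0.8))   # 80% effective throughput
print(verify_transfer(b"payload", b"payload"))
```

      For 1 GB over a 100 Mbps link the ideal figure is 80 seconds; any overhead only lengthens it, which is why small-file workloads with many round trips transfer far more slowly than raw bandwidth suggests.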

      Komprise Hypertransfer

      Komprise Hypertransfer for Elastic Data Migration creates dedicated virtual channels across the WAN to accelerate cloud data migrations. By establishing dedicated channels to send data, Komprise Hypertransfer minimizes the WAN roundtrips, which mitigates SMB protocol chattiness and dramatically improves data transfer rates. Tests done using a dataset dominated by small files show that Komprise performs cloud data migration 25x faster than other alternatives.



    • Data Virtualization

      Data virtualization delivers a unified, simplified view of an organization’s data that can be accessed anytime. It integrates data from multiple sources to create a single data layer that supports multiple applications and users. The result is faster access to this data, any way you want it.

      Data virtualization involves abstracting, transforming, federating and delivering data from disparate sources. This allows users and applications to access the data without having to know its exact location.

      Advantages to data virtualization:

      • An organization can gain business insights by leveraging all data
      • It can make more of its data available to analytics and business intelligence
      • Data virtualization can streamline an organization’s data management approach, which reduces complexity and saves money

      Data virtualization involves three key steps. First, data virtualization software is installed on-premises or in the cloud, where it collects data from production sources and stays synchronized as those sources change over time. Next, administrators are able to secure, archive, replicate, and transform data using the data virtualization platform as a single point of control. Last, it allows users to provision virtual copies of the data that consume significantly less storage than physical copies.

      Data virtualization use cases:

      • Application development
      • Backup and disaster recovery
      • Datacenter migration
      • Test data management
      • Packaged application projects


    • Deduplication

      Deduplication, also known as data deduplication, is a technique used to eliminate redundant or duplicate data within a dataset or data storage system. It is primarily employed to optimize storage space, reduce data backup sizes, and improve storage efficiency. Deduplication identifies and removes duplicate data chunks, storing only a single instance of each unique data segment, and references the duplicate instances to the single stored copy.

      Duplicate Data Identification

      Deduplication algorithms analyze data at a block or chunk level to identify redundant patterns. The algorithm compares incoming data chunks with existing stored chunks to determine if they are duplicates.

      Chunking and Fingerprinting

      Data is typically divided into fixed-size or variable-sized chunks for deduplication purposes. Each chunk is assigned a unique identifier or fingerprint, which can be computed using hash functions like SHA-1 or SHA-256. Fingerprinting enables quick identification of duplicate chunks without needing to compare the actual data contents.
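
      A minimal sketch of chunking and fingerprinting, assuming fixed-size chunks and SHA-256 fingerprints, is shown below. The DedupStore class and its tiny 4-byte chunk size are hypothetical and chosen only to keep the example readable; production systems use much larger chunks, often variable-sized, and persistent indexes.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store using fixed-size chunks and SHA-256."""

    def __init__(self, chunk_size: int = 4):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}      # fingerprint -> unique chunk
        self.files: dict[str, list[str]] = {}   # name -> ordered fingerprints

    def write(self, name: str, data: bytes) -> None:
        """Split data into chunks and store each unique chunk only once."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        self.files[name] = recipe

    def read(self, name: str) -> bytes:
        """Rebuild a file by following its list of fingerprints."""
        return b"".join(self.chunks[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        """Physical bytes held after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore(chunk_size=4)
store.write("a.bin", b"AAAABBBBAAAA")
store.write("b.bin", b"BBBBCCCC")
print(store.stored_bytes(), "bytes stored for 20 logical bytes")
```

      Duplicate detection here is a dictionary lookup on the fingerprint, so chunk contents are never compared byte by byte. Each file is kept as a list of references to the single stored copy of each chunk.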

      Inline and Post-Process Deduplication

      Deduplication can be performed inline, as data is being written or ingested into a system, or as a post-process after data is stored. Inline deduplication reduces storage requirements at the time of data ingestion, while post-process deduplication analyzes existing data periodically to remove duplicates.

      Deduplication Methods

      There are different deduplication methods based on the scope and granularity of duplicate detection. These include file-level deduplication (eliminating duplicates across entire files), block-level deduplication (eliminating duplicates at a smaller block level), and variable-size chunking deduplication (eliminating duplicates at a variable-sized chunk level).

      Deduplication Ratios

      Deduplication ratios indicate the level of space savings achieved through deduplication. Higher ratios signify more redundant or duplicate data within the dataset. The deduplication ratio is calculated by dividing the original data size by the size of the deduplicated data.
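
      The calculation can be written out directly. In the Python sketch below, the ratio follows the definition above; the space-saved percentage is a derived figure added here for convenience, and both function names are hypothetical.

```python
def dedup_ratio(original_bytes: int, deduplicated_bytes: int) -> float:
    """Original size divided by the size after deduplication."""
    if deduplicated_bytes <= 0:
        raise ValueError("deduplicated size must be positive")
    return original_bytes / deduplicated_bytes


def space_saved_percent(original_bytes: int, deduplicated_bytes: int) -> float:
    """Share of the original capacity that deduplication freed up."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return (1 - deduplicated_bytes / original_bytes) * 100


TB = 10**12
print(dedup_ratio(10 * TB, 2 * TB))          # a 5:1 ratio
print(space_saved_percent(10 * TB, 2 * TB))  # about 80 percent saved
```

      Note that the two figures scale differently: going from 5:1 to 10:1 only raises the space saved from 80% to 90%, so very high ratios bring diminishing returns.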

      Backup and Storage Optimization

      Deduplication is commonly used in backup and storage systems to reduce storage requirements and optimize data transfer and backup times. By removing duplicate data, only unique data chunks need to be stored or transferred, resulting in significant storage and bandwidth savings.

      Deduplication Challenges and Considerations

      Deduplication algorithms should be efficient to handle large datasets without excessive computational overhead. Data integrity and reliability are critical, ensuring that deduplicated data can be accurately reconstructed. Additionally, deduplication requires careful consideration of security, privacy, and legal compliance when handling sensitive or regulated data.

      Deduplication is widely used in various storage systems, backup solutions, and cloud storage environments. It helps organizations save storage costs, improve data transfer efficiency, and streamline data management processes by eliminating redundant copies of data.

      Deduplication History

      Companies such as Data Domain (later acquired by EMC), with its Data Domain Deduplication Storage Systems, introduced commercial deduplication products in the mid-2000s, which gained significant attention and adoption. These systems played a crucial role in popularizing deduplication as a key technology for data storage optimization and backup solutions. Since then, numerous vendors and researchers have contributed to the development and improvement of deduplication techniques, including variations such as inline deduplication, post-process deduplication, and source-based deduplication. Deduplication has become a standard feature in many storage systems, backup solutions, and data management platforms, providing significant benefits in terms of storage efficiency and data optimization.


    • Deep Analytics

      What is Deep Analytics?

      Deep analytics is the process of applying data mining and data processing techniques to analyze large amounts of data and find the data that is useful and beneficial for new applications. Deep analytics can apply to both structured and unstructured data.

      In the context of unstructured data and unstructured data management, Komprise Deep Analytics is the process of examining file and object metadata (both standard and extended) across billions of files to find data that fits specific criteria. A petabyte of unstructured data can be a few billion files. Analyzing petabytes of data typically involves analyzing tens to hundreds of billions of files. Because analysis of such large workloads can require distribution over a farm of processing units, deep analytics is often associated with scale-out distributed computing, cloud computing, distributed search, and metadata analytics.
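
      The idea of finding data that fits specific criteria by examining metadata only can be sketched as follows. This is a single-machine illustration in Python, not the Komprise implementation: the FileMeta fields, the query parameters and the sample records are all hypothetical, and a real deployment would distribute the index across many nodes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record; file contents are never read."""
    path: str
    owner: str
    extension: str
    size_bytes: int
    days_since_access: int


def query(index, *, extension=None, owner=None, min_idle_days=0):
    """Return the records matching every criterion that was supplied."""
    return [
        m for m in index
        if (extension is None or m.extension == extension)
        and (owner is None or m.owner == owner)
        and m.days_since_access >= min_idle_days
    ]


def bytes_by_owner(records):
    """Aggregate matching capacity per owner."""
    totals = defaultdict(int)
    for m in records:
        totals[m.owner] += m.size_bytes
    return dict(totals)


# Made-up sample index.
index = [
    FileMeta("/seq/r1.bam", "alice", ".bam", 400, 500),
    FileMeta("/seq/r2.bam", "bob", ".bam", 300, 10),
    FileMeta("/seq/r3.bam", "alice", ".bam", 200, 800),
    FileMeta("/docs/plan.docx", "bob", ".docx", 5, 900),
]

cold_bam = query(index, extension=".bam", min_idle_days=365)
print([m.path for m in cold_bam], bytes_by_owner(cold_bam))
```

      Because the query touches only metadata, the result is effectively a list of pointers to the matching files, which can then be handed to a downstream tool or policy.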

      Deep analytics of unstructured file and object data requires efficient indexing and search of files and objects across a distributed farm. Financial services, genomics, research and exploration, biomedical, and pharmaceutical organizations are some of the early adopters of Komprise Deep Analytics, which is powered by a Global File Index metadata catalog. In recent years, enterprises have started to show interest in deep analytics as the amount of corporate unstructured data has increased, and with it, the desire to extract value from the data.


      Deep analytics enables additional use cases such as Big Data Analytics, Artificial Intelligence and Machine Learning.

      When the result of a deep analytics query is a virtual data lake, which we call the Global File Index, data does not have to be moved from its original location or disrupted to enable reuse. This is an ideal scenario for rapidly leveraging deep analytics without disruption, since large volumes of data are heavy to move.

      Learn more about Komprise Deep Analytics.

      Learn more about Deep Analytics with Actions.


      Read the blog post: How Storage Teams Use Deep Analytics


    • Dell PowerScale

      Dell PowerScale is the name of Dell Technologies’ scale-out network-attached storage (NAS) solution. According to Dell, PowerScale is designed to provide high-performance storage for unstructured data workloads and is well-suited for demanding file and object storage requirements. In 2020, Dell rebranded many of the acquired EMC technologies such as EMC Isilon to PowerScale.

      PowerScale is used in a variety of industries, including media and entertainment, healthcare, research, and financial services, where large-scale data storage, high performance, and data-intensive workloads are critical.

      Whether your use case is cloud tiering, cloud data migration or optimizing performance and reducing storage costs, with Komprise for Dell PowerScale technologies you are able to:

      Learn more about Komprise for Dell EMC.

      Learn more about Smart Migration from PowerScale Isilon.


    • Dell PowerScale SmartPools

      Dell PowerScale SmartPools is the name of the feature of Dell EMC network attached storage (NAS) used for storage tiering.

      See Storage Pools and CloudPools.

      This technology was originally built for Isilon Tiering to extend Isilon storage to the cloud, optimize storage costs, handle fluctuating workloads and leverage the benefits of cloud storage while maintaining the performance and features of on-premises Isilon storage.

      Read: What you need to know before jumping into the cloud pool.



    • Digital Business

      A digital business is one that uses technology as an advantage in its internal and external operations.

      Information technology has changed the infrastructure and operation of businesses from the time the Internet became widely available to businesses and individuals. This transformation has profoundly changed the way businesses conduct their day-to-day operations. This has maximized the benefits of data assets and technology-focused initiatives.

      This digital transformation has had a profound impact on businesses, accelerating business activities and processes to fully leverage opportunities in a strategic way. A digital business takes full advantage of this so as not to be disrupted and to thrive in this era. C-level staff need to help their organizations seize opportunities while mitigating risks.

      This technology mindset has become standard in even the most traditional of industries, making a digital business strategy imperative for storing and analyzing data to gain a competitive advantage. The introduction of cloud computing and SaaS delivery models means that internal processes can be easily managed through a wide choice of applications, giving organizations the flexibility to choose and change software as the business grows and changes.

      Digital businesses have also seen a shift in purchasing power: individual departments now push for the applications that will best suit their needs, rather than relying on IT to drive change.

      Unstructured Data Management is a Digital Business Priority

      The Komprise 2022 State of Unstructured Data Management Report found that data storage costs comprise over 30% of enterprise IT budgets. This is why the right unstructured data management strategy has become an essential component of a digital business strategy.

      Unstructured Data Management

      Unstructured data management is about being able to realize business outcomes from analytics through data movement, extraction, and value. Komprise provides a storage-independent way to manage data no matter where it lives so a digital business can get value from unstructured data from every tier. Unlike storage tiering or data backup solutions that move blocks of data and lock customers into proprietary file systems, Komprise Intelligent Data Management moves the entire file intact and enables customers to directly leverage native services at every tier without going through Komprise or their primary file system. This is key to a seamless user experience because users in a modern digital enterprise transparently access data from their original file system while also being able to build new applications in the cloud. Furthermore, user transparency, powered by patented Transparent Move Technology, makes it possible for IT teams to deploy transparent tiering company-wide. Without this transparency, IT would need user and/or departmental approval, which is a major roadblock to any large-scale tiering.


    • Digital Pathology Data Management

      According to the Digital Pathology Association:

       “Digital pathology is a dynamic, image-based environment that enables the acquisition, management and interpretation of pathology information generated from a digitized glass slide.”

      Healthcare organizations have shifted to digital media for medical imaging. Digital pathology, digital PACS and VNA systems are all generating and now storing petabytes of medical imaging data: lab slides, X-rays, MRIs, CT scans and more. These ever-expanding datasets are pushing the limitations of data storage systems and challenging IT departments’ ability to effectively manage data. And with increasing regulations, healthcare providers typically must retain medical imaging files for many years. In addition to compliance requirements, clinical researchers may also need access to the data indefinitely. They also typically need access to the unstructured data immediately. The potential future value of this ever-expanding data repository must be weighed against the growing financial and overall unstructured data management costs.

      The Digital Pathology Data Management Challenge

      Data center storage for large image files is expensive – typically costing millions a year for some organizations on expensive NAS devices. Not only is NAS expensive, but its data must also be secured, replicated and backed up, which typically triples the costs. Meanwhile, in most cases, imaging data is rarely accessed after a few days or weeks. To get greater flexibility and manage data storage costs, healthcare organizations are adopting unstructured data management software to tier cold medical imaging data out of expensive storage to cost-effective environments such as the cloud. Data management decisions can be difficult internally because of politics, vendor relationships and long-standing institutional perspectives. Health systems are handling sensitive patient information and tolerance for downtime is usually quite low.

      There are many benefits from augmenting medical imaging solutions with data management software that transparently tiers cold data from your data storage and backups.

      Komprise has many customers in the healthcare industry dealing with multiple petabytes of file and object data.

      Learn more.


    • Direct Data Access

      Direct data access is the ability to directly access your data whether on-premises, in the cloud, or a hybrid environment without needing to rehydrate.

      The patented Komprise Transparent Move Technology™ (TMT) tiers file data workloads to a target without using any agents or stubs, allowing users to still access files natively from the original source as if they had never moved. This capability is known as file and object duality: with Komprise, users can also access the moved files as native objects, and Komprise does not get in front of hot, mission-critical data.

      Native Data Access definition.



    • Director (Komprise Director)

      The Komprise Director is the administrative console of the Komprise distributed architecture that runs as a cloud service or on-premises. Read the white paper Komprise Intelligent Data Management Architecture Overview or watch one of the Komprise TechKrunch videos to learn more.

      Learn more about the Komprise architecture.



    • Disaster Recovery

      Disaster recovery refers to security planning to protect an organization from the effects of a disaster – such as a cyber attack or equipment failure. A properly constructed disaster recovery plan will allow an organization to maintain or quickly resume mission critical functions following a disaster.

      The disaster recovery plan includes policies and testing, and may involve a separate physical site for restoring operations. This preparation needs to be taken very seriously, and will involve a significant investment of time and money to ensure minimal losses in the event of a disaster.

      Control measures are steps that can reduce or eliminate various threats for organizations. Different types of measures can be included in a disaster recovery plan. There are three types of disaster recovery control measures that should be considered:

      1. Preventive measures – Intended to prevent a disaster from occurring
      2. Detective measures – Intended to detect unwanted events
      3. Corrective measures – The plan to restore systems after a disaster has occurred.

      A quality disaster recovery plan requires that these policies be documented and tested regularly. In some cases, organizations outsource disaster recovery to a third-party provider instead of using their own remote facility, which can save time and money. This solution has become increasingly popular with the rise in cloud computing.

      Read the case study:
      Leading Idaho Health System Selects Komprise to Right-Place Data and Bolster Disaster Recovery



    • Dynamic Data Analytics

      Komprise unstructured data analytics allows organizations to analyze data across all storage to know how much exists, what kind, who’s using it, and how fast it’s growing. “What if” data scenarios can be run based on various policies to instantly see capacity and data storage cost savings, enabling informed, optimal unstructured data management planning decisions without risk.
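
      A "what if" scenario of this kind boils down to simple arithmetic, which the Python sketch below illustrates. It is not the Komprise model: the function name, the per-TB costs (100 and 20 in arbitrary currency units) and the sample data are all assumptions made for illustration.

```python
def what_if_savings(files, min_idle_days, primary_cost_per_tb,
                    secondary_cost_per_tb):
    """Project costs if data idle for min_idle_days or more is tiered.

    files is an iterable of (size_tb, days_since_access) pairs.
    """
    files = list(files)
    total_tb = sum(size_tb for size_tb, _ in files)
    cold_tb = sum(size_tb for size_tb, idle in files if idle >= min_idle_days)
    current_cost = total_tb * primary_cost_per_tb
    projected_cost = ((total_tb - cold_tb) * primary_cost_per_tb
                      + cold_tb * secondary_cost_per_tb)
    return {
        "cold_tb": cold_tb,
        "current_cost": current_cost,
        "projected_cost": projected_cost,
        "savings": current_cost - projected_cost,
    }


# Made-up capacity profile: (size in TB, days since last access).
shares = [(10, 5), (30, 400), (60, 900)]
for threshold in (365, 730):
    print(threshold, what_if_savings(shares, threshold, 100, 20))
```

      Running the same data against different thresholds shows the trade-off directly: a more aggressive policy classifies more data as cold and projects larger savings, with no data actually being moved.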

      Learn more about Komprise Analysis.

      Learn more about Komprise Deep Analytics.



  • E
    • Egress Costs

      Egress costs are the network fees most cloud providers charge to move your data out of the cloud. Most allow you to move your data into the cloud for free (ingress). It’s important to understand ingress and egress fees when moving data to the cloud. If you have moved data to cold storage in the cloud for archiving purposes but users recall it more than expected, you may incur hefty egress costs. Egress fees also happen when data is pulled out of cloud storage for use in analytics applications and to transfer data to another cloud region or cloud service.
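
      As a rough illustration of how recalls from cloud storage add up, the sketch below estimates an egress charge from the volume of data pulled out of the cloud. The function name, the per-GB rate, and the free allowance are all hypothetical placeholders; actual rates vary by provider, region, and destination.

```python
def estimate_egress_cost(gb_transferred: float, rate_per_gb: float, free_gb: float = 0.0) -> float:
    """Return an estimated egress charge for one billing period.

    rate_per_gb and free_gb are assumptions supplied by the caller; check
    your provider's current pricing before relying on the result.
    """
    if gb_transferred < 0 or rate_per_gb < 0 or free_gb < 0:
        raise ValueError("inputs must be non-negative")
    # Only the volume above the free allowance is billed.
    billable_gb = max(0.0, gb_transferred - free_gb)
    return billable_gb * rate_per_gb


# Hypothetical scenario: users recall 20 TB of archived data in one month.
recalled_gb = 20 * 1024
cost = estimate_egress_cost(recalled_gb, rate_per_gb=0.09, free_gb=100)
print(f"Estimated egress charge: ${cost:,.2f}")
```

      Running the same estimate with the recall volume you actually observe is a quick way to check whether data placed in cold cloud storage is being recalled more often than planned.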

      The post 5 Tips to Optimize Your Unstructured Data notes that a key benefit of embracing open, standards-based unstructured data management is that organizations can do whatever they need to do with their file and object data without paying licensing penalties and costs, such as for a third-party cloud file system or unnecessary cloud egress fees. Komprise moves and manages unstructured data in native format in each tier, which means you can directly access the data and use all the cloud data services on your data without having to pay a data management or storage vendor. Avoiding these costs, including egress costs, is a priority for IT leaders surveyed by Komprise. Read the report: State of Unstructured Data Management.

      To learn more about egress costs, read the New Stack article: Why Data Egress in the Cloud is Expensive.

      To learn more about the right approach to cloud data migrations and data management, visit: Smart Data Migration.

      The Benefits of Cloud Native Access

      Cloud native is a way to move data to the cloud without lock-in, which means that your data is no longer tied to the file system from which it was originally served.

      In this webinar, Komprise leaders review the importance of cloud native data access and maximizing the potential of your data in terms of access, efficiency and data services. When you move data in cloud native format, your users should be able to access the data not only as a file, but also as a native object—which is necessary for leveraging cloud-native analytics and other services. Access to your data should not have to go through your file storage layer, as this incurs licensing fees and requires adequate capacity.

      Read the blog post: Why Cloud Native Unstructured Data Access Matters


    • Elastic Block Store (EBS)

      Elastic Block Store (EBS) is a block-level storage service provided by Amazon Web Services (AWS) that is designed to be used with Amazon Elastic Compute Cloud (EC2) instances. EBS provides durable, persistent, and high-performance block storage volumes that can be attached to EC2 instances as virtual disks.

      Amazon EBS characteristics

      • Block-Level Storage: EBS provides block-level storage volumes that can be formatted and used as virtual disks by EC2 instances. These volumes can be used for a wide range of applications and databases that require persistent and reliable storage.
      • Performance Options: EBS offers different volume types to cater to various performance requirements. These include General Purpose SSD (gp2/gp3), Provisioned IOPS SSD (io1/io2), Throughput Optimized HDD (st1), and Cold HDD (sc1). Each volume type is optimized for specific use cases in terms of performance, capacity, and cost.
      • Elasticity and Scalability: EBS volumes can be created and attached to EC2 instances on the fly, providing elasticity and flexibility in storage provisioning. Volumes can be easily resized to meet changing capacity needs without requiring downtime or data migration.
      • Data Durability and Availability: EBS volumes are designed for durability and availability. Data stored in EBS volumes is automatically replicated within an Availability Zone (AZ) to protect against hardware failures. For additional data protection, EBS snapshots can be created and stored in Amazon Simple Storage Service (S3).
      • Snapshots and Backup: EBS allows you to create point-in-time snapshots of your volumes, which are stored in Amazon S3. These snapshots serve as backups and can be used to restore data or create new volumes. Snapshots are incremental, capturing only the changed data, which helps reduce backup costs and storage requirements.
      • Encryption and Security: EBS volumes support encryption at rest using AWS Key Management Service (KMS) keys. This helps protect data stored on EBS volumes and ensures compliance with data security requirements.
      • Performance Monitoring: AWS provides tools and metrics to monitor the performance of EBS volumes, including metrics for throughput, latency, and IOPS. This allows you to optimize performance and troubleshoot any performance-related issues.

      Common EBS scenarios

      Common EBS scenarios include hosting databases, running applications, storing data files, and building scalable and highly available architectures on AWS. EBS integrates with other AWS services, making it a versatile and integral part of the AWS ecosystem.

      EBS features, performance characteristics, and pricing details are updated regularly, so refer to the official AWS documentation or consult with AWS for the most up-to-date information and guidelines.

      Learn more about Komprise for AWS.



    • Elastic Data Migration

      What is Elastic Data Migration?

      Data migration is the process of moving data (e.g., files, objects) from one storage environment to another. Elastic Data Migration is a high-performance migration solution from Komprise that uses a parallelized, multi-processing, multi-threaded approach to complete NAS-to-NAS and NAS-to-cloud migrations in a fraction of the traditional time and cost.

      Standard Data Migration

      • NAS Data Migration – move files from one Network Attached Storage (NAS) system to another. The NAS environments may be on-premises or in the cloud (Cloud NAS).
      • S3 Data Migration – move objects from one object store or cloud to another object store or cloud.

      Data migrations can occur over a local network (LAN) or, when going to the cloud, over the internet (WAN). As a result, migrations can be impacted by network latencies and network outages.

      Data migration software needs to address these issues to make data migrations efficient, reliable, and simple, especially when dealing with NAS and S3 data since these data sizes can be in petabytes and involve billions of files.


      Elastic Data Migration

      Elastic Data Migration is designed to be orders of magnitude faster than standard data migrations. It leverages parallelism at multiple levels to deliver performance that is 27 times faster than alternatives for NFS migrations and 25 times faster for SMB migrations.

      • Parallelism of the Komprise scale-out architecture – Komprise distributes the data migration work across multiple Komprise Observer VMs so they run in parallel.
      • Parallelism of sources – When migrating multiple shares, Komprise breaks them up across multiple Observers to leverage the inherent parallelism of the sources.
      • Parallelism of data set – Komprise optimizes for all the inherent parallelism available in the data set across multiple directories, folders, and so on to speed up data migrations.
      • Big files vs small files – Komprise analyzes the data set before migrating it so it learns from the nature of the data. If the data set has a lot of small files, Komprise adjusts its migration approach to reduce the overhead of moving small files. This AI-driven approach delivers greater speeds without human intervention.
      • Protocol level optimizations – Komprise optimizes data at the protocol level (e.g., NFS, SMB) so the chattiness of the protocol can be minimized.

      All of these improvements deliver substantially higher performance than standard data migration. When an enterprise is looking to migrate large production data sets quickly, without errors, and without disruption to user productivity, Komprise Elastic Data Migration delivers a fast, reliable, and cost-efficient migration solution.
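
      To make the idea of parallelism within a data set concrete, here is a minimal sketch that copies a directory tree with a pool of worker threads. It is not how Komprise implements Elastic Data Migration; the function name `copy_tree_parallel` and the default worker count are assumptions chosen for illustration, and the sketch only copies files (empty directories are skipped).

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_tree_parallel(source: Path, target: Path, workers: int = 8) -> int:
    """Copy every file under source to target using a thread pool.

    Returns the number of files copied.
    """
    files = [p for p in source.rglob("*") if p.is_file()]

    # Create the directory layout up front so workers never race on mkdir.
    for directory in {f.parent for f in files}:
        (target / directory.relative_to(source)).mkdir(parents=True, exist_ok=True)

    def copy_one(src: Path) -> None:
        # copy2 also preserves timestamps and permission bits where possible.
        shutil.copy2(src, target / src.relative_to(source))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() drains the iterator so a failure in any worker is re-raised here.
        list(pool.map(copy_one, files))
    return len(files)
```

      Threads help here because file copies spend most of their time waiting on storage and network I/O. A production migration tool would also need retries, throttling, and handling for files that change during the copy.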


      Komprise Elastic Data Migration Architecture

      What Elastic Data Migration for NAS and Cloud provides

      Komprise Elastic Data Migration provides high-performance data migration at scale, solving critical issues that IT professionals face with these migrations. Komprise makes it possible to easily run, monitor, and manage hundreds of migrations simultaneously. Unlike most other migration utilities, Komprise also provides analytics along with migration to provide insight into the data being migrated, which allows for better migration planning.


      Fast, painless file and object migrations with parallelized, optimized data migration:

      • Parallelism at every level:
        • Leverages parallelism of storage, data hierarchy and files
        • High performance multi-threading and automatic division of a migration task across machines
      • Network efficient: Adjusts for high-latency networks by reducing round trips
      • Protocol efficient: Optimized NFS handling to eliminate unnecessary protocol chatter
      • High Fidelity: Performs MD5 checksums on each file to ensure full integrity of data transfer
      • Intuitive Dashboards and API: Manage hundreds of migrations seamlessly with intuitive UI and API
      • Greater speed and reliability
      • Analytics with migration for data insights
      • Ongoing value
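
      The MD5 integrity check mentioned in the list above can be sketched with Python's standard library. This is a generic illustration of checksum verification, not Komprise code; the helper names `md5_of_file` and `transfer_is_intact` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Reading in fixed-size chunks keeps memory use flat for large files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_is_intact(source: Path, copy: Path) -> bool:
    """True when the source file and its copy have identical MD5 digests."""
    return md5_of_file(source) == md5_of_file(copy)
```

      A migration job would typically compute the digest on both sides after each file is transferred and flag any mismatch for a retry.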



    • EMC

      EMC (formerly known as EMC Corporation and now known as Dell EMC) was a multinational technology company that specialized in data storage. The company was founded in 1979 and played a significant role in the development of the modern data storage industry. In 2016, Dell Technologies acquired EMC Corporation, forming Dell EMC, which is now a subsidiary of Dell Technologies. Dell EMC continues to provide a wide range of storage solutions, leveraging the technologies and expertise of both Dell and EMC.

      EMC offers a wide range of products and services, including storage systems, software-defined storage, data protection solutions, content management, and information governance solutions.

      Notable EMC products and technologies

      • Symmetrix: EMC Symmetrix is a high-end enterprise storage platform that offers scalability, high availability, and advanced data protection features. It has been widely used in mission-critical environments.
      • VMAX: EMC VMAX is a family of enterprise storage arrays designed to deliver high performance, scalability, and availability. It provides features like dynamic virtualization, automated tiering, and replication capabilities.
      • Isilon (now part of the Dell PowerScale product line): Acquired by EMC, Isilon is a scale-out network-attached storage (NAS) platform that allows organizations to efficiently store, manage, and analyze large amounts of unstructured data. It is commonly used in industries such as media and entertainment, life sciences, and research.
      • Data Domain: Also acquired by EMC, Data Domain is a deduplication storage system that reduces storage requirements by eliminating redundant data. It is used for backup and recovery purposes, providing efficient and cost-effective data protection.
      • XtremIO: XtremIO is an all-flash storage array designed to deliver high performance and low latency for demanding workloads. It leverages inline data deduplication and compression to optimize storage efficiency.
      • RSA Security: RSA Security, formerly a division of EMC (Dell sold RSA in 2020), focuses on providing security solutions and products, including identity and access management, encryption, and cybersecurity solutions.

      For an up-to-date history of EMC, be sure to check out Wikipedia. Learn more at Dell Storage.

      Komprise and Dell EMC.

      Komprise for Isilon migrations.


    • EMC PowerScale

      EMC PowerScale (see Dell PowerScale). PowerScale is the name of Dell Technologies' scale-out network-attached storage (NAS) solution.

      Learn more about Komprise for Dell EMC.

      Learn more about Smart Migration from PowerScale Isilon.


  • F
    • FabricPool

      What is NetApp FabricPool?

      FabricPool is a NetApp storage technology that enables automated tiering of data from an all-flash appliance to low-cost object storage tiers, either on or off premises. This technology is a form of storage pool, which is a collection of storage volumes exported to a shared storage environment.

      Read more about storage pools.

      Read the blog post: What you need to know before jumping into the cloud tiering pool


      Download the white paper: Cloud Tiering: Storage-Based vs Gateways vs File-Based: Which is Better and Why?

      Learn more about the Komprise path to the cloud for file and object data.


    • File

      A file is a named collection of data that is stored on a computer or other storage device. It represents a unit of information, such as a document, image, video, audio recording, program, or any other type of digital content. Files are organized within a file system, which provides a hierarchical structure for storing and retrieving data.

      File Formats

      Files are typically associated with specific file formats that define the structure and organization of the data they contain. Common file formats include .txt (plain text), .docx (Microsoft Word document), .jpg (JPEG image), .mp3 (MP3 audio), .mp4 (MP4 video), and many more. Each file format has its own specifications and is designed to be interpreted or processed by specific software or applications.

      File Extensions

      File extensions are a part of the file name that indicates the file format or type. They usually consist of a period (.) followed by a few letters or a combination of letters and numbers. For example, a file named “document.txt” has a “.txt” extension, indicating that it is a plain text file.
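
      As a small illustration, Python's standard `pathlib` module can read the extension from a file name. The helper name `describe_extension` is hypothetical; `suffix` and `suffixes` are the standard `pathlib` attributes.

```python
from pathlib import Path


def describe_extension(filename: str) -> tuple[str, list[str]]:
    """Return the final extension and the full list of extensions of a name."""
    path = Path(filename)
    # suffix is the last extension only; suffixes lists every dotted part.
    return path.suffix, path.suffixes


print(describe_extension("document.txt"))
print(describe_extension("archive.tar.gz"))
```

      Note that an extension is only a naming hint: renaming a file does not change the format of the data it contains.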

      File Properties and Metadata

      Files can have associated properties and metadata that provide additional information about the file. This may include attributes such as file size, creation date, modification date, author, permissions, and more. File properties and metadata help users and operating systems manage and organize files effectively.
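
      The sketch below shows one way to read a few of these properties with Python's standard library. The function name `file_metadata` and the chosen dictionary keys are assumptions for illustration. Creation time is left out because it is not reported consistently across operating systems.

```python
import stat
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path: Path) -> dict:
    """Collect basic file system metadata for a single file."""
    info = path.stat()
    return {
        "size_bytes": info.st_size,
        # Timestamps are converted to timezone-aware UTC datetimes.
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        # filemode renders permission bits in the familiar "-rw-r--r--" form.
        "permissions": stat.filemode(info.st_mode),
    }
```

      Data management tools rely on exactly this kind of metadata (size, last access, last modification) to decide which files are hot and which are cold.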

      File Operations

      Files can be manipulated through various file operations, such as creating, opening, reading, writing, modifying, moving, copying, and deleting. These operations are typically performed using file management functions or commands provided by the operating system or specific software applications.

      File Systems

      Files are stored within a file system, which is responsible for managing and organizing the storage of files on a storage device, such as a hard drive, solid-state drive, or network attached storage (NAS). File systems provide a directory structure to organize files into folders or directories and enable efficient retrieval and storage of data.

      File Compression

      Files can be compressed to reduce their size, making them occupy less storage space and facilitating faster file transfers. Compression algorithms, such as ZIP or GZIP, are used to compress files by eliminating redundancy or encoding data more efficiently. Compressed files need to be decompressed or extracted to restore them to their original form.
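
      A minimal sketch of the compress and decompress round trip, using Python's standard `gzip` module. The wrapper names `compress` and `decompress` are hypothetical; the sample data is deliberately repetitive so the size reduction is easy to see.

```python
import gzip


def compress(data: bytes) -> bytes:
    """Compress bytes using the GZIP format."""
    return gzip.compress(data)


def decompress(blob: bytes) -> bytes:
    """Restore GZIP-compressed bytes to their original form."""
    return gzip.decompress(blob)


original = b"unstructured data " * 10_000
packed = compress(original)
print(f"{len(original)} bytes -> {len(packed)} bytes")
```

      Compression is lossless here: decompressing returns exactly the original bytes. How much space is saved depends on how much redundancy the data contains, so already-compressed formats such as JPEG or MP4 typically shrink very little.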

      Files are fundamental units of data in computing and are essential for storing and accessing various types of digital content. They enable the creation, sharing, and management of information in a structured and organized manner.

      File Tiering

      File Data Management. Read the white paper: Block-Level versus File-Level Tiering


      File Protocols

      There are many standards and protocols that define how files are transferred, shared, and accessed. File protocol examples include:

      • File Transfer Protocol (FTP): FTP is one of the earliest and most widely used protocols for transferring files between computers over a network. It provides a simple way to upload, download, and manage files on a remote server.
      • Secure File Transfer Protocol (SFTP): SFTP is an extension of the SSH protocol and provides a secure method for transferring files over a network. It offers encryption and authentication, ensuring that data is protected during transit.
      • FTP Secure (FTPS, also known as FTP over SSL/TLS): FTPS combines the FTP protocol with SSL or TLS encryption to provide secure file transfers. It adds a layer of security to the traditional FTP protocol and should not be confused with SFTP, which runs over SSH.
      • Hypertext Transfer Protocol (HTTP): While primarily used for transferring web pages, HTTP can also be used to transfer files. When files are accessed via HTTP, they can be downloaded directly from a web server using a web browser or other HTTP client.
      • Hypertext Transfer Protocol Secure (HTTPS): HTTPS is the secure version of HTTP. It uses SSL or TLS encryption to secure the communication between a web server and a client, ensuring that files transferred over HTTPS are protected from eavesdropping and tampering.
      • Network File System (NFS): NFS is a distributed file system protocol that allows files to be accessed and shared among multiple computers in a network. It enables clients to mount remote file systems and access them as if they were local.
      • Server Message Block (SMB) / Common Internet File System (CIFS): SMB is a network file sharing protocol commonly used in Windows environments. CIFS is the name of an early SMB dialect, and the two terms are often used interchangeably. SMB allows computers to share files, printers, and other resources over a network.
      • Web Distributed Authoring and Versioning (WebDAV): WebDAV extends the HTTP protocol to support remote file management. It enables users to collaboratively edit and manage files stored on a remote server, providing features like file locking, versioning, and metadata management.

      These are just a few examples of file protocols used for transferring, sharing, and accessing files over networks. Each protocol has its own specifications and features, catering to specific use cases and requirements for secure and efficient file operations.

      Komprise Intelligent Data Management is built on open standards. In a 2021 interview, CEO and cofounder Kumar Goswami noted:

      We built the product on open standards, so the customer is not locked into our solution. This was risky, because it meant that a customer could kick us out at any time. This is contradictory in the data storage industry where the popular mindset is: “own the data, own the customer.” Our approach forces us to deliver white glove treatment to ensure we’re really solving a customer’s problem. In the process, this has made Komprise stickier with our customers. The way I see it is, if you have data you need Komprise.


    • File Archiving

      File archiving is the process of preserving digital files for long-term data storage and retrieval. The goal of file archiving is to retain important files and documents in a secure, easily accessible, and cost-effective manner, while freeing up space on primary storage systems.

      Manual file data management, backup and restore solutions, and dedicated file archiving systems are three ways to archive files. Manual file management moves files to a secondary storage location, such as a network share or external hard drive. Backup and restore solutions preserve files by creating snapshots of the data at regular intervals; snapshots can restore data in the event of data loss or corruption. Dedicated file archiving systems are specialized software solutions that are designed specifically for file archiving and provide features such as indexing, searching, and data retention policies.

      File Archiving Challenges

      File archiving reduces the risk of data loss, improves regulatory compliance, and reduces the costs associated with primary storage. Yet file archiving can present several challenges, including:

      • Data Storage Costs: Storing large volumes of data for a long time can be expensive, especially if the data is stored on traditional storage solutions, such as tapes or hard disk drives.
      • Scalability: As data volumes continue to grow, archiving solutions must be able to meet the increasing demand for storage capacity.
      • Data Retrieval: Archived files are difficult to locate and retrieve if they are not properly indexed or if the index becomes corrupted.
      • Data Retention: Organizations must ensure that their archiving solutions meet regulatory requirements for data retention, including data privacy and security laws.
      • Data Integrity: Archived files must be preserved in their original format and remain readable over time, which requires proper data preservation and data migration strategies.
      • Data migration: As archiving systems age or become obsolete, IT must migrate data to new systems, in particular cloud data migration, which can be time-consuming and complex.
      • Integration with other systems: Archiving solutions must integrate with other systems, such as backup and restore solutions, to ensure streamlined access.

      Standards-based Transparent Data Archiving


      A true transparent data archiving solution creates literally no disruption, and that’s only achievable with a standards-based approach. Komprise Intelligent Data Management is the only standards-based transparent data archiving solution that uses Transparent Move Technology™ (TMT), which uses symbolic links instead of proprietary stubs.

      True transparency that users won’t notice

      When a file is archived using TMT, it’s replaced by a symbolic link, which is a standard file system construct available in NFS, SMB, and object store file systems. The symbolic link, which retains the same attributes as the original file, points to the Komprise Cloud File System (KCFS), and when a user clicks on it, the file system on the primary storage forwards the request to KCFS, which maps the file from the secondary storage where the file actually resides. (An eye blink takes longer.) This approach seamlessly bridges file and object storage systems so files can be archived to highly cost-efficient object-based solutions without losing file access.
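
      The general idea of leaving a symbolic link where a file used to be can be sketched with ordinary file system calls. This is a simplified local illustration only, not Komprise TMT: it assumes a POSIX-style file system where the process may create symlinks, it points the link at a local archive directory rather than at KCFS, and the function name `archive_with_symlink` is hypothetical.

```python
import shutil
from pathlib import Path


def archive_with_symlink(file_path: Path, archive_dir: Path) -> Path:
    """Move a file into archive_dir and leave a symlink at its old location.

    Returns the path of the archived file.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = archive_dir / file_path.name
    shutil.move(str(file_path), str(archived))
    # The link keeps the original path usable: opening it reads the archived file.
    file_path.symlink_to(archived.resolve())
    return archived
```

      After the call, applications that open the original path still get the file contents, while the data itself occupies space only on the archive location.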

      Learn more about Komprise TMT for File Archiving


    • File Data Management

      File data management is the process of organizing, storing, and retrieving digital files in an efficient and secure manner. This can include tasks such as:

      • Naming files in a consistent and descriptive manner
      • Creating folders and sub-folders to categorize and store files
      • Regularly backing up important files to prevent data loss
      • Purging old or unnecessary files to free up storage space
      • Using appropriate software tools to manage, search and retrieve files

      Effective file data management helps improve productivity and organization, and reduces the risk of data loss or corruption. It is a critical aspect of overall data management, especially in businesses and organizations where large amounts of data are generated and stored on a regular basis.

      File Data Management Challenges

      Because we’re talking about unstructured data, file data management can present a number of challenges, including:

      • Data Growth: As more and more data is generated and stored, it can become difficult to manage and organize effectively. The majority is unstructured data.
      • Data Duplication: Duplicate files can lead to confusion, waste storage space and make it harder to find the most up-to-date version of a file.
      • Data Security: Protecting sensitive information from unauthorized access or cyberattacks is a major concern in file data management. (Read about cyber resiliency and saving on ransomware protection.)
      • Data Loss: Accidentally deleting or losing files can result in significant data loss and potential productivity loss.
      • Compliance: Certain industries and organizations may have regulatory requirements for file data management, such as retention policies and data privacy laws.
      • Integration with Other Systems: Integrating file data management systems with other applications, such as email, CRM, and collaboration platforms, can be complex and time-consuming.
      • Scalability: As the amount of data grows, the file data management system must be able to scale to meet the demands of the organization.
      • Compatibility: Ensuring that files can be opened and used by multiple users and systems can be a challenge, especially with different file formats and software versions.

      These challenges can be addressed through the use of appropriate software tools, best practices for file data management, and regular reviews and updates to the file data management policies.

      Komprise File Data Management

      Komprise Intelligent Data Management has been designed from the ground up to simplify file data management and put customers in control of unstructured data, no matter where data lives. With an analytics-first approach, Komprise works across file and object storage, across cloud and on-premises, and across data storage and data backup architectures to deliver a consistent way to manage data. With Komprise you get instant insight into all of your unstructured data, wherever it resides. See patterns, make decisions, make moves, and save money, all without compromising user access to any data. Komprise puts you in control of your data while simplifying file data management by creating a lightweight management plane across all your data storage silos without getting in the path of data access.


    • File Data Migration

      File data migration or file migration is the process of transferring data stored in files, such as text documents, images, audio and video files, spreadsheets, and other types of data, from one system to another. IT organizations move data for many reasons including for system upgrades, data center relocations, during mergers and acquisitions, and when acquiring new data storage platforms.

      File data migration involves several steps, such as data extraction, data transformation, data loading, data verification, and data archiving. It’s important to ensure that all the data is accurately and securely transferred to the new system, while minimizing any disruptions to business operations and preserving the integrity of the data.

      File data migration can be complex and time-consuming, especially for organizations with large volumes of data, multiple file formats, and strict security and compliance requirements. To ensure a successful migration, organizations typically use specialized tools and services, such as data migration software, cloud data migration services, and managed data migration services.

      Komprise File Data Migration

      Komprise Elastic Data Migration is a fast, predictable and cost-efficient file data migration software solution. Elastic Data Migration is included in the Komprise Intelligent Data Management platform or is available standalone.

      Komprise Hypertransfer for Elastic Data Migration accelerates file data transfer to the cloud while strengthening cloud security. Komprise Hypertransfer optimizes cloud data migration performance by minimizing the WAN roundtrips using dedicated channels to send data, mitigating SMB protocol issues.

      File Data Migration to the Cloud Considerations

      Increasingly, enterprise IT organizations are looking to migrate file data workloads to the cloud. (Read the State of Unstructured Data Management report to review data storage and cloud data migration trends.) This ITPro-Today article reviews some key questions to answer before starting a file data migration initiative:

      • What data do I have and where is it stored?
      • What data sets are accessed most frequently (a.k.a. hot data)?
      • What data sets are rarely accessed (a.k.a. cold data)?
      • Who uses the data currently and is there value in enabling collaboration outside of your organization?
      • What data/files haven’t been accessed for more than 3-5 years and should be considered for deep archival storage or confinement and deletion?
      • What types of files do we have and which consume the most storage (e.g., image files, video or audio files, sensor data, text data)?
      • What is the cost of storing these different file types?
      • Which types of files should be stored at a higher security level (e.g., those containing PII or IP data or belonging to mission-critical projects)?
      • Are we complying with regulations and internal policies with our data management practices?

      This video discussion reviews cloud file data migration considerations:

      In this Data on the Move discussion we interview Benjamin Henry, Customer Success Architect at Komprise.




    • File Data Ransomware

      What is File Data Ransomware?

      File data ransomware is a ransomware attack that targets file data.

      File data can be generated from users as well as machines. From genomics and medical imaging, streaming video, electric car data, and IoT products, all industries are generating vast amounts of unstructured file data, and increasingly enterprises are migrating file workloads to the cloud. File data can be petabytes of data and billions of files, so migrating this much unstructured data to the cloud takes time and can be disruptive. Cloud data migrations require proper planning to ensure minimal disruption and unintended costs.

      There is a growing recognition of the importance of having a layered protection strategy in place against potential file data ransomware attacks. Upwards of 80% of data today is unstructured file data, so IT organizations cannot afford to leave file data unprotected from ransomware. Early detection of ransomware will deliver the best outcome, but ransomware attacks are constantly evolving, and detection is difficult and not always foolproof. The best way to prepare for recovery is to invest in ways to restore data if you do get attacked and to establish an immutable copy of data in a location separate from data storage and backups.

      But keeping multiple copies of data can get prohibitively expensive. Read the blog: How to Protect File Data from Ransomware at 80% Lower Cost

      Learn more about Komprise for cyber resiliency, including optimizing your defenses against cyber incidents, system failure and file data.

      What is File Data Ransomware?

      Ransomware is an attack by malware that holds your data files hostage by encrypting your systems and making your data inaccessible to you. The majority of enterprise data is unstructured file data, which means organizations cannot afford to leave file data unprotected from ransomware. While the primary target for ransomware is file data, as the attacks grow more sophisticated hackers are seeking to defeat backups and snapshots.

      How to recover your ransomware encrypted data files

      The way to recover from a ransomware attack is to establish an immutable copy of your data in a location that is separate from your data storage. Immutable storage can be physically “air gapped” with offline media such as tape, or virtually air gapped with technologies such as Amazon S3 Object Lock that prevent any modification of data, even by administrators, for a set retention period.

      How long does it take to recover from a ransomware attack?

      A critical component often overlooked is how long the ransomware recovery can take – if your business can’t resume until data is restored, every minute adds to the cost of the ransomware attack. Recovery from a ransomware attack is equivalent to a disaster where potentially 100% of your data must be restored. Having a tested recovery plan in place is essential to a successful recovery.

      How do you protect file data from ransomware?

      There are two components of ransomware protection: detection and recovery. Early detection of ransomware will deliver the best outcome, but this is not always foolproof and can be difficult. Organizations should also invest in data recovery strategies and create an immutable copy of data in a location separate from data storage and backups in the event of a ransomware attack. But keeping multiple copies of data can get prohibitively expensive. To protect file data from ransomware, the solution must:

      • Be cost-effective
      • Protect data even if backups and snapshots are infected
      • Provide simple recovery without significant upfront investment
      • Be verifiable

      Getting Started with Komprise:

    • File Data Tiering

      File data tiering is a data storage management technique that automatically moves files from one storage tier to another based on usage patterns and access frequency. The goal of file data tiering is to optimize storage utilization and reduce storage costs by placing frequently used files on high-performance storage and less frequently used files (cold data storage) on lower-performance storage.
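
      As a rough illustration of the idea, the sketch below picks out files that have not been accessed for a given period so they can be moved to a lower-cost tier. The `FileRecord` type, the function names, and the 365-day threshold are assumptions made for this example; they are not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """Minimal file metadata used by the hypothetical policy below."""
    path: str
    last_accessed: datetime


def choose_tier(record, now, cold_after_days=365):
    """Return 'cold' if the file has been idle for the threshold, else 'hot'."""
    idle = now - record.last_accessed
    return "cold" if idle >= timedelta(days=cold_after_days) else "hot"


def plan_moves(records, now, cold_after_days=365):
    """List the paths that the policy would move to the lower-cost tier."""
    return [
        record.path
        for record in records
        if choose_tier(record, now, cold_after_days) == "cold"
    ]
```

      A real policy would usually weigh more than access time (for example file type, owner, or project), but the shape is the same: evaluate metadata against a rule, then act on the matching files.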

      Hardware-based tiering, software-based tiering, and cloud-based tiering are three methods of file data tiering. Hardware-based tiering moves files between different types of physical storage devices, such as solid-state drives (SSDs) and hard disk drives (HDDs), within a storage array. Software-based tiering moves files between different types of virtual storage volumes, such as high-performance and low-performance storage pools. Cloud-based tiering moves files between different storage classes within a cloud-based object storage service, such as Amazon S3.

      As part of a broader file data management strategy, file data tiering can help organizations improve storage utilization, reduce storage costs, and increase storage performance by automatically placing the right data in the right place at the right time. However, it’s important for organizations to carefully consider their storage requirements and choose a file tiering solution that fits their needs, as not all tiering solutions are appropriate for all environments.

      File-Level Tiering vs Block-Level Tiering

      Learn the difference between storage-centric block tiering, which moves blocks that can no longer be directly accessed from their new location without vendor software (aka lock-in), and file data tiering, which is what Komprise uses to fully preserve file access at each tier by keeping the metadata and file attributes with the file, no matter where it lives. Know the difference to make the right cloud tiering choice for your data storage moves.

      Getting Started with Komprise:

    • File Server

      A file server is the central server in a computer network that provides a central storage place for files on internal data media to connected clients.

      Getting Started with Komprise:

    • File Storage

      What is File Storage?

      File storage, or file-based storage, is the process of storing digital files, such as documents, photos, videos, and other types of data, primarily unstructured data, in a secure and accessible location. There are several options available for file storage, including:

      • Local storage: This involves storing files on a physical device, such as a hard drive, USB drive, or memory card. Local storage can provide a high level of control over the files, but there is a risk of data loss if the device fails or is lost or stolen.
      • Cloud storage: This involves storing files on remote servers that are accessed through the internet. Cloud storage providers offer varying levels of security, accessibility, and storage capacity, and can be a convenient and cost-effective option for storing and accessing files.
      • Network-attached storage (NAS): This is a type of storage device that is connected to a network and allows multiple users to store and access files. NAS devices can provide a high level of control and security over the files, but can be more complex and expensive to set up than other options.

      When choosing a file storage solution, it is important to consider factors such as security, accessibility, reliability, and cost. It may also be helpful to assess the specific needs of your organization or personal use case, such as the volume and type of files that need to be stored, and the number of users who will need to access them.

      File Storage Cost Savings

      File storage cost savings can be achieved by optimizing your storage strategy to reduce the amount of data that needs to be stored, and by leveraging cost-effective storage solutions.


      Here are some suggestions for file data storage cost savings:

      • Know your data: Conduct an audit of your files to determine which files are necessary and which can be deleted or archived. By reducing the amount of data you need to store, you can save on storage costs. Learn more about Komprise Analysis.
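      • Use compression: Compressing files can reduce their size, allowing you to store more files in the same amount of storage space. Text-heavy files such as documents and logs typically compress well without losing any data, while formats that are already compressed, such as JPEG images and most video, see little further reduction.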
      • Leverage cloud storage: Cloud storage providers offer a range of options with varying levels of storage capacity and pricing. By choosing a provider that meets your needs, you can save on the cost of physical storage devices and maintenance.
      • Consider tiered storage: Use different types of storage for different types of files, such as high-performance storage for frequently accessed files and lower-performance storage for archival files. This can help you optimize storage costs while still ensuring accessibility and performance.
      • Implement data deduplication: Data deduplication is a process that eliminates redundant data, such as duplicate files or multiple versions of the same file. By reducing the amount of duplicate data, you can save on storage costs.
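
      To make the deduplication tip concrete, here is a minimal sketch that finds files with identical content by comparing SHA-256 digests. The function names are hypothetical, and this is file-level duplicate detection only; it assumes a directory tree that can be read from a single host and does not represent how any specific storage product deduplicates data.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under `root` whose contents are identical."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only digests seen more than once represent duplicate content.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

      Each returned group is a candidate for keeping one copy and removing or archiving the rest, subject to review by the data owner.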

      File data is growing exponentially. Budgets are not. Reducing file storage costs while gaining data value is a top enterprise IT priority.

      Read the white paper: Know your file tiering options: Storage-based vs. Gateways vs. File-based.

      Read the white paper: Block-level vs. File-level tiering.


      Getting Started with Komprise:

    • File-level Tiering

      File-level tiering is a standards-based data tiering approach that Komprise uses. It moves each file with all its metadata to the new tier, maintaining full file fidelity and attributes at each tier, so data can be accessed directly from the target storage with no rehydration.

      Read the white paper: Block-Level Tiering versus File-Level Tiering.


      Getting Started with Komprise:

    • FinOps (or Cloud FinOps)

      FinOps (or Cloud FinOps) refers to cloud financial operations, including practices such as cost optimization, cost allocation, chargeback and showback, and cloud financial governance. Some of the key challenges that organizations face with regard to cloud costs include:

      • Cost visibility: Many organizations struggle to gain complete visibility into their cloud costs, which can make it difficult to ensure that they are not overspending on resources.
      • Cost optimization: Organizations need to optimize their cloud costs by reducing waste, optimizing resource utilization, and ensuring that they are only paying for what they need.
      • Cost allocation: Organizations need to allocate their cloud costs so that they are charged in a way that accurately reflects the resources that they are consuming.
      • Cloud financial governance: Governance processes and controls can ensure that cloud spending is aligned with the organization's overall business goals and objectives.

      Overall, FinOps is a critical aspect of modern cloud management, and is essential for organizations that want to effectively manage their cloud costs and ensure that they are maximizing value and ROI from their cloud investments.

      There are several vendors that specialize in FinOps solutions for cloud cost management and cloud cost optimization, but increasingly FinOps is built into other applications and technology platforms:

      • Apptio
      • CloudHealth by VMware
      • RightScale (acquired by Flexera)
      • CloudCheckr
      • Azure Cost Management + Billing by Microsoft
      • AWS Cost Explorer by Amazon Web Services
      • Cloudability
      • ParkMyCloud

      With the right Cloud FinOps strategy, organizations can gain the tools and expertise they need to manage their cloud costs and get the most value from their cloud investments.

      FinOps and Unstructured Data Management

      How much does it cost to own your data?

      Cost modeling in Komprise helps IT teams enter their actual data storage costs to determine upfront new projected costs and benefits before spending money on storage. (Know First)

      Look at your current (and future) data storage platform(s). Does the company pay per GB (OPEX) or is it an owned technology (CAPEX)? For the latter, divide the cost to acquire the full system by the current total amount of actual stored data to attain cost/TB. For example, 1PB of physical storage may end up being just 500TB of actual usable capacity but only have 300TB of actual data on it. Use the 300TB because that is representative of today’s data ownership cost.
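
      The arithmetic can be sketched as follows. The capacities come from the example above; the 600,000 acquisition cost is a made-up figure used only to show how much the choice of denominator changes the result.

```python
def cost_per_tb(acquisition_cost, data_tb):
    """Cost per TB: total system cost divided by the TB used as the basis."""
    if data_tb <= 0:
        raise ValueError("data_tb must be positive")
    return acquisition_cost / data_tb


# Hypothetical acquisition cost for the whole system (assumption).
system_cost = 600_000

per_raw_tb = cost_per_tb(system_cost, 1000)     # 1PB of physical storage
per_usable_tb = cost_per_tb(system_cost, 500)   # 500TB of usable capacity
per_stored_tb = cost_per_tb(system_cost, 300)   # 300TB of data actually stored

print(per_raw_tb, per_usable_tb, per_stored_tb)
```

      With these numbers, the cost per TB of data actually stored is more than three times the cost per raw TB, which is why the stored-data figure is the one to use for today's ownership cost.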

      Data ownership should also include the cost of data protection (data backup, disaster recovery, etc.). The FinOps capabilities in Komprise Intelligent Data Management allow you to compare on-premises versus cloud models or factor in cloud tiering or migrating to a new NAS platform.

      Komprise Cost Models

      According to GigaOm’s 2022 Data Migration Radar Report, Komprise has “the best set of Financial Operations (FinOps) features to date.”

      Stop overspending on cloud storage: Know First. Move Smart. Take Control with the right FinOps for cloud data storage and data management strategy.

      Getting Started with Komprise:

    • Flash Storage

      Flash storage is a storage medium that stores data electronically and can be electrically erased and reprogrammed. It also responds faster than a traditional spinning disk, increasing performance.

      With the increasing volume of stored unstructured data from the growth of mobility and the Internet of Things (IoT), organizations are challenged both to store the data and to capture the opportunities it brings. Disk drives can be too slow because of their mechanical speed limitations. For stored data to have real value, businesses must be able to quickly access and process that data to extract actionable information.

      Flash storage has a number of advantages over alternative storage technologies:
      • Greater performance. This leads to agility, innovation, and improved experience for the users accessing the data, delivering real insight to an organization.
      • Reliability. With no moving parts, flash has higher uptime. A well-built all-flash array can last 7 to 10 years.

      While Flash storage can offer a great improvement for organizations, it is still too expensive as a place to store all data. Flash storage has been about twenty times more expensive per gigabyte than spinning disk storage over the past seven years. Many enterprises are looking at a tiered model with high-performance flash for hot data and cheap, deep object or cloud storage for cold data.

      Getting Started with Komprise:

  • G
    • General Data Protection Regulation (GDPR)

      The General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) is a regulation by the European Union that aims to strengthen and unify data protection for all individuals within the European Union (EU). It also addresses the export of personal data outside the EU.

      GDPR became enforceable on 25 May 2018. Businesses transacting with countries in the EU must comply with GDPR laws.

      The GDPR regulation applies to personal data collected by organizations including cloud providers and businesses.

      Article 17 of GDPR is often called the “Right to be Forgotten” or “Right to Erasure”. The full text of the article is found below.

      To comply with GDPR, you need to use an intelligent data management solution to identify data belonging to a particular user and confine it outside the visible namespace before deleting the data. This two-step deletion ensures there are no dangling references to the data from users and applications and enables an orderly deletion of data.
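
      The two-step deletion described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: ownership is given as a plain path-to-owner mapping, the "visible namespace" is an ordinary directory, and all function names are hypothetical. It is not a compliance tool and does not reflect any product's implementation.

```python
import shutil
from pathlib import Path


def find_user_files(owner_by_path, user):
    """Identify: return the paths recorded as belonging to `user`."""
    return [Path(p) for p, owner in owner_by_path.items() if owner == user]


def quarantine(paths, quarantine_dir):
    """Step 1: move files out of the visible namespace into a holding area."""
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for index, source in enumerate(paths):
        # Prefix with an index so files with the same name do not collide.
        target = quarantine_dir / f"{index}_{Path(source).name}"
        shutil.move(str(source), str(target))
        moved.append(target)
    return moved


def purge(quarantine_dir):
    """Step 2: permanently delete everything in the holding area."""
    shutil.rmtree(quarantine_dir)
```

      Separating the two steps leaves a window in which users and applications that still reference the data can be found and fixed before the data is gone for good.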

      Art. 17 GDPR Right to erasure (‘right to be forgotten’)

      1) The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:

      1. the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed;
      2. the data subject withdraws consent on which the processing is based according to point (a) of Article 6(1), or point (a) of Article 9(2), and where there is no other legal ground for the processing;
      3. the data subject objects to the processing pursuant to Article 21(1) and there are no overriding legitimate grounds for the processing, or the data subject objects to the processing pursuant to Article 21(2);
      4. the personal data have been unlawfully processed;
      5. the personal data have to be erased for compliance with a legal obligation in Union or Member State law to which the controller is subject;
      6. the personal data have been collected in relation to the offer of information society services referred to in Article 8(1).

      2) Where the controller has made the personal data public and is obliged pursuant to paragraph 1 to erase the personal data, the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data.

      3) Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:

      1. for exercising the right of freedom of expression and information;
      2. for compliance with a legal obligation which requires processing by Union or Member State law to which the controller is subject or for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller;
      3. for reasons of public interest in the area of public health in accordance with points (h) and (i) of Article 9(2) as well as Article 9(3);
      4. for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing; or
      5. for the establishment, exercise or defense of legal claims.

      Getting Started with Komprise:

    • Generative AI

      Generative AI is a branch of artificial intelligence (AI) that focuses on creating models or systems capable of generating new content, such as images, text, music, or even video, that is original and realistic. Generative AI models learn patterns and structures from existing data and then use that knowledge to produce new, unique outputs.

      Generative Models

      Generative AI models are designed to learn and understand the underlying patterns in a given dataset and generate new samples that resemble the original data. These models aim to capture the distribution of the training data and generate outputs that are consistent with that distribution.

      Varieties of Generative Models

      There are several types of generative models, each with its own approach and architecture. Some common types include Generative Adversarial Networks (GANs), Variational Auto-encoders (VAEs), and autoregressive models like Recurrent Neural Networks (RNNs) and Transformers.

      • Generative Adversarial Networks (GANs) consist of two neural networks: a generator and a discriminator. The generator creates new samples, while the discriminator evaluates the generated samples and distinguishes them from real samples. The two networks are trained in competition with each other, with the goal of improving the quality of the generated outputs.
      • Variational Auto-encoders (VAEs) are generative models that learn the underlying distribution of the input data and generate new samples by sampling from that distribution. VAEs typically consist of an encoder that maps input data to a lower-dimensional latent space and a decoder that reconstructs the original input from the latent space.

      Applications of Generative AI

      Generative AI has seen a growing number of practical applications: generating realistic images, synthesizing human-like speech, creating music, generating natural language text, enhancing and transforming existing content, and even generating virtual environments for simulations and gaming.

      Challenges and Ethical Considerations

      Generative AI poses challenges and ethical considerations. Ensuring that generated outputs are diverse, realistic, and unbiased is a challenge that researchers and developers strive to address. There are concerns about potential misuse of generative AI, such as generating deepfake images or spreading disinformation.

      Video: The role of data management and governance in AI


      Generative AI Advancements and Research

      Generative AI technology innovation is moving very fast and is an active area of research and development. New architectures, techniques, and approaches are constantly being explored to improve the quality and diversity of generated outputs. Researchers are also working on methods to control the generation process and incorporate user preferences or constraints.

      Generative AI has gained significant attention and has found applications in various domains, including art, entertainment, design, and data augmentation. It offers exciting possibilities for creating new content and expanding the capabilities of AI systems beyond traditional problem-solving and pattern recognition tasks. ChatGPT and Google Bard are examples of Generative AI tools.

      Getting Started with Komprise:

    • Global File Index

      What is a Global File Index?

      Komprise Deep Analytics enables precise unstructured data management at enterprise scale, creating a Global File Index, which is a metadata catalog spanning petabytes of file and object data sources, to find specific data sets and then create a data management plan to systematically take action on your data set. Unstructured data ends up in multiple silos, so an index needs to be global across different data centers, storage, backup and cloud infrastructure.

      Once you connect Komprise to your file and object storage, your data is indexed and a Global File Index, which is a global metadata catalog across disparate file and object data, is created. You do not have to move the data anywhere; but you now have a single way to query and search across your file and object stores. Say you have some NetApp, some Isilon, some Windows servers, some Pure Storage at different sites and you have some cloud file storage on AWS, Azure, and Google. You get a single index via Komprise of all the data across all these environments and now you can search and find exactly the data you need with a single console and API.


      Benefits of the Global File Index

      • Users only move the data they need, with the ability to create queries on countless file attributes and tags such as: data related to a specific tag or project name, projects that are no longer active, file age, user/group IDs, path, file type (e.g., JPEG) and specific extensions, data with unknown owners.
      • A global metadata catalog eliminates the manual effort of finding custom data sets and moving them separately from different storage silos since Komprise can create a virtual data set based on the query and systematically and continuously move data from multiple file and object silos to the target location.
      • Improves IT and business collaboration around data, as data owners/users can participate in data tiering. 
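
      To illustrate what querying a single metadata catalog across silos looks like in principle, here is a small in-memory sketch. The `IndexEntry` fields, the `query` function, and the sample records are all hypothetical and are not the Komprise API; a real index would be a persistent, distributed catalog rather than a Python list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """One file's metadata in a hypothetical global index."""
    source: str        # storage system or cloud bucket the file lives on
    path: str
    extension: str
    age_days: int
    owner: str
    tags: frozenset = frozenset()


def query(index, extension=None, min_age_days=None, owner=None, tag=None):
    """Return entries matching every criterion that was supplied."""
    matches = []
    for entry in index:
        if extension is not None and entry.extension != extension:
            continue
        if min_age_days is not None and entry.age_days < min_age_days:
            continue
        if owner is not None and entry.owner != owner:
            continue
        if tag is not None and tag not in entry.tags:
            continue
        matches.append(entry)
    return matches


# Sample records spanning three made-up storage locations.
index = [
    IndexEntry("nas-site-a", "/proj/x/scan1.jpg", "jpg", 800, "alice",
               frozenset({"project-x"})),
    IndexEntry("cloud-bucket", "proj/x/scan2.jpg", "jpg", 30, "bob",
               frozenset({"project-x"})),
    IndexEntry("nas-site-b", "/home/carol/notes.txt", "txt", 900, "carol"),
]

old_project_images = query(index, extension="jpg", min_age_days=365,
                           tag="project-x")
```

      The result is a virtual data set defined by the query rather than by where the files happen to be stored, which is what allows one policy to act on data from several silos at once.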

      Watch the TechKrunch session: Deep Analytics Actions with One Global File Index

      Search and Act on Unstructured Data Insights

      Deep Analytics Actions provides a systematic way to find specific file and object data across hybrid cloud storage silos and move just the right subset of unstructured data for new uses such as AI/ML and cloud analytics. This gives IT and storage departments the ability to drive closer connections with end users by liberating the nuggets of useful data from petabytes of files, so that new value and customer-facing benefits can be discovered.

      Smart Data Workflows take Deep Analytics Actions a step further by allowing IT users and/or storage admins to create automated workflows for all the steps required to find the right unstructured data across storage assets, tag and enrich the data and send it to external tools for analysis. This eliminates manual effort in unstructured data management and helps organizations speed time to value from cloud-native and other tools.


      Getting Started with Komprise:

    • Google Cloud Platform (GCP)

      What is Google Cloud Platform?

      Google Cloud Platform (GCP) is a suite of cloud computing services provided by Google. It offers a wide range of infrastructure and platform services, including computing, storage, networking, big data, machine learning, and security. Some of the key services offered by GCP include:

      • Compute Engine – Virtual Machines (VMs) that can be used to run applications and services.
      • App Engine – A platform for building and deploying web and mobile applications.
      • Kubernetes Engine – A managed service for deploying, scaling, and managing containerized applications.
      • Cloud Storage – A scalable and durable object storage service.
      • Cloud SQL – A managed relational database service.
      • BigQuery – A serverless, fully managed data warehouse for analytics.
      • Cloud Pub/Sub – A messaging and streaming service for real-time data processing.
      • Cloud AI Platform – A suite of machine learning services for building and deploying ML models.

      GCP is designed to be highly scalable, reliable, and secure, and it is used by many organizations for a wide range of use cases, from small startups to large enterprises.

      Komprise and Google Cloud

      Getting Started with Komprise:

  • H
    • Hierarchical Storage Management (HSM)

      Hierarchical Storage Management (HSM) software, also known as tiered storage, was designed for distributed server environments to automate the process of identifying cold data sets and automatically migrating them from primary disk to less expensive optical and tape storage devices. Going back to the era of the mainframe, HSM was also supposed to handle file recall requests automatically whenever a user clicked on a stub file.

      Unfortunately, these early HSM products (see Wikipedia for a history) suffered from a number of deficiencies such as:

      • They were custom designed for specific proprietary storage systems, which limited hardware choices and resulted in vendor lock-in.
      • Many required file server agents that required substantial memory and compute resources, and operated in the direct data path, impacting performance.
      • They used static stub files left in place of the moved data. These static stub files could be corrupted, deleted, or orphaned, making it difficult if not impossible to locate the original source file.
      • The early HSM solutions did not scale well. As file counts increased, HSM performance deteriorated significantly since they were traditional database-driven architectures.
      • The solutions would disrupt storage system performance, interrupting active usage.
      • File recalls could take a long time, especially if the requested file was stored on tape.

      These deficiencies were so severe that HSM became a “bad word” among IT professionals. Many of those IT pros believed that the only viable way to manage storage was to just keep adding more capacity to the primary tier.


      The data center landscape has since changed, and organizations now have a wide range of data storage options available. Flash memory devices have replaced high performance physical disk drives as Tier-1 storage. High performance and commodity physical hard disks now function as secondary and tertiary storage tiers. Cloud file storage and object storage options are available to handle large bulk, long-term storage requirements. All of these options are needed to combat the unstructured data onslaught (and data sprawl and high data storage costs) that most organizations are facing. However, the main problem remains: how to automatically detect “warm” and “cold” data sets and then continuously migrate them to the most cost-effective storage tier while also managing the entire file life cycle. As outlined in this early review of Komprise:

      In short, we have more storage options than ever but less intelligence about how and when to move our increasing data to which storage platform.

      In a 2022 Blocks and Files review, Komprise Intelligent Data Management is referred to as an HSM or Information Lifecycle Management solution. The new category of software is now known as unstructured data management as well as the broader term: data services.

      Getting Started with Komprise:

    • High Performance Storage

      What is High Performance Storage?

      High performance storage is a type of storage management system designed for moving large files and large amounts of data around a network. High performance storage is especially valuable for moving around large amounts of complex data or unstructured data like large video files across the network.

      Used with both direct-connected and network-attached storage, high performance storage supports data transfer rates greater than one gigabyte per second and is designed for enterprises handling large quantities of data – in the petabyte range.


      High performance storage supports a variety of methods for accessing and creating data, including FTP, parallel FTP, VFS (Linux), as well as a robust client API with support for parallel I/O.

      High performance storage is useful for managing hot or active data, but can be very expensive for cold/inactive data. Since 60 to 90% of data in an organization is typically inactive/cold within months of creation, this data should be moved off high performance storage to get the best TCO of storage without sacrificing performance.

      Is Cold Data Impacting Data Storage Performance?

      Unstructured data management policies ensure that data is always stored in the appropriate environment according to its usage, age, value and business priority, to maximize data storage performance and minimize data storage costs.

      Read: The Need for Policies to Corral Your Unstructured Data

      Getting Started with Komprise:

    • Hosted Data Management

      With hosted data management, a service provider administers IT services, including infrastructure, hardware, operating systems, and system software, as well as the equipment used to support operations, including data storage, hardware, servers, and networking components.

      The managed service provider (MSP) typically sets up and configures hardware, installs and configures software, provides support and software patches, maintenance, and monitoring.

      Services may also include disaster recovery, security, DDoS (distributed denial of service) mitigation, and more.

      Hosted data management may be provided on a dedicated or shared-service model. In dedicated hosting, the service provider sets aside servers and infrastructure for each client; in shared hosting, resources are pooled and charged for on a per-use basis.

      Hosted data management can also be referred to as cloud services. With cloud hosting, resources are dispersed between and across multiple servers, so load spikes, downtime, and hardware dependencies are spread across multiple servers working together.

      In this arrangement, the client usually has administrative access through a Web-based interface.

      Another popular model is hybrid cloud hosted data management, where the administrative console resides in the cloud but all the data management (analyzing data, moving data, accessing data) is done on premises. Komprise Intelligent Data Management uses this hybrid approach as it offers the best of both worlds: a fully managed service that reduces operating costs without compromising the security of data.


      Getting Started with Komprise:

    • Hot Data

      Hot data is business-critical data that needs to be accessed frequently and resides on primary storage (NAS).

      Hot data is considered to be of high value and importance. This type of data is typically stored in fast memory, such as RAM, to ensure quick and efficient access. Examples of hot data include frequently used databases, in-memory caches, and real-time data streams.


      Getting Started with Komprise:

    • Hybrid Cloud Data Management

      Hybrid cloud data management is a broad term that can mean different things to different people and areas of the organization, depending on whether the focus is unstructured data, data storage and data protection, or data warehousing, data lakes, analytics and AI. Generally, hybrid cloud data management refers to the technologies, processes and strategies used to effectively and efficiently manage data in a hybrid cloud environment. A hybrid cloud combines both on-premises IT infrastructure and cloud resources, allowing organizations to leverage the benefits of both environments.

      General areas of hybrid cloud data management

      • Data Integration: In a hybrid cloud setup, data may reside in various locations, including on-premises systems and multiple cloud providers. Data integration involves ensuring seamless connectivity and integration between these different data sources and applications. It may involve using technologies such as data integration platforms, APIs, or data virtualization to unify data access and enable data movement between on-premises and cloud environments. An example of a hybrid cloud data integration vendor is SnapLogic. Also see Cloud Data Management.
      • Data Governance: Data governance in a hybrid cloud environment focuses on defining policies, standards, and procedures for data management, ensuring compliance, data security, and privacy. It involves establishing data ownership, access controls, data classification, and data lifecycle management across both on-premises and cloud resources. Implementing consistent data governance practices helps organizations maintain data quality, security, and regulatory compliance across their hybrid cloud infrastructure.
      • Data Backup and Disaster Recovery: Hybrid cloud data management includes implementing backup and disaster recovery strategies to protect data in case of data loss, system failures, or natural disasters. It involves replicating and backing up critical data from on-premises infrastructure to cloud storage or using cloud-based backup services. By leveraging the scalability and reliability of cloud resources, organizations can ensure data availability and minimize downtime during unforeseen events. See the post: Begun the Cloud File Services Wars Have
      • Data Security and Privacy: Hybrid cloud environments require robust security measures to protect sensitive data. Data encryption, access controls, identity and access management (IAM), network security, and threat detection mechanisms should be implemented to safeguard data both in transit and at rest. Compliance with data protection regulations, such as GDPR (General Data Protection Regulation) or HIPAA (Health Insurance Portability and Accountability Act), should also be considered when managing data in hybrid cloud environments.
      • Data Analytics and Insights: Hybrid cloud data management enables organizations to leverage cloud-based analytics tools and platforms to gain valuable insights from their data. Data can be processed, analyzed, and visualized using cloud-native services, such as data lakes, data warehouses, or machine learning platforms. By utilizing cloud resources for data analytics, organizations can take advantage of scalability, agility, and cost-efficiency to derive meaningful insights from their hybrid data sources.

      In 2023, enterprise IT organizations are working on establishing clear data management policies and evaluating their requirements in each of these areas, working closely with cloud service providers and leveraging their managed services where appropriate.

      Gartner summarized the following four trends shaping the future of cloud, data center and edge IT infrastructure:

      • Trend 1: Cloud Teams Will Optimize and Refactor Cloud Infrastructure
      • Trend 2: New Application Architectures Will Demand New Kinds of Infrastructure
      • Trend 3: Data Center Teams Will Adopt Cloud Principles On-Premises

      “According to Gartner, 35% of data center infrastructure will be managed from a cloud-based control plane by 2027, from less than 10% in 2022. I&O professionals should focus this year on building cloud-native infrastructure within the data center; migrating workloads from owned facilities to co-location facilities or the edge; or embracing as-a-service models for physical infrastructure.”

      • Trend 4: Successful Organizations Will Make Skills Growth Their Highest Priority

    • Hybrid Cloud Storage

      What is Hybrid Cloud Storage?

      As data moves from on-premises data centers to the public cloud and to edge computing devices, enterprise data storage has increasingly moved to a hybrid cloud storage model, where data is stored on the infrastructure that will leverage the processing power of the public cloud. Gartner’s Hybrid Cloud Storage Market Guide (subscription required) recommends that infrastructure and operations leaders identify the right workloads, types of data and use cases for cloud data storage and prioritize hybrid cloud storage solutions that support cloud-native access.

      In the 2021 Komprise Unstructured Data Management Survey, 50% of enterprises responded that they have data stored in a mix of on-premises and cloud-based storage and 56% stated that their top priority is cloud data migration.

      Download the State of Unstructured Data Management report. 


      In August 2022, Komprise published the 2nd annual State of Unstructured Data Management Report.

    • Hypertransfer

      Hypertransfer for Komprise Elastic Data Migration migrates file data to the cloud 25x faster.

      Announced in December 2022, Komprise Hypertransfer for Elastic Data Migration creates dedicated virtual channels across the WAN to accelerate cloud data migrations. By establishing dedicated channels to send data, Komprise Hypertransfer minimizes WAN roundtrips, which mitigates SMB protocol chattiness and dramatically improves data transfer rates. Tests using a dataset dominated by small files show that Komprise migrates data to the cloud 25x faster than alternatives.


      Read the Hypertransfer white paper.

  • I
    • Immutable Storage

      What is immutable storage?

      Immutable storage is a feature of file storage, or more typically object storage, that protects data from modification or deletion for a set retention period. Immutable storage is often used in highly regulated industries such as finance and health care but is now gaining popularity across other industries as a defense against ransomware or insider threats.

      Implementations of immutable storage such as Amazon S3 Object Lock are assessed by independent third parties to confirm that they comply with government regulations.
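
      As a rough illustration of the retention-period behavior described above, the following Python sketch models a store that rejects overwrites and deletes until a lock expires. It is a toy in-memory model, not the API of S3 Object Lock or any other product; the class and method names (ImmutableStore, RetentionError) are hypothetical.

```python
from datetime import datetime, timedelta


class RetentionError(Exception):
    """Raised when a locked object would be modified or deleted."""


class ImmutableStore:
    """Toy in-memory store that locks each object for a retention period."""

    def __init__(self, retention: timedelta):
        self._retention = retention
        self._objects = {}  # key -> (data, locked_until)

    def put(self, key: str, data: bytes, now: datetime) -> None:
        # Overwriting an existing object is only allowed once its lock expired.
        self._require_unlocked(key, now)
        self._objects[key] = (data, now + self._retention)

    def get(self, key: str) -> bytes:
        # Reads are always allowed; immutability only restricts changes.
        return self._objects[key][0]

    def delete(self, key: str, now: datetime) -> None:
        self._require_unlocked(key, now)
        self._objects.pop(key, None)

    def _require_unlocked(self, key: str, now: datetime) -> None:
        entry = self._objects.get(key)
        if entry is not None and now < entry[1]:
            raise RetentionError(f"{key!r} is locked until {entry[1].isoformat()}")


store = ImmutableStore(retention=timedelta(days=30))
written = datetime(2024, 1, 1)
store.put("backup.tar", b"original", now=written)

try:
    store.put("backup.tar", b"tampered", now=written + timedelta(days=1))
except RetentionError as error:
    print("overwrite rejected:", error)

# After the retention period has passed, deletion is allowed again.
store.delete("backup.tar", now=written + timedelta(days=31))
```

      In a real system the lock is enforced by the storage service itself, so that even an administrator or compromised client cannot bypass it.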

      Read the blog post: How to Protect File Data from Ransomware at 80% Lower Cost

      Since approximately 80% of data today is unstructured data, organizations cannot afford to leave file data unprotected from ransomware attacks. Early ransomware detection can deliver the best outcome, but as ransomware attacks are constantly evolving, detection is not always foolproof and can be difficult. Investing in ways to recover data if you do get attacked by ransomware is essential. An immutable copy of data in a location separate from your data storage and data backups gives you a way to recover data in the event of a potentially devastating ransomware attack. But keeping multiple copies of data can get prohibitively expensive.

    • Information Lifecycle Management

      Information Lifecycle Management (ILM) is a data management strategy that focuses on managing the flow of data from creation to deletion. The goal of ILM is to optimize the use of storage resources and improve data management efficiency and cost-effectiveness.

      Gartner defines ILM this way:

      Information Lifecycle Management (ILM) is an approach to data and storage management that recognizes that the value of information changes over time and that it must be managed accordingly. ILM seeks to classify data according to its business value and establish policies to migrate and store data on the appropriate storage tier and, ultimately, remove it altogether. ILM has evolved to include upfront initiatives like master data management and compliance.


      TechTarget defines ILM this way:

      Information lifecycle management (ILM) is a comprehensive approach to managing an organization’s data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.


      ILM involves a series of activities that are performed at different stages of the data lifecycle, such as data creation, data storage, data protection, data archiving, and data deletion. At each stage, the data is managed and stored according to its value, importance, and frequency of use.

      ILM typically involves the use of data classification, data retention, data archiving policies, and data management tools and technologies. These policies and technologies help to manage the flow of data throughout its lifecycle and ensure that it is stored in the most appropriate location and format for its current needs.
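
      The policy-driven placement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tier names, the age thresholds and the seven-year retention period are hypothetical, and a real ILM policy would usually also weigh business value, ownership and compliance rules, not just last access time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tier names and age thresholds, chosen only for illustration.
TIER_RULES = [
    (timedelta(days=30), "hot"),    # accessed within the last 30 days
    (timedelta(days=365), "warm"),  # accessed within the last year
]
DEFAULT_TIER = "archive"
RETENTION = timedelta(days=7 * 365)  # assumed retention period before deletion


@dataclass
class FileRecord:
    path: str
    last_accessed: datetime


def choose_action(record: FileRecord, now: datetime) -> str:
    """Return the tier a file belongs on, or "delete" once retention expires."""
    age = now - record.last_accessed
    if age > RETENTION:
        return "delete"
    for max_age, tier in TIER_RULES:
        if age <= max_age:
            return tier
    return DEFAULT_TIER


now = datetime(2024, 1, 1)
for days in (5, 100, 1000, 3000):
    record = FileRecord(path=f"/share/file_{days}.dat",
                        last_accessed=now - timedelta(days=days))
    print(record.path, "->", choose_action(record, now))
```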

      Benefits of implementing ILM

      Improved storage utilization and cost savings

      By managing data throughout its lifecycle, ILM helps organizations ensure that the most valuable and important data is stored on high-performance storage systems, while less important data is stored on lower-cost storage systems.

      Increased data protection and security

      By managing the flow of data and applying appropriate data protection and security measures, ILM helps reduce the risk of data loss or corruption.

      Better compliance

      ILM helps organizations meet regulatory and compliance requirements by ensuring that data is managed and stored in accordance with the organization’s policies and best practices.

      Overall, Information Lifecycle Management is an essential aspect of modern data management and is critical to managing and storing data securely and cost-effectively.

      ILM Challenges

      • Complexity: In organizations with large and complex data environments it can be difficult to effectively manage and store data throughout its lifecycle. This can lead to data sprawl, increased data storage costs, and increased security and compliance risks.
      • Cost: Implementing ILM requires investment in the right data management tools and technologies, and structured and unstructured data management policies and processes. This can be a significant cost for organizations, especially those with limited budgets.
      • Data protection and security: ILM can introduce new security and privacy risks, especially if sensitive data is stored on low-cost or low-security storage systems. Organizations should ensure that they have appropriate data protection and security measures in place to mitigate these risks.

      By carefully planning and executing their ILM strategies, organizations can manage and store data throughout its lifecycle, cutting costs while ensuring that data is protected, secure, and compliant with regulatory requirements.

      Ongoing Unstructured Data Management as part of an ILM Strategy

      As we noted when we launched Smart Data Workflows, with billions of files and objects, analytics plus continuous mobilization is essential because data has a lifecycle and data management is not a one-time thing. Whether the use case is data analytics, data migration, data tiering, data replication, data search or anything related to the data lifecycle, it is important to look for an unstructured data management solution that delivers ongoing data management. Learn more about Komprise Intelligent Data Management.

    • Intelligent Data Management

      Intelligent Data Management is the process of managing unstructured data throughout its lifecycle with analytics and intelligence. It is also the name of the Komprise platform as a service: Intelligent Data Management.

      The criteria for a solution to be considered Intelligent Data Management include:

      Analytics-Driven Data Management

      Is the solution able to leverage analysis of the data to inform its behavior? Is it able to deliver analysis of the data to guide the data management planning and policies? Learn more about Komprise Analysis.

      Storage-Agnostic Data Management

      Is the data management solution able to work across different vendors and different storage platforms?

      Adaptive Data Management

      Based on the network, storage, usage, and other conditions, is the data management solution able to intelligently adapt its behavior? For instance, does it throttle back when the load gets higher, does it move bigger files first, does it recognize when metadata does not translate properly across environments, does it retry when the network fails?

      Closed Loop Unstructured Data Management

      Analytics feeds the data management which in turn provides additional analytics. A closed loop system is a self-learning system that uses machine learning techniques to learn and adapt progressively in an environment.

      Efficient and Cost Effective Data Management

      An intelligent data management solution should be able to scale out efficiently to handle the load, and to be resilient and fault tolerant. It should also ensure you’re able to achieve data storage cost savings.

      Intelligent data management solutions typically address the following use cases:

      • Analysis: Find the what, who, when of how data is growing and being used
      • Planning: Understand the impact of different policies on costs, and on data footprint
      • Data Tiering or Data Archiving: Support various forms of managing cold data and offloading it from primary storage and backups without impacting user access. Includes:
        • Tier and archive data by policy – move data with links for seamless access
        • Archive project data – archive data that belongs to a project as a collection
        • Archive without links – move data without leaving a link behind when data needs to be moved out of an environment
      • Data Replication: Create a copy of data in another location
      • Data Migration: Move data from one storage environment to another
      • Deep Analytics: Search and query data at scale across storage

    • IOPS

      IOPS stands for Input/Output Operations Per Second. It is a commonly used metric to measure the performance or throughput of storage devices, such as hard disk drives (HDDs), solid-state drives (SSDs), or data storage systems.

      IOPS represents the number of read and write operations a storage device or system can perform in one second. It is an important metric for determining the responsiveness and efficiency of storage solutions, especially in high-performance or latency-sensitive environments.

      The IOPS value can vary significantly depending on factors such as the storage technology, disk capacity, disk speed, queue depth, block size, and workload characteristics.

      Key points about IOPS:

      • Random IOPS: Random IOPS refers to the number of random read or write operations a storage device can handle per second. It is a measure of how quickly the storage device can handle small, random data access patterns typically seen in databases or virtualized environments.
      • Sequential IOPS: Sequential IOPS represents the number of sequential read or write operations a storage device can perform per second. It measures the storage device’s ability to handle large, sequential data access patterns, which are common in tasks such as streaming or large file transfers.
      • Queue Depth: The queue depth represents the number of I/O requests that can be queued or outstanding at a given time. A higher queue depth allows for more simultaneous I/O operations, which can increase IOPS performance.
      • Block Size: The block size refers to the size of the data transferred in each I/O operation. Smaller block sizes typically result in higher IOPS values, as more operations can be performed in a given time period. However, larger block sizes can improve throughput and efficiency for certain workloads.
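
      The relationships between IOPS, block size, queue depth and latency noted above can be made concrete with a little arithmetic. The sketch below uses two standard approximations: throughput is IOPS multiplied by block size, and (by Little's Law) the IOPS a device can sustain is roughly the number of outstanding requests divided by the average latency per request. The function names and sample figures are illustrative assumptions, not measurements of any real device.

```python
def throughput_mib_per_s(iops: float, block_size_kib: float) -> float:
    """Approximate throughput in MiB/s as IOPS x block size."""
    return iops * block_size_kib / 1024


def iops_from_latency(queue_depth: int, latency_ms: float) -> float:
    """Estimate sustainable IOPS from queue depth and average latency.

    Assumes the device keeps `queue_depth` requests in flight at all times,
    so this is an optimistic estimate rather than a guarantee.
    """
    return queue_depth / (latency_ms / 1000)


# Small blocks: many operations per second, but modest throughput.
print(throughput_mib_per_s(iops=10_000, block_size_kib=4))    # 39.0625 MiB/s
# Large blocks: far fewer operations per second, but higher throughput.
print(throughput_mib_per_s(iops=2_000, block_size_kib=128))   # 250.0 MiB/s
# 32 outstanding requests at 1 ms each is roughly 32,000 IOPS.
print(round(iops_from_latency(queue_depth=32, latency_ms=1.0)))
```

      This is why an IOPS figure is only meaningful when the block size and queue depth used to measure it are stated alongside it.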

      IOPS is just one metric to consider when evaluating storage performance. Other factors like latency, bandwidth, and throughput also play a significant role. Workload characteristics, including read-to-write ratios, access patterns, and the number of concurrent users or applications, should be taken into account to determine the appropriate storage solution for specific use cases.

      When comparing storage devices or systems, it is recommended to consider multiple performance metrics, including IOPS, to gain a comprehensive understanding of their capabilities and suitability for a given workload.

      Historically, data storage was measured with hardware-oriented metrics, including:

      • Latency, IOPS and network throughput
      • Uptime and downtime per year
      • RTO: Recovery time objective (time to restore services after downtime)
      • RPO: Recovery point objective (time-based measurement of the maximum amount of data loss that is tolerable to an organization)
      • Backup window: Average time to perform a backup

      Read more about the file metrics that matter.

      What are the top reports or metrics that data storage people need today to help keep up with these trends? Read: The Critical Role of Reporting in Trimming Storage Costs.

    • Isilon CloudPools (Dell EMC)

      What are Isilon CloudPools?


      Dell EMC PowerScale (formerly Isilon) CloudPools software provides policy-based automated tiering that allows for an additional storage tier for the Isilon cluster at your data center. This technology is a form of storage pools, which are collections of storage volumes that often blend different tiers of storage into a logical pool or shared storage environment.

      CloudPools supports tiering data from Dell PowerScale Isilon to public, private or hybrid cloud options. This technology moves archived files to the destination storage in a proprietary format and then references the moved files via stubs. Files cannot be accessed directly from the object storage, which rules out cloud-based functions such as AI/ML. Functions such as backup by an external application or migration to a new storage array require full rehydration of the data, leading to egress fees from cloud storage and the need to retain on-premises storage capacity.

      Learn more about CloudPools.

      Read the blog post: What you need to know before jumping into the cloud tiering pool


      Read the white paper: Cloud Tiering: Storage-Based vs Gateways vs File-Based: Which is Better and Why?

      Learn how to save on storage with Dell EMC and Komprise.

    • Isilon Tiering
      The Isilon Tiering solution from Dell EMC is called PowerScale CloudPools.
      Dell EMC PowerScale Isilon CloudPools software provides policy-based automated tiering that allows for an additional storage tier for the Isilon cluster at your data center. CloudPools supports tiering data from Dell PowerScale Isilon to public, private or hybrid cloud options. This technology is a form of storage pools, which are collections of storage volumes exported to a shared storage environment.
      Cloud tiering and data tiering (or archiving) can deliver significant cost savings as part of a cloud data strategy by offloading unused cold data to more cost-efficient cloud storage solutions. The approach you take to Isilon tiering can either create an easy path to the cloud with native access and full use of data in the cloud or it can create costly cloud egress and lock-in. Array block-level tiering is a mismatch for the cloud. Isilon cloud tiering moves blocks rather than entire files, which has the following ramifications:
      • Limited policies result in more data access from the cloud.
      • Defragmentation of blocks leads to higher cloud costs.
      • Sequential reads lead to higher cloud costs and lower performance.
      • Tiering blocks impacts performance of the storage array.

      Read the blog post: What you need to know before jumping into the cloud tiering pool

      PowerScale Isilon Tiering Choices

      When considering PowerScale Isilon data tiering and PowerScale Isilon cloud tiering, it’s important to understand your cloud tiering choices. Cloud tiering and archiving can save you millions by offloading infrequently accessed cold data to cost-efficient cloud data storage. But the approach you take can either create an easy path to the cloud for file data with full use of data in the cloud or it can create costly cloud egress and lock-in.

      Smart Migration from PowerScale Isilon with Komprise: Analyze your data first, tier off cold data, and deliver 25x faster cloud data migrations, with no disruption to your users, native data access and no storage-vendor lock-in for your file and object data.

      Learn more about cloud tiering and your cloud tiering choices.
      Learn more about Komprise for Dell EMC.

  • K
    • Komprise Analysis

      Komprise Analysis provides strategic insights into unstructured file and object data across your on-premises and cloud enterprise IT infrastructure:

      • Analyze across all your NAS, NFS, SMB, dual shares, as well as cloud storage.
      • See how much data you have, how fast it is growing, what is hot/cold.
      • Quickly understand file data types, top users, top groups, top directories.
      • Perform cost/benefit modeling and capacity planning for tiering and data management.

      With Komprise Analysis, you quickly gain visibility across storage silos and the cloud to make data-driven decisions. Plan what to migrate, what to tier, and understand the financial impact with an analytics-driven approach to data management and mobility. What if you could significantly reduce your data costs by transparently moving/tiering infrequently used data to less expensive storage? What if you could tier data without disrupting users or applications and feed select data to AI and ML analysis tools to help generate revenue? With Komprise you can know first, move smart, extract value and take control of your unstructured data growth and costs. That’s the power of Intelligent Data Management.

      Komprise Analysis is available as a standalone SaaS solution and is included with Komprise Elastic Data Migration and the full Komprise Intelligent Data Management Platform.


    • Komprise Deep Analytics

      Komprise Deep Analytics delivers granular, flexible search and indexes data in-place across file, object and cloud data storage to build a comprehensive Global File Index (GFI) spanning petabytes of unstructured data.

      Komprise Deep Analytics Actions: Add Deep Analytics queries to a plan and operationalize your ability to search and find what you need and when you need it.

      Smart Data Workflows: Leverage the GFI metadata catalog for systematic, policy-driven data management actions that can feed your data pipelines.


    • Komprise Elastic Data Migration

      Komprise Elastic Data Migration is a SaaS solution available with the Komprise Intelligent Data Management platform or standalone. Designed to be fast, easy and reliable with elastic scale-out parallelism and an analytics-driven approach, it is the market leader in file and object data migrations, routinely migrating petabytes of data (SMB, NFS, Dual) for customers in many complex scenarios. Komprise Elastic Data Migration ensures data integrity is fully preserved by propagating access control and performing file-level data integrity checks such as SHA-1 and MD5 with audit logging.
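
      To illustrate what a file-level integrity check of this kind generally involves, the following Python sketch hashes a source file and its migrated copy and compares the digests. It is a generic example built on the standard hashlib module, not a description of how Komprise implements its checks; the function names and the temporary files are assumptions made for the demonstration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha1",
                chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_copy(source: Path, target: Path, algorithm: str = "sha1") -> bool:
    """Return True when source and target have identical content digests."""
    return file_digest(source, algorithm) == file_digest(target, algorithm)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    good_copy = Path(tmp) / "good_copy.bin"
    bad_copy = Path(tmp) / "bad_copy.bin"
    source.write_bytes(b"example payload")
    good_copy.write_bytes(b"example payload")
    bad_copy.write_bytes(b"example payloaD")  # one byte differs

    print("good copy verified:", verify_copy(source, good_copy))
    print("bad copy verified:", verify_copy(source, bad_copy, algorithm="md5"))
```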

      In 2022, Komprise introduced Hypertransfer, which creates dedicated virtual channels across the WAN to accelerate cloud data migrations. By establishing dedicated channels to send data, Komprise Hypertransfer minimizes WAN roundtrips, which mitigates SMB protocol chattiness and dramatically improves data transfer rates. Tests using a dataset dominated by small files show that Komprise migrates files to the cloud 25x faster.


      As outlined in the white paper How To Accelerate NAS and Cloud Data Migrations, Komprise Elastic Data Migration is a highly parallelized, multi-processing, multi-threaded approach that improves performance at many levels.


    • Komprise Intelligent Data Management

      Komprise Intelligent Data Management is the full platform suite from Komprise, which delivers instant insight into data across NAS and Object data storage silos, from on-prem to the edge and across multi-cloud data storage. Identify savings, systematically move cold data transparently without any disruption to optimize costs, get an easier, faster path to the cloud and deliver greater unstructured data value. With Komprise Intelligent Data Management as a service you can analyze, migrate, tier, archive, replicate, and manage data at scale simply and reliably.

      Read the solution brief: Why Komprise Intelligent Data Management.

      Komprise Intelligent Data Management includes Komprise Analysis, Elastic Data Migration and Deep Analytics. Many enterprise organizations start with Komprise Analysis to know first and then determine the right data mobility and ongoing data management strategy.


  • L
    • Large Language Model (LLM)

      A large language model (LLM) is a type of artificial intelligence model that is designed to understand and generate human-like language on a large scale. LLMs are typically trained on massive amounts of text data from a wide range of sources, such as books, websites, articles, and other textual resources. These models utilize deep learning techniques, particularly using architectures like transformers, to capture complex patterns and dependencies in language. Read the article: What are LLMs, and how are they used in generative AI?

      LLMs are trained to process and generate coherent and contextually relevant responses based on the input they receive. They can understand and generate text in multiple languages and can perform a variety of language-related tasks, including language translation, text summarization, question answering, text completion, and more.

      One of the most well-known and influential families of LLMs is OpenAI’s GPT (Generative Pre-trained Transformer) series, such as GPT-4, which underpins ChatGPT. These models generate human-like text and have been used in various applications, including chatbots, virtual assistants, content generation, and creative writing.

      The training process for LLMs involves exposing the model to a large corpus of text data and using techniques like unsupervised learning to learn the statistical patterns and relationships within the language. The models are trained to predict the next word or sequence of words based on the context provided by the preceding words. This process enables the models to capture syntactic, semantic, and contextual nuances of language.
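
      The idea of predicting the next word from the preceding context can be shown with a deliberately tiny example. The Python sketch below counts which word follows which in a small corpus and predicts the most frequent follower. This is a simple bigram counter, not a transformer and not how any production LLM is built; it only illustrates the "learn statistical patterns, then predict the next word" principle, and the corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def train_bigram_model(text: str) -> Dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    model: Dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model


def predict_next(model: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = "the cat sat on the mat the cat ran"
model = train_bigram_model(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
print(predict_next(model, "sat"))  # "on"
print(predict_next(model, "dog"))  # None: the word never appeared
```

      An LLM replaces the lookup table with a neural network that conditions on a long context rather than a single word, but the training objective is the same kind of next-token prediction.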

      LLM limitations and challenges

      LLMs can sometimes generate incorrect or nonsensical responses, struggle with understanding nuances, and may be sensitive to biases present in the training data. Additionally, LLMs require significant computational resources for training and inference, making them computationally expensive. Despite these limitations, large language models have demonstrated tremendous potential in advancing natural language processing capabilities and enabling human-like interactions with AI systems. Ongoing research and development efforts continue to push the boundaries of LLMs, aiming to improve their accuracy, interpretability, and ethical use.

      Data Management and AI

      In May 2023, Krishna Subramanian, cofounder and COO of Komprise, wrote an article for Datanami entitled: Data Management Implications for Generative AI. She summarized 3 areas that need more attention:

      1. Data governance and transparency with training data
      2. Data segregation and data domains
      3. The derivative works of AI

      Her conclusion:

      Enterprises should tread carefully and ensure they clearly understand the data exposure, data leakage and potential data security risks before using AI applications.

    • Logs

      Logs refer to records or entries that capture events, activities, or messages generated by software applications, operating systems, servers, or network devices. Logs provide a chronological and detailed account of various system activities, which can be helpful for troubleshooting, analysis, auditing, security monitoring, and performance optimization.

      Log Types

      Logs can vary in their format and content depending on the system or application generating them. Common types of logs include system logs, application logs, security logs, event logs, error logs, access logs, audit logs, and debug logs.

      Logging Frameworks

      Software applications and systems often employ logging frameworks or libraries to generate logs systematically. These frameworks provide APIs or functions that developers can use to log specific events or messages at different levels of severity, such as debug, info, warning, error, or critical.
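
      As one concrete example of such a framework, Python's standard logging module exposes the severity levels listed above. The sketch below configures a logger that writes to an in-memory stream and drops messages below INFO; the logger name "demo.app" and the messages are placeholders chosen for illustration.

```python
import io
import logging


def build_logger(stream, level=logging.INFO) -> logging.Logger:
    """Create a logger that writes formatted entries to the given stream."""
    logger = logging.getLogger("demo.app")  # hypothetical logger name
    logger.setLevel(level)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output confined to our handler
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
log = build_logger(stream)

log.debug("cache warmed")                   # below INFO, so it is suppressed
log.info("service started")
log.warning("disk usage at %d%%", 91)
log.error("failed to open %s", "/data/example.txt")

print(stream.getvalue())
```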

      Log Entries

      Each log entry typically contains specific information, including timestamps, log levels, event descriptions, error codes, source IP addresses, user actions, system configurations, stack traces, or any other relevant data related to the event being logged.

      Log Analysis and Monitoring

      Logs are frequently collected and stored centrally, making them accessible for analysis and monitoring. Log analysis involves parsing, filtering, aggregating, and correlating log data to identify patterns, anomalies, errors, or security incidents. Log monitoring involves real-time tracking and alerting based on predefined conditions or thresholds. There are many monitoring and observability vendors who focus on log analysis and monitoring.
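
      The parse, filter and aggregate steps described above can be sketched with the Python standard library. The line format assumed here ("date time LEVEL source: message") and the sample entries are hypothetical; real log formats vary widely, and dedicated log analytics platforms handle far more than this.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Optional

# Assumed line format: "<date> <time> <LEVEL> <source>: <message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<level>[A-Z]+) (?P<source>\S+): (?P<message>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into named fields, or return None if malformed."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def parse_lines(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse every well-formed line and silently skip the rest."""
    return [entry for entry in map(parse_line, lines) if entry is not None]


def count_by_level(entries: Iterable[Dict[str, str]]) -> Counter:
    """Aggregate entries by severity level."""
    return Counter(entry["level"] for entry in entries)


sample = [
    "2024-05-01 12:00:00,123 INFO demo.app: service started",
    "2024-05-01 12:00:05,456 ERROR demo.db: connection refused",
    "2024-05-01 12:00:09,789 ERROR demo.db: connection refused",
    "this line does not match the expected format",
]

entries = parse_lines(sample)
errors = [entry for entry in entries if entry["level"] == "ERROR"]

print(count_by_level(entries))
print(len(errors), "error entries from", {entry["source"] for entry in errors})
```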

      Log Retention and Archiving

      Organizations often establish log retention policies to determine how long logs should be retained for compliance, auditing, or forensic purposes. Logs may be archived or backed up periodically to ensure their long-term availability and integrity.

      Log Management Systems

      To effectively handle and analyze large volumes of logs, organizations may employ log management systems or log analytics platforms. These tools automate log collection, storage, indexing, search, visualization, and analysis, enabling efficient log management and insights.

      Security and Compliance

      Logs play a crucial role in security and compliance efforts. They can provide valuable information for detecting and investigating security incidents, tracking user activities, identifying vulnerabilities, and meeting regulatory requirements.

      Log Rotation

      To manage log file sizes and prevent excessive storage usage, log rotation is often implemented. Log rotation involves periodically renaming or compressing log files and starting new log files to ensure continuous logging without overwhelming storage resources.
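
      Size-based rotation of this kind is available in Python's standard library through logging.handlers.RotatingFileHandler. The sketch below uses a deliberately tiny size limit and a temporary directory so the rotation is easy to observe; the logger name, limit and backup count are arbitrary values chosen for the demonstration.

```python
import logging
import logging.handlers
import tempfile
from pathlib import Path


def build_rotating_logger(log_path: Path, max_bytes: int,
                          backup_count: int) -> logging.Logger:
    """Create a logger whose file is rotated once it would exceed max_bytes."""
    logger = logging.getLogger("demo.rotation")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "app.log"
    logger = build_rotating_logger(log_path, max_bytes=200, backup_count=2)

    for i in range(50):
        logger.info("event number %d", i)

    # Close the file before the temporary directory is cleaned up.
    for handler in logger.handlers:
        handler.close()

    # The current file plus at most `backup_count` older files remain.
    rotated_files = sorted(p.name for p in Path(tmp).iterdir())
    print(rotated_files)
```

      Older files beyond the backup count are discarded, which is what keeps total storage bounded.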

      Logs are an essential component of system and application management, offering valuable insights into the operation, performance, security, and troubleshooting of computing environments. Effective log management practices and log analysis techniques can help organizations maintain the health, security, and reliability of their systems.

      Unstructured data management and logs: Read the whitepaper: Komprise Architecture Overview.

  • M
    • Metadata

      Metadata means “data about data” or data that describes other data. The prefix “meta” typically means “an underlying definition or description” in technology circles.

      Metadata makes finding and working with data easier – allowing the user to sort or locate specific documents. Some examples of basic metadata are author, date created, date modified, and file size. Metadata is also used for unstructured data such as images, video, web pages, spreadsheets, etc.

      Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe content within a web page. Search engines can use this data to help understand the content within a page.
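
      To show what description and keywords meta tags look like to a program, the following Python sketch extracts them from a small HTML snippet using the standard html.parser module. The HTML content and the class name MetaTagParser are made up for the example; this is not a depiction of how any particular search engine processes pages.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta_tags(html: str) -> Dict[str, str]:
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


page = """
<html>
  <head>
    <title>Glossary</title>
    <meta charset="utf-8">
    <meta name="description" content="A glossary of data management terms">
    <meta name="keywords" content="metadata, storage, glossary">
  </head>
  <body></body>
</html>
"""

print(extract_meta_tags(page))
```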

      Metadata can be created manually or through automation. Accuracy is increased using manual creation as it allows the user to input relevant information. Automated metadata creation can be more elementary, usually only displaying basic information such as file size, file extension, when the file was created, for example.
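
      Automated collection of basic file metadata can be sketched with Python's standard library. The example below reads the size, extension and modification time of a file. It reports the modification time rather than the creation time because creation time is not exposed consistently across operating systems; the function name and the temporary file are assumptions made for the demonstration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def basic_file_metadata(path: Path) -> Dict[str, object]:
    """Gather basic metadata that the file system records automatically."""
    info = path.stat()
    return {
        "name": path.name,
        "extension": path.suffix,
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(
            info.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "report.csv"
    sample.write_text("id,value\n1,42\n", encoding="utf-8")
    metadata = basic_file_metadata(sample)
    print(metadata)
```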

      Metadata can be stored and managed in a database; however, without context, it may be impossible to identify metadata just by looking at it. Metadata is useful in managing unstructured data since it provides a common framework to identify and classify a variety of data, including video, audio, genomics data, seismic data, user data, documents, and logs.

      Learn more about the Komprise Global File Index and Deep Analytics.

      Learn more about the Komprise Intelligent Data Management architecture.

      What is Metadata?

      Metadata is “data about data.” It is structured data that references and identifies data to give an essential extra layer of shorthand information. Metadata schema can be simple or complex but it provides an important underlying definition or description.

      Types of Metadata

      There are three main types of metadata:

      Structural Metadata – examples include:

      • Page Numbers
      • Sections
      • Chapters
      • Indexes
      • Tables of Contents

      Administrative Metadata – examples include:

      • Technical Metadata – Information needed to decode and render files
      • Preservation Metadata – Information necessary for the long-term management and archiving of digital assets
      • Rights Metadata – Information relating to intellectual property and usage rights

      Descriptive Metadata – examples include:

      • Unique identifiers (e.g., ISBN)
      • Physical attributes (e.g., file dimensions or Pantone colors)
      • Bibliographic attributes (e.g., author or creator, title, and keywords)

      Metadata Management

      Metadata management is the administration of data that describes other data. To manage metadata effectively, there must be established policies.

      Metadata management is important for understanding, aggregating, grouping and sorting data for use. Over the last decade, the rapid growth of data has created the need for metadata management to provide clear insight into what data to produce and what data to consume. This ensures data becomes a valuable enterprise asset.

      Learn more about Komprise Smart Data Workflows

      Learn more about Komprise Deep Analytics and the metadata-driven Komprise Global File Index

    • Metadata Management

      Metadata management is the process of collecting, organizing, storing, and maintaining metadata associated with an organization’s data assets. Metadata means data about data – it provides context, structure, and information about various aspects of data, making it easier to understand, manage, and use. Effective metadata management is essential for ensuring data quality, data accuracy, and appropriate data accessibility across an organization’s enterprise data landscape.

      Types of Metadata:

      • Descriptive Metadata: Provides information about the content, structure, and context of data. This includes attributes such as data source, creation date, author, format, and keywords.
      • Technical Metadata: Contains technical details about data, such as data type, data length, field names, and relationships between data elements.
      • Operational Metadata: Tracks the usage and behavior of data within systems, including information about data transformations, processes, and workflows.
      • Business Metadata: Relates data to the business context, such as data definitions, business rules, data ownership, and data lineage.

      Benefits of the Metadata Management Strategy:

      • Data Discovery and Understanding: Metadata provides insights into the meaning and structure of data, making it easier for users to discover and understand available data assets.
      • Data Governance: Metadata management supports data governance initiatives by enabling organizations to define and enforce data quality standards, security policies, and compliance requirements.
      • Data Lineage: Understanding the lineage of data – its origin, transformations, and movement – helps ensure data accuracy and traceability, particularly in complex data environments.
      • Data Integration: Metadata helps integration processes by clarifying how different data sources relate to each other, reducing the complexity of integrating disparate data systems.
      • Data Analytics and Reporting: Accurate metadata supports effective data analysis and reporting by providing the necessary context for interpreting results.
      • Search and Discovery: Well-managed metadata enables efficient search and discovery of data, saving time and effort when finding relevant information.
      • Collaboration: Metadata fosters collaboration by providing a common understanding of data across teams and departments.
      • Data Migration and Data Archiving: During data migration or data archiving projects, metadata helps in identifying what data to move, how to transform it, and what to retain for compliance purposes.

      Metadata Management Process:

      This can be done differently across enterprises and industries, but the general components are:

      • Capture: Metadata is collected from various sources, including databases, applications, files, and user input.
      • Store: Metadata can be stored in a centralized metadata repository or catalog. This repository acts as a single source of truth for all metadata assets.
      • Organize: Metadata is organized into categories, taxonomies, or hierarchies to facilitate easy navigation and understanding.
      • Govern: Metadata is governed through established processes, ensuring data quality, accuracy, security, and compliance.
      • Search and Access: Users can search and access metadata using intuitive tools and interfaces, allowing them to find relevant data assets quickly.
      • Update and Maintain: Metadata is regularly updated and maintained as data assets evolve over time. This includes updating technical details, documenting changes, and managing data lineage.
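
      The components above can be sketched as a tiny in-memory catalog. This is only an illustration of the capture, store, organize, search, and update steps; the class names, fields, asset IDs, and tags are all hypothetical, and a real repository would add persistence, governance controls, and access management.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    asset_id: str
    descriptive: dict                 # e.g. author, format
    technical: dict                   # e.g. size, data type
    tags: set = field(default_factory=set)


class MetadataCatalog:
    """A single in-memory store standing in for a central metadata repository."""

    def __init__(self):
        self._records = {}

    def capture(self, record):
        # Capture and store: one record per asset, keyed by its ID.
        self._records[record.asset_id] = record

    def tag(self, asset_id, *tags):
        # Organize: attach classification tags to an asset.
        self._records[asset_id].tags.update(tags)

    def search(self, tag):
        # Search and access: find assets by tag.
        return sorted(r.asset_id for r in self._records.values() if tag in r.tags)

    def update(self, asset_id, **technical):
        # Update and maintain: refresh technical details as assets change.
        self._records[asset_id].technical.update(technical)


catalog = MetadataCatalog()
catalog.capture(MetadataRecord("scan-001", {"author": "lab-a", "format": "DICOM"}, {"size_bytes": 52_428_800}))
catalog.capture(MetadataRecord("report-17", {"author": "finance", "format": "PDF"}, {"size_bytes": 204_800}))
catalog.tag("scan-001", "medical-imaging", "retain-7y")
catalog.tag("report-17", "retain-7y")
catalog.update("scan-001", size_bytes=52_430_000)

matches = catalog.search("retain-7y")
```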

      Metadata Standards and Tools:

      Metadata management often involves using standards such as Dublin Core, Metadata Object Description Schema (MODS), and industry-specific standards. Various metadata management tools and platforms are available to facilitate the capture, storage, organization, and retrieval of metadata.

      Metadata management is a crucial practice for any organization that values data quality, accessibility, and effective data governance. It has now broadened to include unstructured data in order to provide the context necessary to understand and utilize all data assets while supporting critical business initiatives, compliance efforts, and analytical and AI activities.

    • Microsoft Azure


      What is Microsoft Azure?

      Microsoft Azure is a cloud computing platform and set of on-demand services provided by Microsoft, including virtual machines, cloud storage, databases, cloud analytics, and more. From the Microsoft website:

      The Azure cloud platform is more than 200 products and cloud services designed to help you bring new solutions to life—to solve today’s challenges and create the future. Build, run, and manage applications across multiple clouds, on-premises, and at the edge, with the tools and frameworks of your choice.

      Azure supports a variety of programming languages, tools, and frameworks, including Microsoft-specific and third-party software and systems. It is designed to be flexible and scalable, allowing users to pay only for the services they use and adjust their resources as needed.

      With trends like digital transformation, data center consolidation and DevOps adoption in the enterprise, a growing number of organizations are planning to migrate (or are in the midst of migrating) on-premises applications and infrastructure to cloud platforms like Microsoft Azure to create and deploy web applications and APIs and to store and analyze data. Cloud service provider platforms like Microsoft Azure are also popular among developers, who use them to build and test applications, collaborate with other developers, and deploy code to the cloud.

      Azure Storage

      Azure Storage is a cloud-based storage service provided by Microsoft. It is designed to provide scalable and highly available storage for data, files, and unstructured data, such as images, videos, and documents.

      Azure Storage provides four types of storage services:

      • Blob Storage: Blob storage is designed to store large unstructured data such as documents, images, videos, and logs.
      • File Storage: File storage is a fully managed file share service that enables customers to migrate their on-premises file shares to the cloud, but requires the right path to the cloud.
      • Queue Storage: Queue storage provides messaging capabilities between different components of a distributed application.
      • Table Storage: Table storage is a NoSQL key-value store that can be used to store massive amounts of structured data.

      Azure Storage provides features such as automatic replication, backup and restore, disaster recovery, and access control to ensure the security and availability of data. It can be accessed through REST APIs or SDKs for various programming languages. Azure Storage also integrates with other Azure services, such as Azure Virtual Machines, Azure Web Apps, and Azure Functions.


      Learn more about Komprise for Microsoft Azure.

  • N
    • NAS Software

      NAS stands for Network-Attached Storage, a type of data storage architecture that allows multiple devices to access shared storage over a network. NAS software is the software that powers these NAS systems. Several NAS software options are available, ranging from FreeNAS, an open-source NAS software that supports protocols and features including CIFS/SMB, NFS, iSCSI, and FTP, to enterprise NAS vendors that deliver a combination of NAS software and NAS hardware (Pure Storage, NetApp, HPE, and Dell are examples).

      The choice of NAS software depends on factors such as the size of your storage needs, budget, features required and personal preferences.

      NAS Hardware

      NAS hardware refers to the physical components that make up a Network-Attached Storage (NAS) system. Some of the key components of NAS hardware include:

      • Storage drives: The most important component of any NAS system is the storage drives. These are the hard drives or solid-state drives (SSDs) that store the data. NAS systems typically use multiple drives in a RAID configuration to provide redundancy and improved performance.
      • NAS enclosure: The enclosure is the physical housing that holds the storage drives and other components of the NAS system. Enclosures can vary in size, from small desktop models to large rack-mounted models for enterprise environments.
      • Network interface: The network interface is the component that allows the NAS system to connect to a network. Most NAS systems have a built-in network interface card (NIC) that supports Ethernet connections.
      • Processor and memory: The processor and memory are important components that affect the performance of the NAS system. A powerful processor and sufficient memory can improve the speed and responsiveness of the NAS system.
      • Power supply: The power supply is responsible for providing power to the NAS system. It is important to choose a reliable power supply to ensure that the NAS system operates smoothly.
      • Cooling system: NAS systems generate a lot of heat due to the high-speed operation of the storage drives and other components. A good cooling system is important to prevent overheating and damage to the components.
      • Expansion slots: Some NAS systems have expansion slots that allow you to add additional components, such as network interface cards, to improve the functionality of the system.
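
      Because NAS systems typically combine multiple drives in a RAID configuration, usable capacity is lower than raw capacity for every level except RAID 0. The sketch below applies the standard capacity formulas for common RAID levels, assuming equally sized drives; the function name is hypothetical, and vendor implementations may reserve additional space.

```python
def usable_capacity_tb(level, drive_count, drive_size_tb):
    """Usable capacity for common RAID levels with equally sized drives."""
    level = level.upper()
    if level == "RAID0":      # striping only, no redundancy
        minimum, usable_drives = 2, drive_count
    elif level == "RAID1":    # every drive holds a full mirror copy
        minimum, usable_drives = 2, 1
    elif level == "RAID5":    # one drive's worth of parity
        minimum, usable_drives = 3, drive_count - 1
    elif level == "RAID6":    # two drives' worth of parity
        minimum, usable_drives = 4, drive_count - 2
    elif level == "RAID10":   # striped mirrored pairs
        minimum, usable_drives = 4, drive_count // 2
        if drive_count % 2:
            raise ValueError("RAID10 needs an even number of drives")
    else:
        raise ValueError(f"unsupported RAID level: {level}")
    if drive_count < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives")
    return usable_drives * drive_size_tb


raid5_capacity = usable_capacity_tb("RAID5", drive_count=4, drive_size_tb=8)
raid6_capacity = usable_capacity_tb("RAID6", drive_count=6, drive_size_tb=8)
```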

      Read: Sustainable data management and the future of green business

      Enterprise NAS solutions

      Pure Storage, NetApp, Dell and Qumulo are all companies that offer enterprise NAS solutions.

      • Pure Storage: Pure Storage offers FlashBlade, a high-performance, scalable NAS solution designed for modern workloads such as analytics, AI, and machine learning. FlashBlade is built on a software-defined architecture and provides features such as data reduction, encryption, and file replication.
      • NetApp: NetApp offers several NAS solutions, including the FAS series and the AFF series. The FAS series provides midrange NAS capabilities and is suitable for small and medium-sized businesses. The AFF series provides high-performance NAS capabilities and is suitable for large enterprises.
      • Dell: Dell offers several NAS solutions, including the PowerVault NX series and the PowerScale series. The PowerVault NX series provides midrange NAS capabilities and is suitable for small and medium-sized businesses. The PowerScale series provides high-performance NAS capabilities and is suitable for large enterprises.
      • Qumulo: Qumulo offers a software-defined NAS solution that can be deployed on-premises, in the cloud, or in a hybrid environment. The solution is designed to provide high-performance file storage for a range of workloads, including video and audio content, medical imaging, and scientific research data.

      These are just a few examples of enterprise NAS solutions.

      Cloud NAS

      Cloud NAS is a type of network-attached storage architecture that allows users to access their data remotely over the internet. There are several cloud NAS vendors in the market that offer cloud-based storage solutions. Some of the well-known cloud NAS vendors include:

      • Amazon Web Services (AWS): Amazon’s cloud computing platform provides several storage services, including Amazon Elastic File System (EFS), which is a cloud-based NAS solution that provides scalable and secure file storage for EC2 instances.
      • Microsoft Azure: Microsoft’s cloud computing platform provides Azure Files, which is a fully managed cloud-based NAS solution that supports SMB and NFS protocols.
      • Google Cloud Platform: Google’s cloud computing platform provides Cloud Filestore, which is a cloud-based NAS solution that provides high-performance file storage for compute instances running on Google Cloud Platform.

      NAS Migration

      NAS migration is the process of transferring data from one NAS system to another. This may be necessary if you are upgrading your existing NAS system, or if you are moving your data to a new location. Also refer to Cloud NAS Migration. Here are the steps involved in a typical NAS migration:

      • Plan the migration: The first step is to plan the migration. This involves identifying the data that needs to be migrated, estimating the size of the data, and choosing the new NAS system.
      • Set up the new NAS system: Once you have chosen the new NAS system, you need to set it up. This involves configuring the network settings, creating shares and volumes, and setting up user accounts and permissions.
      • Copy the data: The next step is to copy the data from the old NAS system to the new one. This can be done using various methods such as using a backup and restore process, using a file transfer protocol such as FTP, or using a third-party tool.
      • Verify the data: After the data has been copied, it is important to verify that all the data has been transferred successfully. This involves checking that all the files and folders have been copied correctly and that there are no missing or corrupted files.
      • Update the clients: Finally, you need to update the clients to point to the new NAS system. This involves updating the client configurations and testing to ensure that the clients can access the data on the new NAS system.
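
      The verification step can be sketched with checksums: hash every file on the source and the target, then report files that are missing or whose contents differ. This is a simplified illustration with hypothetical function names; it compares file contents only and does not check permissions, ownership, or timestamps, which a real migration tool would also need to preserve.

```python
import hashlib
import os
import shutil
import tempfile


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its checksum."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            digests[os.path.relpath(full_path, root)] = file_digest(full_path)
    return digests


def verify_copy(source_root, target_root):
    """Return (missing, mismatched) relative paths on the target."""
    src, dst = tree_digests(source_root), tree_digests(target_root)
    missing = sorted(set(src) - set(dst))
    mismatched = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, mismatched


# Self-contained demo: copy a small tree, then damage the copy.
source_root = tempfile.mkdtemp()
os.makedirs(os.path.join(source_root, "sub"))
with open(os.path.join(source_root, "a.txt"), "w") as handle:
    handle.write("alpha")
with open(os.path.join(source_root, "sub", "b.txt"), "w") as handle:
    handle.write("beta")

target_root = os.path.join(tempfile.mkdtemp(), "copy")
shutil.copytree(source_root, target_root)
clean_result = verify_copy(source_root, target_root)

# Simulate a corrupted file and a missing file on the target.
with open(os.path.join(target_root, "a.txt"), "w") as handle:
    handle.write("corrupted")
os.remove(os.path.join(target_root, "sub", "b.txt"))
damaged_result = verify_copy(source_root, target_root)
```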

      It is important to ensure that you have a backup of all your data before you start the migration process. This will help you to recover your data in case anything goes wrong during the migration process.


      NAS Migration Challenges

      NAS migration can be a complex process and may present a number of challenges. Here are some of the common challenges that organizations may face during NAS migration:

      • Data transfer speed: Moving large amounts of data can be time-consuming, especially if you are using a slow network or if the data is being transferred over a long distance. This can result in prolonged downtime and potential data loss if the migration is not completed within the scheduled downtime window.
      • Compatibility issues: Different NAS systems may have different file systems, protocols, and configurations, which can create compatibility issues during the migration process. This can lead to data corruption or loss, or it may require additional configuration changes to ensure that the data is compatible with the new NAS system.
      • Data loss: Data loss is a common risk during any data migration process, and it is important to have a backup of all your data before you start the migration process. This will help you to recover your data in case anything goes wrong during the migration process.
      • User access: During the migration process, users may lose access to their data, which can result in productivity loss and potential data loss. It is important to plan for user access and ensure that users are informed about any scheduled downtime or access restrictions.
      • Data security: During the migration process, data may be exposed to security risks, such as unauthorized access or data breaches. It is important to ensure that your data is protected throughout the migration process.
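
      For the data transfer speed challenge, a rough estimate of copy time helps check whether a migration fits its downtime window. The sketch below is back-of-the-envelope arithmetic only; the function names and the 70% link-efficiency default are assumptions, and real throughput also depends on file sizes, protocol overhead, and storage performance.

```python
def estimate_transfer_hours(data_tb, link_gbps, efficiency=0.7):
    """Hours to move data_tb terabytes over a link_gbps link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    total_bits = data_tb * 1e12 * 8          # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency
    return total_bits / effective_bps / 3600


def fits_window(data_tb, link_gbps, window_hours, efficiency=0.7):
    """True if the estimated transfer time fits in the downtime window."""
    return estimate_transfer_hours(data_tb, link_gbps, efficiency) <= window_hours


ideal_hours = estimate_transfer_hours(100, 1, efficiency=1.0)   # about 222 hours
weekend_ok = fits_window(100, 10, window_hours=48)
```

      Even at full line rate, 100 TB over a 1 Gbps link takes more than nine days, which is why large migrations are usually run incrementally rather than inside a single outage.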

      To overcome these challenges, it is important to plan the NAS migration process carefully, use appropriate migration tools and services, and involve all stakeholders in the process. It is also important to test the migration process thoroughly before the actual migration to identify and resolve any issues beforehand.


      Komprise for NAS Migration and Data Management

      Komprise specializes in analyzing, tiering, archiving and moving unstructured data from primary NAS to more cost-effective long-term storage without any disruption. Typically, 60% to 80% of enterprise file and object data has not been accessed in over a year. By tiering cold data and older log files and snapshots, the capacity required on the storage array, the mirrored storage array (if mirroring and/or replication is being used) and backup storage is reduced dramatically. The right approach to transparently tiering cold data can reduce overall storage costs by as much as 70%.
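
      As a generic illustration of how cold data can be identified (not a description of how Komprise works internally), the sketch below walks a directory tree and flags files that have been neither read nor modified within a threshold. The function name and the 365-day default are assumptions, and access times may be unreliable on file systems mounted with access-time updates disabled.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86_400


def find_cold_files(root, cold_after_days=365, now=None):
    """Return relative paths of files not read or modified within cold_after_days."""
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            st = os.stat(full_path)
            # Use the later of access and modification time to be conservative.
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                cold.append(os.path.relpath(full_path, root))
    return sorted(cold)


# Self-contained demo: one file is back-dated by 400 days.
share = tempfile.mkdtemp()
for name in ("old_report.csv", "active_notes.txt"):
    with open(os.path.join(share, name), "w") as handle:
        handle.write("sample")

stale = time.time() - 400 * SECONDS_PER_DAY
os.utime(os.path.join(share, "old_report.csv"), (stale, stale))

cold_files = find_cold_files(share)
```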

      With Komprise you can migrate NAS and object data on-premises and in the cloud quickly, reliably, and at scale. Optimize cloud data storage costs with analytics-driven cloud tiering and archival. Build a Global File Index to easily find, tag and take action on the right data at the right time and feed the right data to analytics and AI/ML engines. Komprise uses open standards such as NFS, SMB/CIFS and REST/S3, making it “data storage agnostic.”

    • Native Data Access

      Native Data Access: Having direct access to tiered or archived data without needing rehydration because files are accessed as objects from the target storage.

      The Benefits of Cloud Native Data Access

      Gartner estimates that by 2025 more than 95% of new digital workloads will be deployed on cloud-native platforms, up from 30% in 2021. According to the 2022 State of Unstructured Data Management report, enterprise IT organizations are looking to optimize data storage efficiency by moving more data to the cloud. As a result, cloud NAS file data storage options are attracting attention. In fact, cloud NAS topped the list for storage investments in 2023 (47%), followed closely by cloud object storage (44%). Enterprise data storage vendors such as NetApp have popular cloud NAS offerings alongside cloud-native offerings such as Amazon FSx and Azure Files. These services are ideal for active or “hot” data requiring high performance and response times; rarely accessed or “cold” data can live on object storage, which delivers significant cost savings for long-term storage.

      Read the Blog Post: Why Cloud Native Data Access Matters

      As you migrate file workloads to the cloud, it’s important to not limit the potential of your data by locking data into a proprietary format. Cloud native data access is essential to unleash the potential of the cloud. Cloud native is a way to move data to the cloud without lock in, which means that your data is no longer tied to the file system from which it was originally served.

      Watch the TechKrunch session: How to Access Tiered Data in the Cloud

      This short webinar demonstrates how Komprise allows you to access your stored data wherever it’s stored, whenever you want, without rehydration. Because moved data are always intact, you can extract data value with both file and native access – and without penalty. Read the Komprise Architecture Overview for more information on Native Access.

    • Native File Format

      Native File Format (or Native Data Format) is the file structure in which a document is created and maintained by the original creating application. Komprise provides transparent data tiering from the source storage array with native access to the cold data on the target, without getting in front of hot data on the source.

      Learn more about the benefits of Native Data Access and Komprise Transparent Move Technology


    • NetApp Cloud Tiering

      The NetApp Cloud Tiering solution is called FabricPool.

      FabricPool is a NetApp tiering technology that enables automated tiering of data from an all-flash appliance to low-cost object storage tiers either on or off premises. This technology is a form of storage pools which are collections of storage volumes exported to a shared storage environment.

      Cloud tiering and data tiering (or data archiving) can deliver significant data storage cost savings as part of a cloud storage strategy by offloading unused cold data to more cost-efficient cloud storage solutions. The approach you take to NetApp tiering can either create an easy path to the cloud with native access and full use of data in the cloud or it can create costly cloud egress and lock-in.

      What you need to know before jumping into the cloud pool.

      Learn more about your cloud tiering choices.

      Learn more about Komprise for NetApp.

    • NetApp FabricPool

      What is NetApp FabricPool? Is it the Right Choice for NetApp Data Tiering?

      FabricPool (now called NetApp Cloud Tiering) is a NetApp storage technology that enables automated data tiering at the block level from flash storage to low-cost object storage tiers, in the cloud or on premises. FabricPool is a form of storage pools which are collections of storage volumes that often blend different tiers of storage into a logical pool or shared storage environment.

      Originally developed to tier “snapshot” or backup data, the functionality has been extended to infrequently accessed blocks of the active file system. Tiered data is stored in a proprietary format in object storage and as a result can only be read via the original NetApp array. File data access from the object storage is not possible, which rules out the use of cloud-based tools for AI/ML. Additionally, functions such as backup by an external application or migration to a new storage array require full rehydration of data, leading to egress fees from cloud storage and the need to retain sufficient storage capacity on-premises.

      Read the white paper, Cloud Tiering: Storage-Based vs Gateways vs File-Based, for more discussion on storage pools.

      Array block-level tiering is a mismatch for the cloud. Tiering blocks rather than entire files, as NetApp cloud tiering does, has the following ramifications:

      • Limited policies result in more data access from the cloud.
      • Defragmentation of blocks leads to higher cloud costs.
      • Sequential reads lead to higher cloud costs and lower performance.
      • Tiering blocks impacts performance of the storage array.


      Read the blog post: What you need to know before jumping into the cloud tiering pool

      Learn more about FabricPool technology.

      When it comes to considering NetApp data tiering and NetApp cloud tiering, it’s important to understand your cloud tiering choices. Cloud tiering and archiving can save you millions by offloading infrequently accessed cold data to cost-efficient cloud data storage. But the approach you take can either create an easy path to the cloud for file data with full use of data in the cloud or it can create costly cloud egress and lock-in. Also, what about cloud data migration and cloud tiering for other storage systems (e.g., Isilon cloud tiering) if you are a multi-storage enterprise IT organization? And what about tiering data from older versions of NetApp? This is why the market is increasingly moving to storage-agnostic unstructured data management.


      Learn about Komprise’s native integration with NetApp and why Komprise is the right choice for NetApp cloud data tiering.

    • Network Attached Storage (NAS)


      What is Network Attached Storage?

      Network Attached Storage (NAS) definition: A NAS system is a storage device connected to a network that allows storage and retrieval of data from a centralized location for authorized network users and heterogeneous clients. These devices generally consist of an engine that implements the file services (NAS device), and one or more devices on which data is stored (NAS drives).

      The purpose of a NAS system is to provide a local area network (LAN) with file-based, shared storage in the form of an appliance optimized for quick data storage and retrieval. NAS is a relatively expensive storage option, so it should only be used for hot data that is accessed the most frequently. Many enterprise IT organizations today are looking to migrate NAS and object data to the cloud to reduce costs and improve agility and efficiency.

      NAS Storage Benefits

      Network attached storage devices are used to remove the responsibility of file serving from other servers on a network and allow for a convenient way to share files among multiple computers. Benefits of dedicated network attached storage include:

      • Faster data access
      • Easy to scale up and expand upon
      • Remote data accessibility
      • Easier administration
      • OS-agnostic compatibility (works with Windows and Apple-based devices)
      • Built-in data security with compatibility for redundant storage arrays
      • Simple configuration and management (typically does not require an IT pro to operate)

      NAS File Access Protocols

      Network attached storage devices are often capable of communicating in a number of different file access protocols, such as NFS and SMB/CIFS.

      Most NAS devices have a flexible range of data storage systems that they’re compatible with, but you should always ensure that your intended device will work with your specific data storage system.

      Enterprise NAS Storage Applications

      In an enterprise, a NAS array can be used as primary storage for storing unstructured data and as backup for data archiving or disaster recovery (DR). It can also function as an email, media database or print server for a small business. Higher-end NAS devices can hold enough disks to support RAID, a storage technology that combines multiple hard disks into one unit to provide better performance, redundancy, and high availability.

      Data on NAS systems (aka NAS devices) is often mirrored (replicated) to another NAS system, and backups or snapshots of the footprint are kept on the NAS for weeks or months. This leads to three or more copies of the data being kept on expensive NAS storage. A NAS storage solution does not need to be used for disaster recovery and backup copies, as this can be very costly. By finding and tiering (or archiving) cold data from NAS, you can eliminate the extra copies of cold data and cut cold data storage costs by over 70%.
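
      The effect of the extra copies can be illustrated with simple arithmetic. In the sketch below, every number is a placeholder assumption rather than a measured figure: capacity is split 30 TB hot and 70 TB cold, NAS capacity costs 1.0 unit per TB, object storage costs 0.25 units per TB, data left on NAS exists in three copies (primary, mirror, backup), and tiered cold data is kept as a single copy on the object tier.

```python
def storage_cost(hot_tb, cold_tb, nas_price, object_price, nas_copies=3, tier_cold=False):
    """Relative storage cost, with or without tiering cold data off the NAS.

    Prices are in arbitrary units per TB; nas_copies counts primary, mirror,
    and backup copies of anything that stays on NAS.
    """
    if tier_cold:
        # Hot data keeps all NAS copies; cold data moves to the object tier.
        return hot_tb * nas_copies * nas_price + cold_tb * object_price
    return (hot_tb + cold_tb) * nas_copies * nas_price


before = storage_cost(30, 70, nas_price=1.0, object_price=0.25)
after = storage_cost(30, 70, nas_price=1.0, object_price=0.25, tier_cold=True)
savings_fraction = 1 - after / before
```

      Actual savings depend on real prices, retention requirements, and how much data is truly cold, so the output is only as meaningful as the inputs supplied.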

      Check out our video on NAS storage savings to get a more detailed explanation of how this concept works in practice.

      Network Attached Storage (NAS) Data Tiering and Data Archiving

      Since NAS storage is typically designed for higher performance and can be expensive, data on NAS is often tiered, archived and moved to less expensive storage classes. NAS vendors offer some basic data tiering at the block level to provide limited savings on storage costs, but not on backup and DR costs. Unlike proprietary block-level tiering, file-level tiering or archiving provides a standards-based, non-proprietary solution to maximize savings by moving cold data to cheaper storage solutions. This can be done transparently so users and applications do not see any difference when cold files are archived. Read this white paper to learn more about the differences between file tiering and block tiering.

      NAS Migration to the Cloud

      Cloud NAS is growing in popularity. But the right approach to migrating unstructured data to the cloud is essential. Unstructured data is everywhere. From genomics and medical imaging to streaming video, electric cars, and IoT products, all sectors generate unstructured file data. Data-heavy enterprises typically have petabytes of file data, which can consist of billions of files scattered across different storage vendors, architectures and locations. And while file data growth is exploding, IT budgets are not. That’s why enterprise IT organizations are looking to migrate file workloads to the cloud. However, they face many barriers, which can cause migrations to take weeks to months and require significant manual effort.

      Cloud NAS Migration Challenges

      Common unstructured data migration challenges include:

      • Billions of files, mostly small: Unstructured data migrations often require moving billions of files, the vast majority of which are small files that have tremendous overhead, causing data transfers to be slow.
      • Chatty protocols: Server message block (SMB) protocol workloads—which can be user data, electronic design automation (EDA) and other multimedia files or corporate shares—are often a challenge since the protocol requires many back-and-forth handshakes which increase traffic over the network.
      • Large WAN latency: Network file protocols are extremely sensitive to high-latency network connections, which are essentially unavoidable in wide area network (WAN) migrations.
      • Limited network bandwidth: Bandwidth is often limited or not always available, causing data transfers to become slow, unreliable and difficult to manage.

      Learn more about Komprise Smart Data Migration.
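
      The combined effect of many small files, chatty protocols, and WAN latency described in the challenges above can be approximated with a simple model in which every file costs a fixed number of network round trips. The function name and the figure of five round trips per file are illustrative assumptions, not measurements of SMB or NFS.

```python
def latency_bound_hours(file_count, round_trips_per_file, rtt_ms, parallel_streams=1):
    """Hours spent waiting on round trips alone, ignoring payload transfer time."""
    total_seconds = file_count * round_trips_per_file * (rtt_ms / 1000) / parallel_streams
    return total_seconds / 3600


# One million small files over a 40 ms WAN link.
serial_hours = latency_bound_hours(1_000_000, round_trips_per_file=5, rtt_ms=40)
parallel_hours = latency_bound_hours(1_000_000, round_trips_per_file=5, rtt_ms=40,
                                     parallel_streams=64)
```

      Under these assumptions the serial case spends more than two days on round trips before any bandwidth limit is considered, which is why parallelism and fewer round trips per file matter more than raw bandwidth for small-file workloads.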

      Network Attached Storage FAQ

      These are some of the most commonly asked questions we get about network attached storage systems.

      How are NAS drives different than typical data storage hardware?

      NAS drives are specifically designed for constant 24×7 use, with high reliability, built-in vibration mitigation, and optimization for RAID setups. Network attached storage systems also benefit from an abundance of health management systems designed to keep them running smoothly for longer than a standard hard drive would.

      Which features are the most important ones to have in a NAS device?

      The ideal NAS devices have multiple (2+) drive bays, hardware-level encryption acceleration, support for widely used platforms such as AWS Glacier and S3, and a moderately powerful multicore CPU paired with at least 2GB of RAM. If you’re looking for these types of features, Seagate and Western Digital are some of the most reputable brands in the NAS industry.

      Are there any downsides to using NAS storage?

      NAS storage systems can be quite expensive when they hold data that does not need high-performance storage, but this can be remedied with analytics-driven NAS data management software, like Komprise Intelligent Data Management.

      Using NAS Data Management Tools to Substantially Reduce Storage Costs

      One of the biggest issues organizations are facing with NAS systems is trouble understanding which data they should be storing on their NAS drives and which should be offloaded to more affordable types of storage. To keep data storage costs lower, an analytics-based NAS data management system can be implemented to give your organization more insight into your NAS data and where it should be optimally stored.

      Of the thousands of data-centric companies we’ve worked with, most needed less than 20% of their total data stored on high-performance NAS drives. With a more thorough understanding of their NAS data, organizations often find that their NAS storage needs are much lower than they originally thought, leading to substantial long-run storage savings, often greater than 50%.
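
      The arithmetic behind that kind of saving can be sketched as follows. This is a simplified illustration, not a Komprise calculator: the function name and the per-terabyte prices are hypothetical placeholders, and a real estimate would also account for copies, egress, and migration costs.

```python
def tiering_costs(total_tb, hot_fraction, nas_cost_per_tb, cold_cost_per_tb):
    """Return (cost_before, cost_after, savings_fraction) for one year."""
    hot_tb = total_tb * hot_fraction       # data that stays on NAS
    cold_tb = total_tb - hot_tb            # data moved to a cheaper tier
    cost_before = total_tb * nas_cost_per_tb
    cost_after = hot_tb * nas_cost_per_tb + cold_tb * cold_cost_per_tb
    savings_fraction = 1 - cost_after / cost_before
    return cost_before, cost_after, savings_fraction


# Hypothetical inputs: 1,000 TB total, 20% hot, and made-up annual prices
# of 300 per TB on NAS and 50 per TB on the cheaper tier.
before, after, savings = tiering_costs(1000, 0.20, 300, 50)
print(f"before: {before:,.0f}  after: {after:,.0f}  savings: {savings:.0%}")
```

      With these assumed prices the savings come out at roughly two thirds; the result is driven mostly by the hot fraction and the price gap between tiers, so both should be measured rather than guessed.
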

      Komprise makes it possible for customers to know their NAS and S3 data usage and growth before buying more storage. Explore your storage scenarios to get a forecast of how much could be saved with the right data management tools.

      This is what Komprise Dynamic Data Analytics provides.

      NAS Fast Facts:

      • Network-attached storage (NAS) is a file-level computer storage device that provides a local-area network with file-based shared storage. This typically comes in the form of a manufactured computer appliance specialized for this purpose, containing one or more storage devices.
      • Network attached storage devices are used to remove the responsibility of file serving from other servers on a network, and they provide a convenient way to share files among multiple computers. Benefits of dedicated network attached storage include faster data access, easier administration, and simple configuration.
      • In an enterprise, a network attached storage array can be used as primary storage for unstructured data, and as a backup target for archiving or disaster recovery. For a small business, it can also function as an email server, media database, or print server. Higher end network attached storage devices can hold enough disks to support RAID, a storage technology that combines multiple hard disks into one unit to provide better performance, redundancy, and high availability.
      • Data on NAS systems is often mirrored (replicated) to another NAS system, and backups or snapshots of the footprint are kept on the NAS for weeks or months. This leads to three or more copies of the data being kept on expensive NAS devices.

      Read the white paper: How to Accelerate NAS Migrations and Cloud Data Migrations 

      Know the difference between NAS and Cloud Data Migration vs. Tiering and Archiving


      Getting Started with Komprise:

    • Network File System (NFS)

      What is NFS?

      A network file system (NFS) is a mechanism for storing and retrieving data on multiple hard drives and directories across a shared network, so that local users can access remote data as if it were on their own computer.
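
      In practice, "as if it were on their own computer" means that ordinary file APIs work unchanged on an NFS mount. The short Python sketch below is illustrative only: `summarize_directory` is a hypothetical helper, and a temporary directory stands in for a real mount point (for example a path such as `/mnt/nfs_share`, which is an assumed name) so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def summarize_directory(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


# The temporary directory is a stand-in for a mount point. The same call
# would work on an NFS-mounted path because the client OS presents the
# remote share through the normal file system interface.
with tempfile.TemporaryDirectory() as mount_point:
    (Path(mount_point) / "report.txt").write_text("hello")
    print(summarize_directory(mount_point))
```

      The code contains nothing NFS-specific, which is the point: the mount is handled by the operating system, and applications see regular paths. What does differ on a real mount is latency, since each metadata call may cross the network.
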

      What is the NFS protocol?

      The NFS protocol is one of several distributed file system standards for network-attached storage (NAS). It was originally developed in the 1980s by Sun Microsystems, and is now managed by the Internet Engineering Task Force (IETF).

      NFS is generally implemented in computing environments where centralized management of data and resources is critical. NFS works on all IP-based networks. Depending on the version in use, it relies on TCP, UDP, or both for data access and delivery.

      The NFS protocol is independent of the computer, operating system, network architecture, and transport protocol, which means systems using the NFS service may be manufactured by different vendors, use different operating systems, and be connected to networks with different architectures. These differences are transparent to both the NFS application and the user.