AI Data Governance

AI Data Governance was identified in the third annual State of Unstructured Data Management survey as a top concern for generative AI adoption in the enterprise, encompassing privacy, security and the lack of data source transparency in vendor solutions. The press release noted:

As the generative AI marketplace expands and executives push for departments to leverage new solutions for competitive advantage, the need for an unstructured data governance agenda is strong; IT leaders cannot forsake data integrity, data protection and risk faulty or dangerous outcomes from generative AI projects.

In the post 5 Unstructured Data Tips for AI, Komprise cofounder and COO Krishna Subramanian reviewed five areas to consider across security, privacy, lineage, ownership and governance of unstructured data for AI.

Data Security for AI

Data confidentiality and security are at risk with third-party generative AI applications because your data becomes part of the LLM and the public domain once you feed it into a tool. Get clear on the legal agreements the vendor has in place as they pertain to your data. There are new ways to manage this now: ChatGPT now allows users to disable chat history so that chats won’t be used to train its models, although OpenAI retains the data for 30 days. One way to protect your organization is to segregate sensitive and proprietary data into a private, secure domain that restricts sharing with commercial applications. You can also maintain an audit trail of the corporate data that has fed AI applications.
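As one illustration of such an audit trail, the sketch below records a content hash and destination each time data is submitted to an external AI tool. The `log_ai_submission` helper and its record fields are hypothetical, shown only to make the idea concrete, and are not part of any particular product:

```python
import hashlib
from datetime import datetime, timezone

def log_ai_submission(audit_log, dataset_name, content, destination):
    """Append a record of data sent to an external AI tool.

    The SHA-256 content hash lets auditors later prove exactly which
    data left the organization, without storing the data itself.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "dataset": dataset_name,
        "destination": destination,  # e.g. the AI vendor or API endpoint
        "sha256": hashlib.sha256(content.encode()).hexdigest(),
    }
    audit_log.append(record)
    return record

audit_log = []
rec = log_ai_submission(audit_log, "q3-support-tickets",
                        "ticket body text", "external-llm-api")
```

In practice such a log would be written to append-only or immutable storage so the trail itself cannot be quietly edited.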

Data Privacy for AI

When you create a prompt for an AI tool to produce an output based on your query, you don’t know if the result will include protected data, such as PII, from another organization. Your company may be liable if you use the tool’s output externally in content or a product and the PII is discoverable. As well, since non-AI vendors are now incorporating AI tools into their solutions, perhaps even without their customers’ knowledge, the risk compounds. Your commercial backup solution could incorporate a pretrained model to find anomalies in your data, and that model may contain PII; this could indirectly put you at risk of a violation. Data provenance and transparency around the training data used in an AI application are critical to ensure privacy.
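One pragmatic first-pass mitigation is to screen text for obvious identifiers before it enters or leaves an AI tool. The sketch below is deliberately minimal: the two regex patterns are illustrative assumptions, and real PII detection requires dedicated tooling, not just regular expressions:

```python
import re

# Hypothetical first-pass patterns; production PII detection needs
# far more coverage (names, addresses, account numbers, etc.).
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def find_pii(text):
    """Return a dict of PII type -> matches found in the text."""
    hits = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[label] = matches
    return hits

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(find_pii(sample))
```

A screen like this would typically gate the prompt pipeline: flagged text is blocked or redacted before submission, and the event is logged for the audit trail.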

Data Lineage for AI

Today there is not much transparency with data sources in generative AI applications. They may contain biased, libelous or unverified data sources. This makes using GenAI tools risky when you need results that are factually accurate and objective. Consider the problem you are trying to solve with AI to choose the right tool. Machine learning systems are better for tasks which require a deterministic outcome.

Data Ownership for AI

The data ownership piece of generative AI concerns what happens when you derive a work: who owns the IP? As it stands today, copyright law dictates that “works created solely by artificial intelligence — even if produced from a text prompt written by a human — are not protected by copyright,” according to reporting by BuiltIn. As well, the article continues, the use of copyrighted materials in training AI models is currently considered permissible under fair use. A batch of lawsuits challenging this interpretation is under consideration, however. It will be increasingly important for organizations to track who commissioned derivative works and how those works are used internally and externally.

Data Governance for AI

If you work in a regulated industry, you’ll need to show an audit trail of any data used in an AI tool and demonstrate that your organization is complying. A healthcare organization, for instance, would need to verify that no patient PII data has been leaked to an AI solution per HIPAA rules. This requires a data governance framework for AI that covers privacy, data protection, ethics and more. Unstructured data management solutions help by providing a means to monitor data usage in AI tools and create a foundation for unstructured data governance.

Other Considerations for AI Data Governance

At a high-level, AI data governance is the framework, policies, and procedures organizations put in place to ensure that data used in artificial intelligence (AI) systems is managed, processed, and utilized in a responsible, ethical, and compliant manner. It involves establishing guidelines for collecting, storing, processing, and using data within AI systems. Key components of AI data governance typically include:

  • Data Quality and Integrity: Ensuring that the data used in AI models is accurate, reliable, and free from biases or errors. This involves data validation, cleaning, and maintaining data integrity throughout its lifecycle.
  • Data Privacy and Security: Implementing measures to protect sensitive data, adhering to relevant data protection regulations (such as GDPR, CCPA), and securing data against unauthorized access or breaches.
  • Compliance and Regulations: Ensuring that AI initiatives comply with legal and regulatory frameworks. This involves understanding and adhering to laws and guidelines governing data usage, such as industry-specific regulations and international standards.
  • Ethical Use of Data: Establishing ethical guidelines for the collection, storage, and usage of data in AI applications. This includes considering fairness, accountability, and transparency in AI decision-making processes.
  • Data Lifecycle Management: Managing data throughout its lifecycle, from collection to processing, analysis, and disposal. This involves tracking the lineage of data, maintaining proper documentation, and ensuring responsible data handling at every stage.
  • Risk Management: Identifying and mitigating potential risks associated with data usage in AI systems, such as bias, security vulnerabilities, or unintended consequences of AI decision-making.
  • Accountability and Transparency: Establishing mechanisms to ensure accountability for AI models and making the decision-making process transparent to relevant stakeholders. This involves explaining AI model behavior and outcomes in an understandable manner.

Effective AI data governance is critical to building trust in AI systems, ensuring that they operate in a manner that respects data privacy, security, and ethical considerations. It also helps organizations make more informed decisions, reduce risks, and maintain compliance with regulatory requirements.

In this Data on the Move episode, we discuss AI and Unstructured Data Management.


Getting Started with Komprise:

Want To Learn More?

AI Infrastructure

What is AI Infrastructure?

AI storage and data infrastructure are evolving rapidly to support the complex demands of training and deploying machine learning models. This technology plays a pivotal role in ensuring optimal performance, scalability, and reliability in AI applications.

AI Storage 

AI storage solutions are specifically designed to handle the unique challenges posed by AI workloads—specifically, efficiently managing the massive datasets used in training models. AI storage solutions prioritize high-throughput, low-latency, and scalable performance. Technologies such as solid-state drives (SSDs), distributed storage architectures, and parallel file systems are commonly employed.

One of the primary challenges in AI storage is maintaining high performance when processing large datasets multiple times, which requires high-speed access to the data. AI storage infrastructure addresses this challenge by providing solutions that can handle the parallel processing needs of deep learning frameworks at the speed which AI applications demand.

AI storage solutions must be highly reliable and deliver features for data replication, snapshots, and backups to ensure the integrity and availability of the training data. Given the sensitivity of the data often used in AI applications, organizations must develop strong data governance guidelines and policies to protect PII and IP data from leakage into commercial tools that can expose this data to other users and organizations.

This blog details five key areas for AI data governance to consider across security, privacy, lineage, ownership and governance of unstructured data for AI – or SPLOG.


Data Infrastructure for AI

Data infrastructure for AI consists of an ecosystem of technologies and processes for managing and manipulating data for AI applications. This includes not only storage but also tools and frameworks for data preprocessing, cleaning, and transformation, along with unstructured data management solutions.

  • Distributed computing frameworks, such as Apache Hadoop and Apache Spark, are integral to AI data infrastructure, delivering parallel processing of data across multiple nodes or servers.
  • Graphics Processing Units (GPUs) also play a crucial role by accelerating the training and inference processes of machine learning models. GPUs, designed for parallel processing, handle the complex mathematical operations involved in training deep neural networks. They work with AI storage to ensure high-speed access to data, reducing latency and improving overall performance. GPUs also play a crucial role in the inference phase, bringing real-time predictions. AI storage solutions must be capable of supporting the high-throughput requirements of GPUs, to prevent data bottlenecks. Specialized AI accelerators from NVIDIA and AMD further enhance parallel processing capabilities with speed, efficiency, and scalability.
  • Unstructured data management solutions play a pivotal role in AI infrastructure by delivering a unified, independent console for holistic data visibility across on premises, edge and cloud storage. An unstructured data management system such as Komprise can deliver a Global File Index for users to conduct ad hoc searches of file and object data to find the precise data sets they need for AI. Komprise also delivers automated workflow capabilities with Smart Data Workflows, so that users can search, tag, and move data to data links and other platforms for use by AI applications—or similarly, to exclude sensitive data from ingestion into AI tools by automatically finding and moving it into immutable, object storage in the cloud.

Learn more about AI and Big Data capabilities in Komprise.

Read the white paper on tactics for managing and protecting data for GenAI.


The Role of the Cloud in AI Infrastructure

Cloud storage services from industry leaders like Amazon Web Services (AWS) and Microsoft Azure also play a significant role in AI infrastructure. These cloud platforms offer scalable and flexible storage solutions that can be tailored to the specific needs of AI applications, such as AWS S3 and Azure’s Blob Storage.

AWS and Azure are also developing an expanding array of AI and machine learning tools and services, creating easily deployable AI-as-a-service offerings for companies.

As AI evolves, enterprise organizations will look to integrate the right combination of storage solutions, data infrastructure, unstructured data management systems, GPUs, and cloud services to efficiently manage and protect data used for AI initiatives.


Air Gap

An air gap, in the context of computer security, refers to a physical or logical separation between a computer or network and any external or untrusted networks or systems. It is a security measure used to protect sensitive or critical information from unauthorized access or cyber threats.

The concept behind an air gap is to create a physical or logical barrier that prevents direct communication or data transfer between the protected system and external networks. This isolation helps reduce the risk of malicious actors or malware infiltrating the system and compromising its security.

Physical and Logical Air Gap

  • Physical air gap: The isolated system is physically disconnected from any external networks, typically by physically unplugging network cables or using dedicated networks that are not connected to the internet or other networks. This is commonly seen in high-security environments or critical infrastructure systems where data protection is of utmost importance.
  • Logical air gap (or virtual air gap): Using network configurations, firewalls, or security controls to create a virtual separation between the protected system and external networks. While the system may still be physically connected to a network, it is isolated in such a way that communication with external systems is restricted or highly regulated.

Air gaps are commonly employed in situations where highly sensitive or classified data is involved, such as government or military networks, financial systems, or critical infrastructure control systems. However, it is important to note that air gaps are not foolproof and additional security measures should be implemented to address potential risks like insider threats or physical access breaches.

In the blog post How to Protect File Data from Ransomware at 80 percent Lower Cost, there is an overview of how to create an affordable cloud ransomware recovery copy that is logically air-gapped.

If you want to use Komprise for both hot and cold data, Komprise can create an affordable logically isolated recovery copy of all data in an object-locked destination such as Amazon S3 IA, so data is protected even if the backups and primary storage are attacked.
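A logically air-gapped copy typically relies on object immutability. The sketch below builds the kind of S3 Object Lock default-retention payload an administrator might apply to a recovery-copy bucket; the bucket name and the 90-day window are illustrative assumptions, and the dict mirrors the shape the S3 PutObjectLockConfiguration API expects:

```python
def object_lock_config(mode="COMPLIANCE", days=90):
    """Build an S3 Object Lock default-retention configuration payload.

    COMPLIANCE mode means no user, including the account root, can
    delete or overwrite locked object versions until retention expires,
    which is what makes the recovery copy ransomware-resistant.
    """
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
    }

config = object_lock_config()
# With boto3, this payload could then be applied as:
#   s3.put_object_lock_configuration(
#       Bucket="recovery-copy-bucket",       # illustrative bucket name
#       ObjectLockConfiguration=config)
print(config)
```

Note that Object Lock must be enabled when the bucket is created; it cannot be retrofitted onto an existing bucket.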


Amazon (AWS) S3 Intelligent Tiering

S3 Intelligent Tiering is an Amazon storage class aimed at data with unknown or unpredictable data access patterns. See our S3 Intelligent Tiering glossary entry for further information.

Learn more about AWS cloud tiering, cloud data migration and the Komprise AWS partnership.



Amazon Glacier (AWS Glacier)


What is Amazon S3 Glacier (AWS Glacier)?

Amazon S3 Glacier, also known as AWS Glacier, is a class of cloud storage available through Amazon Web Services (AWS).  Amazon S3 Glacier is a lower-cost storage tier designed for use with data archiving and long-term backup services on the public cloud infrastructure.

Amazon S3 Glacier was created to house data that doesn’t need to be accessed frequently or quickly. This makes it ideal for use as a cold storage service, hence the inspiration for its name.

Amazon S3 Glacier retrieval times range from a few minutes to a few hours with three different speed options available: Expedited (1-5 minutes), Standard (3-5 hours), and Bulk (5-12 hours).

Amazon S3 Glacier Deep Archive offers 12-48-hour retrieval times. The faster retrieval options are significantly more expensive, so having your data organized into the correct tier within AWS cloud storage is an important aspect of keeping storage costs down.
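The retrieval options above map to a `Tier` parameter in an S3 restore request. The helper below only builds the request payload of the kind passed to `restore_object`; it is a sketch of the request shape and does not call AWS:

```python
# Typical expected windows per the text above: Expedited 1-5 minutes,
# Standard 3-5 hours, Bulk 5-12 hours.
RETRIEVAL_TIERS = ("Expedited", "Standard", "Bulk")

def restore_request(days, tier="Standard"):
    """Build the RestoreRequest payload for an S3 Glacier restore."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"unknown retrieval tier: {tier}")
    return {
        "Days": days,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": tier},
    }

req = restore_request(days=7, tier="Bulk")
# With boto3, this payload could then be submitted as:
#   s3.restore_object(Bucket="my-bucket", Key="archive/file.bin",
#                     RestoreRequest=req)   # illustrative names
print(req)
```

Choosing Bulk over Expedited for large, non-urgent restores is usually the single biggest lever on retrieval cost.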

Other Glacier features:
  • The ability to store an unlimited number of objects and data
  • Data stored in S3 Glacier is dispersed across multiple geographically separated Availability Zones within the AWS region
  • An average annual durability of 99.999999999%
  • Checksum uploads to validate data authenticity
  • REST-based web service
  • Vault, Archive, and Job data models
  • Limit of 1,000 vaults per AWS account per region

Main Applications for Amazon S3 Glacier Storage

There are several scenarios where Glacier is an ideal solution for companies needing a large volume of cloud storage.

  1. Huge data sets. Many companies that perform trend or scientific analysis need a huge amount of storage to be able to house their training, input, and output data for future use.
  2. Replacing legacy storage infrastructure. With the many advantages that cloud-based storage environments have over traditional storage infrastructure, many corporations are opting to use AWS storage to get more out of their data storage systems. AWS Glacier is often used as a replacement for long term tape archives.
  3. Healthcare facilities’ patient data. Patient data needs to be kept for regulatory or compliance requirements. Glacier and Glacier Deep Archive are ideal archiving platforms to keep data that will hardly need to be accessed.
  4. Cold data with long retention times. Finance, research, genomics, electronic design automation, and media and entertainment are some examples of industries where cold data and inactive projects may need to be retained for long periods of time even though they are not actively used. AWS Glacier storage classes are a good fit for these types of data. The project data will need to be recalled before it is actively used to minimize retrieval delays and costs.

Amazon S3 Glacier vs S3 Standard

Amazon’s S3 Standard storage and S3 Glacier are different classes of storage designed to handle workloads on the AWS cloud storage platform.

  • S3 Glacier is best for cold data that’s rarely or never accessed
  • Amazon S3 Standard storage is intended for hot and warm data that needs to be accessed daily and quickly

The speed and accessibility of S3 Standard storage comes at a much higher cost compared to S3 Glacier and the even more economical S3 Glacier Deep Archive storage tiers. Having the right data management solution is critical to help you identify and organize your hot and cold data into the correct storage tiers, saving a substantial amount on storage costs.

Benefits of a Data Management System to Optimize Amazon S3 Glacier

A comprehensive suite of unstructured data management and unstructured data migration capabilities allow organizations to reduce their data storage footprint and substantially cut their storage costs. These are a few of the benefits of integrating an analytics-driven data management solution like Komprise Intelligent Data Management with your AWS storage:

Get full visibility of your AWS and other storage data

See across AWS and other cloud platforms to understand how much NAS data is being accrued and whether it’s hot or cold, so you can make better data storage investment and data mobility decisions.

Intelligent tiering and life cycle management for AWS storage

Optimize and improve how you manage files and objects across EFS, FSx, S3 Standard and S3 Glacier storage classes based on access patterns.

Intelligent AWS data retrievals

Don’t get hit with unexpected data retrieval fees on S3 Glacier – Komprise enables intelligent recalls based on access patterns so if an object on Glacier becomes active again, Komprise will move it up to an S3 storage class.

Bulk retrievals for improved AWS user performance

Improve performance across entire projects from S3 Glacier storage classes – if an archived project is going to become active, you can prefetch and retrieve the entire project from S3 Glacier using Komprise so users don’t have to face long latencies to get access to the data they need.

Minimize AWS storage costs

Analytics-driven cloud data management monitors retrieval costs, egress costs and other costs, and minimizes them by promoting data up and recalling it intelligently to more active storage classes.

Access AWS data natively

Access data that has been moved across AWS as objects from Amazon S3 storage classes or as files from File and NAS storage classes without the need for additional stubs or agents.

Reduce AWS cloud storage complexity

Reduce the complexity of your cloud storage and NAS environment and manage your data more easily through an intuitive dashboard.

Optimize the AWS storage savings

Komprise Intelligent Data Management allows you to better manage all the complex data storage, retrieval, egress and other costs. Know first. Move smart. Take control.

Easy, on-demand scalability

Komprise provides you with the capacity to add and manage petabytes without limits or the need for dedicated infrastructure.

Integrate data lifecycle management

Integrate easily with an AWS Advanced Tier partner such as Komprise for lifecycle management or other use cases.

Move data transparently to any tier within AWS

Your users won’t experience any difference in terms of data access. You’ll notice a huge difference in cost savings and unstructured data value with Komprise.

Create automated data management policies and data workflows

Continuously manage the lifecycle of the moved data for maximum savings. Build Smart Data Workflows to deliver the right data to the right teams, applications, cloud services, AI/ML engines, etc. at the right time.

Streamline Amazon S3 Glacier Operations with Komprise Intelligent Data Management

Komprise’s Intelligent Data Management allows you to seamlessly analyze and manage data across all of your AWS cloud storage classes so you can move data across file, S3 Standard and S3 Glacier storage classes at the right time for the best price/performance. Because it’s vendor agnostic, its standards-driven analytics and data management work with the largest storage providers in the industry and have helped companies save up to 50% on their cloud storage costs.

If you’re looking to get more out of your AWS storage, contact a data management expert at Komprise today and see how much you could save on data storage costs. Read the white paper: Smart Data Migration for AWS.



Amazon S3 (AWS S3)

Amazon Simple Storage Service, known as Amazon S3 or AWS S3, is an object storage service that offers industry-leading scalability, data availability, security, and performance.

See S3 in our glossary for further information.

Learn more about Komprise Intelligent Data Management for AWS data storage.



Amazon S3 Glacier Instant Retrieval

Amazon S3 Glacier Instant Retrieval is an archive storage class that was introduced in November 2021. According to Amazon, it delivers the lowest-cost archive storage with milliseconds retrieval for rarely accessed data.

Komprise works closely with AWS to ensure enterprise customers have visibility into data across storage environments. With analytics-driven unstructured data management, Komprise right-places data in the appropriate storage class: hot data on high-performance managed file services in AWS, and cold data on lower-cost Amazon S3 Glacier object storage such as Amazon S3 Glacier Instant Retrieval and Amazon S3 Infrequent Access.

Learn more about Amazon S3 Storage Classes.

Learn more about Komprise for AWS.


Amazon Tiering

What is Amazon Tiering?

Amazon Web Services (AWS) offers several storage services that support data tiering based on different storage classes. These data storage classes allow customers to optimize their storage costs and performance by choosing the most suitable option for their data based on its access patterns and durability requirements.

Learn more about Komprise file and object data migration, data tiering and ongoing data management.

AWS Storage Tiering Options

Amazon S3 Storage Classes: Amazon Simple Storage Service (S3) provides multiple storage classes to accommodate different data access patterns and cost requirements:

  • Standard: This is the default storage class for S3 and offers high durability, availability, and performance for frequently accessed data.
  • Intelligent-Tiering: This storage class automatically moves objects between two access tiers (frequent access and infrequent access) based on their usage patterns. It optimizes costs by automatically transitioning objects to the most cost-effective tier.
  • Standard-IA (Infrequent Access): This storage class is suitable for data that is accessed less frequently but still requires rapid access when needed. It offers lower storage costs compared to the Standard class.
  • One Zone-IA: Similar to Standard-IA, but the data is stored in a single Availability Zone, which provides a lower-cost option for customers who don’t require data redundancy across multiple zones.
  • Glacier Flexible Retrieval, Glacier Instant Retrieval and Glacier Deep Archive: These storage classes are designed for long-term archival and data retention. Data stored in Amazon S3 Glacier is accessible within minutes to hours, while Glacier Deep Archive is for data with retrieval times of 12 hours or more.
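Tiering across these classes is usually automated with a bucket lifecycle configuration. The sketch below builds a lifecycle rule payload of the kind accepted by the S3 PutBucketLifecycleConfiguration API; the `logs/` prefix and the 30/90-day thresholds are illustrative assumptions:

```python
def lifecycle_rule(prefix, ia_days=30, glacier_days=90):
    """Build an S3 lifecycle rule that tiers objects down over time.

    Objects under the prefix move to Standard-IA after ia_days and
    to Glacier after glacier_days, matching the access-pattern-based
    tiering described above.
    """
    return {
        "ID": f"tier-down-{prefix or 'all'}",
        "Status": "Enabled",
        "Filter": {"Prefix": prefix},
        "Transitions": [
            {"Days": ia_days, "StorageClass": "STANDARD_IA"},
            {"Days": glacier_days, "StorageClass": "GLACIER"},
        ],
    }

rule = lifecycle_rule("logs/")
# With boto3, a list of such rules could be applied as:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket",                 # illustrative bucket name
#       LifecycleConfiguration={"Rules": [rule]})
print(rule["Transitions"])
```

The day thresholds are where access-pattern analytics matter: setting them too aggressively triggers early-transition and retrieval charges on data that turns out to still be warm.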

Amazon EBS Volume Types: Amazon Elastic Block Store (EBS) provides different volume types for block storage in AWS. While not strictly tiering, these volume types offer varying performance characteristics and costs:

  • General Purpose SSD (gp2): This is the default EBS volume type and provides a balance of price and performance for a wide range of workloads.
  • Provisioned IOPS SSD (io1/io2): These volume types are designed for applications that require high I/O performance and consistent low-latency access to data.
  • Throughput Optimized HDD (st1): This volume type offers low-cost storage optimized for large, sequential workloads that require high throughput.
  • Cold HDD (sc1): This volume type provides the lowest-cost storage for infrequently accessed workloads with large amounts of data.

Amazon S3 Glacier and Glacier Deep Archive: These are the storage classes within Amazon S3 designed specifically for long-term data archival and retention. The retrieval times are longer compared to other storage classes, but they offer significantly lower storage costs for data that is rarely accessed.

Amazon tiering options are designed to help AWS customers effectively manage their data storage costs and performance based on the specific requirements of their workloads and data access patterns.

Komprise Intelligent Data Management for AWS

Komprise is an AWS Migration and Modernization competency partner, working closely with AWS teams to follow best practices and support cloud data storage services including Amazon EFS, Amazon FSx and Amazon S3 (including Amazon S3 Glacier Flexible Retrieval and Glacier Instant Retrieval storage classes). The Komprise analytics-driven SaaS platform allows customers to analyze, mobilize and manage their file and object data using AWS allowing enterprise customers to:

  • Understand AWS NAS & Object Data Usage and Growth
  • Estimate ROI of AWS Data Storage
  • Migrate Smarter to Amazon FSx for NetApp ONTAP
  • Easily Integrate AWS Data Lifecycle Management
  • Access Moved Data as Files Without Stubs or Agents
  • Gain Native Data Access in the AWS Cloud Without Storage Vendor Lock-In
  • Rapidly Migrate Object Data Into AWS Storage
  • Reduce AWS Unstructured Data Complexity
  • Scale On-Demand with Modern, SaaS Architecture


Learn more about Komprise Intelligent Data Management for AWS Storage.


Analytics-driven Data Management

Analytics-driven data management is a core principle of the standards-based Komprise Intelligent Data Management platform: it uses data insight and automation to strategically and efficiently manage and move unstructured data at massive scale. With Komprise, you can know first, move smart, and take control of massive unstructured data growth while cutting up to 70% of your enterprise data storage costs, including backup and cloud costs.


Know First: Get insight into your data before you invest. See across your data storage silos, vendors, and clouds to make informed storage and backup decisions.

  • Analyze any NAS, S3
  • Plan and project storage cost savings
  • Search, tag, build virtual data lakes with a global file index

Move Smart: Ensure the right data is in the right place at the right time. Establish analytics-driven policies to manage data based on its need, usage, and value.

Take Control: Get back to the business at hand while reducing your storage, backup, and cloud costs and get the fastest, easiest path to the cloud for your file and object data.

  • Ensure you have data mobility and avoid storage-vendor lock-in
  • Open, standards-based platform
  • Native cloud access

Read the Komprise Architecture Overview white paper.



Archival Storage

What is Archival Storage?

Archival Storage is storage for data that is not needed for an organization’s everyday operations, but may have to be accessed occasionally.

By utilizing archival storage, organizations can move such data to lower-cost secondary sources while still maintaining its protection.

Utilizing archival storage reduces the primary storage capacity required, along with its costs, and allows an organization to retain data that may be needed for regulatory or other requirements.

Data archiving, also known as data tiering, is intended to protect older information that is not needed for everyday operations, but may have to be accessed occasionally. Archival and tiering storage is a tool for reducing your primary storage needs and the related costs, rather than acting as a data recovery tool.

Why Archival Storage?

  • Some data archives allow data to be read-only to protect it from modification, while other data archiving products allow users to modify the data.
  • The benefit of data archiving is that it reduces the cost of primary storage. Archive storage itself costs less because it is typically based on a low-performance, high-capacity storage medium.
  • Data archiving takes a number of different forms. Options can be online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.
  • Another archival system uses offline data storage where archive data is written to tape or other removable media using data archiving software rather than being kept online. Data archiving on tape consumes less power than disk systems, translating to lower costs.
  • A third option is using cloud data storage, such as those offered by Amazon and Microsoft Azure – this can be less expensive if done right, but requires ongoing investment. A Smart Data Migration strategy is essential.
  • The data archiving process typically uses automated software, which will automatically move “cold” data via policies set by an administrator. Today, a popular approach to data archiving is to make the archive “transparent” – so the archived data is not only online but the archived data is fully accessed exactly as before by users and applications, so they experience no change in behavior. The patented Komprise Transparent Move Technology is designed to allow you to transparently archive and tier data.
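The policy-driven selection of cold data described above can be sketched in a few lines. The "not accessed in N days" threshold and the inventory record format below are assumptions for illustration only; a real product would draw on continuous access analytics rather than a static file list:

```python
from datetime import datetime, timedelta

def select_cold_files(files, now, days_cold=365):
    """Return paths of files whose last access predates the policy cutoff."""
    cutoff = now - timedelta(days=days_cold)
    return [f["path"] for f in files if f["last_access"] < cutoff]

now = datetime(2024, 1, 1)
inventory = [
    {"path": "/proj/report.docx", "last_access": datetime(2023, 11, 20)},
    {"path": "/proj/raw_2019.csv", "last_access": datetime(2019, 6, 2)},
]
print(select_cold_files(inventory, now))  # → ['/proj/raw_2019.csv']
```

The selected paths would then feed the archiving software's move step, with transparent access preserved for users.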


AWS DataSync

AWS DataSync is an online service that moves data between on premises and AWS Storage services. According to AWS, DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems.

Point tools vs. platform

Cloud migration of file data can be complex, labor-intensive, costly and time-consuming. Understanding your migration options is essential. Generally they are as follows:

  • Free Tools: Good for tactical use cases, but often require a lot of hand-holding. Data migration reliability and performance are concerns.
  • Point Data Migration Solutions: Usually older vendors with a professional-services-centric approach. Watch out for technologies that are difficult to set up and use and that have legacy architectures, which present user disruption and scalability challenges.
  • Komprise Elastic Data Migration: Makes cloud data migrations simple, fast, reliable and eliminates sunk costs since you continue to use Komprise after the migration. Komprise is the only solution that gives you the option to cut 70%+ cloud storage costs by placing cold data in Object classes while maintaining file metadata so it can be promoted in the cloud as files when needed.


Learn more about Komprise for AWS.



AWS Snowball

What is AWS Snowball Edge?

AWS Snowball Edge is a hardware appliance used to migrate petabyte-scale data into and out of Amazon S3, mitigating issues with large-scale data transfers including high network costs, limited connectivity such as in remote locations, long transfer times, and security concerns. Beyond data transfer and cloud data migration use cases, the Snowball Edge device features on-board storage and compute power to enable local processing and analytics at the edge. Once transferred into AWS S3, an organization can move the data into other storage classes as needed.

Snowball appliances are shipped to the customer and deployed on the customer’s network. Data is copied to the Snowball appliance and then return shipped to AWS where the data is copied to the appropriate AWS storage tier and made available for access.

According to Hackernoon, Snowball Edge has been used in oil rigs, with the U.S. Department of Defense, and in an emergency situation for the U.S. Geological Survey needing to quickly export data from its data center during a volcanic eruption.

Considerations for AWS Snowball Edge

Enterprises have two AWS Snowball Edge device options:

  • AWS Snowball Edge Storage Optimized devices provide both block storage and Amazon S3-compatible object storage, and 40 vCPUs. They are well suited for local storage and large scale-data transfer. It’s possible to combine up to 12 devices together and create a single S3-compatible bucket that can store nearly 1 petabyte of data.
  • Snowball Edge Compute Optimized devices provide 52 vCPUs, block and object storage, and an optional GPU for use cases including machine learning and full motion video analysis.
  • Snowball supports specific Amazon EC2 instance types and AWS Lambda functions, so you can develop and test in the AWS Cloud, then deploy applications on devices in remote locations to collect, pre-process, and ship the data to AWS.
  • Snowball can transport multiple terabytes of data and multiple devices can be used in parallel or clustered together to transfer petabytes of data into or out of AWS.
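As a rough capacity-planning sketch for the device options above (the ~80 TB usable capacity per Storage Optimized device and the 12-device cluster limit are assumptions drawn from the figures in this section; actual usable capacity varies by device generation):

```python
import math

DEVICE_TB = 80      # assumed usable capacity per Storage Optimized device
MAX_CLUSTER = 12    # devices that can be clustered into one S3-compatible bucket

def snowball_plan(dataset_tb):
    """Estimate devices and clusters needed for an offline transfer."""
    devices = math.ceil(dataset_tb / DEVICE_TB)
    clusters = math.ceil(devices / MAX_CLUSTER)
    return devices, clusters

print(snowball_plan(500))   # (7, 1) -> 7 devices, one cluster
print(snowball_plan(1000))  # (13, 2) -> spills into a second cluster
```

A real plan would also compare this against online transfer time over the available network bandwidth before choosing Snowball.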

Cloud Tiering to AWS

By using Komprise for cloud tiering to AWS, you can save not only on your on-premises storage but also on your cloud costs. Users get transparent access to the files moved by Komprise from the original location, and with Komprise moving data in native format, you can give users direct, cloud-native access to data in AWS while eliminating egress fees and rehydration hassles.

Learn more about the benefits of moving data in cloud native format.

Smart Data Migration for AWS

smart-file-data-migration-aws-thumbA smart data migration strategy for enterprise file data means an analytics-first approach ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. This paper introduces the benefits of a smart data migration strategy for file workloads to AWS cloud storage services. Komprise and AWS enable your organization to:

  • Understand your NAS & object data usage and growth.
  • Estimate the ROI of AWS storage in your environment.
  • Migrate smarter to Amazon FSx for NetApp ONTAP.
  • Access moved data as files without stubs or agents.
  • Reduce complexity and scale on-demand.
  • Deliver native data access in the cloud without lock-in.
Read the white paper: Smart Unstructured Data Migration for AWS
Learn more about your Cloud Tiering choices.
Learn more about Komprise for AWS.


AWS Storage

What is AWS Cloud Storage?

The AWS cloud service has a full range of options for individuals and enterprises to store, access and analyze data. AWS offers options across all three types of cloud data storage: object storage, file storage and block storage.

Here are the AWS Storage choices:

  • Amazon Simple Storage Service (S3): S3 is a popular AWS service that provides scalable and highly durable object storage in the cloud.
  • AWS Glacier: Glacier provides low-cost highly durable archive storage in the cloud. It’s best for cold data as access times can be slow.
  • Amazon Elastic File System (Amazon EFS): EFS provides scalable network file storage for Amazon EC2 instances.
  • Amazon Elastic Block Store (Amazon EBS): This service provides low-latency block storage volumes for Amazon EC2 instances.
  • Amazon EC2 Instance Storage: An instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches and scratch data, and consists of one or more instance store volumes exposed as block devices.
  • AWS Storage Gateway: This is a hybrid storage option that integrates on-premises storage with cloud storage. It can be hosted on a physical or virtual server.
  • AWS Snowball: This data migration service transports large amounts of data to and from the cloud and includes an appliance that’s installed in the on-premises data center.


Each of these Amazon storage classes has several tiers at different price points – so it is important to put the right data in the right storage class at the right time to optimize price and performance.
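To illustrate why placing the right data in the right class matters, here is a toy cost comparison (the per-GB prices below are illustrative placeholders, not current AWS list prices):

```python
# Illustrative per-GB monthly prices -- NOT current AWS list prices
PRICE_PER_GB = {
    "S3 Standard": 0.023,
    "S3 Standard-IA": 0.0125,
    "S3 Glacier Deep Archive": 0.00099,
}

def monthly_cost(allocation_gb):
    """Sum the monthly storage bill for a {tier: GB} allocation."""
    return sum(PRICE_PER_GB[tier] * gb for tier, gb in allocation_gb.items())

# 100 TB kept entirely hot vs. tiered by access pattern
all_hot = monthly_cost({"S3 Standard": 100_000})
tiered = monthly_cost({"S3 Standard": 20_000,
                       "S3 Standard-IA": 30_000,
                       "S3 Glacier Deep Archive": 50_000})
print(round(all_hot, 2), round(tiered, 2))  # 2300.0 884.5
```

Even with placeholder prices, tiering the cold majority of the data cuts the monthly bill by well over half, which is the core argument for analytics-driven placement.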

Komprise Intelligent Data Management for AWS Storage

Komprise helps organizations get more value from their AWS storage investments while protecting data assets for future use through analysis and intelligent data migration and cloud data tiering.


Learn more at Komprise for AWS.


Azure NetApp Files

What is Azure NetApp Files?

Azure NetApp Files is a cloud-based file storage service offered by Microsoft Azure that enables enterprise-grade file shares to be created and managed in the cloud. The service is built on NetApp’s technology and is designed to meet the high-performance, availability, and scalability requirements of enterprise file data workloads.

Azure NetApp Files provides a fully managed service that allows customers to deploy and manage high-performance file shares in Azure. It offers features such as NFS and SMB protocol support, file share snapshots, and data replication across Azure regions. Customers can also choose from different performance tiers and capacity sizes to optimize the cost and performance of their file shares.

Azure NetApp Files is commonly used for use cases such as database file shares, big data analytics, media and entertainment workloads, and high-performance computing. It provides a scalable, high-performance, and highly available solution for enterprise customers who need to store and manage large amounts of file data in the cloud.

Azure NetApp Files Data Management

Komprise first announced support for Azure NetApp Files in 2020:

By using Komprise Intelligent Data Management, customers can migrate file workloads to the cloud more than 27 times faster than with other solutions. They can also reduce cloud NAS by 70 percent by transparently archiving cold data from Azure NetApp Files to various Azure Blob storage classes. Komprise’s Transparent Move Technology™ (TMT) enables archived data to be viewed as files, native objects, or both. These new capabilities now allow Komprise to deliver the same on-premises NAS data management features to cloud-enabled NAS.

Read the white paper: Accelerate Cloud and NAS Migrations to NetApp CVO and Azure NetApp Files (ANF)

Learn more about Komprise for Azure.

Learn more about Komprise for NetApp.



Azure Storage

What is Azure Storage?

Microsoft Azure hosts a complete array of cloud data storage options to meet the diverse data needs of enterprises today, including backup, tiering, data lakes, structured and unstructured data management. Azure Storage Services include:

  • Azure Blob: This is a scalable object store best suited for storing and accessing unstructured data and to support analytics and data lake projects.
  • Azure Files: File shares for cloud or on-premises deployments that you can access through the Server Message Block (SMB) protocol.
  • Azure Queues: Allows for asynchronous messaging between application components.
  • Azure Tables: A NoSQL solution for schema-less storage of structured data.
  • Azure Disks: Allows data to be persistently stored in blocks and accessed from an attached virtual hard disk.
  • Azure Data Lake Storage: A storage platform for ingestion, processing, and visualization that supports common analytics frameworks and provides automatic geo-replication.

Greater Azure Storage Savings and Value with Komprise

Komprise helps organizations get the most value from their Azure Blob and Azure File storage investments while protecting data assets for future use through analysis and intelligent data migration and cloud data tiering.


Learn more at Komprise for Azure File and Azure Blob data management and migration.



Azure Tiering

What is Azure Tiering?

Azure Storage offers several classes of cloud data storage for customers. However, to maximize savings and ROI from the cloud, IT directors need to consider tiering strategies. Cloud tiering moves less frequently used data, also known as cold data, from expensive on-premises file storage or Network Attached Storage (NAS), or from cloud file storage such as Azure Files, to cheaper levels of storage in the cloud, typically object storage classes such as Azure Blob storage.

Cloud tiering enables data to move across different storage tiers – and different cloud tiering solutions support different storage options. We will cover both the storage tiers in the Azure cloud and the options available to do cloud tiering for Azure.

Azure Files and Azure Blob have different tiers of storage at different price points:

Azure Files is Microsoft’s file storage solution for the cloud. As with all file storage solutions, it is more expensive than object storage solutions such as Azure Blob, especially when you add the required replication and data protection costs for files. Azure File Storage Hot tier is more than 1.9 times more expensive than Azure Blob Cool. 

Azure Files supports two storage tiers: Standard and Premium.

  • Standard file shares are created in general purpose (GPv1 or GPv2) storage accounts; 
  • Premium file shares are created in FileStorage storage accounts.

What is Azure Blob?

Azure Blob is Microsoft’s object storage solution for the cloud.

Azure Blob storage is optimized for storing massive amounts of unstructured data. It’s enabled for the following access tiers:

  • Hot: storing data that is accessed frequently.
  • Cool: storing data that is infrequently accessed and stored for at least 30 days.
  • Archive: storing data that is rarely accessed and stored for at least 180 days, with flexible latency requirements.

According to Microsoft:

“You can upload data to your required access tier and change the blob access tier among the hot, cool, or archive tiers as usage patterns change, without having to move data between accounts. All tier change requests happen immediately and tier changes between hot and cool are instantaneous.”
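The minimum-retention rules for the Cool (30 days) and Archive (180 days) tiers suggest a simple tier-selection heuristic, sketched below (the function name is hypothetical, and using access frequency plus retention as the sole criteria is a simplification; real placement decisions also weigh retrieval and early-deletion fees):

```python
def suggest_blob_tier(days_between_accesses, retention_days):
    """Pick the cheapest access tier the minimum-retention rules allow."""
    if days_between_accesses >= 180 and retention_days >= 180:
        return "Archive"  # rarely accessed, kept at least 180 days
    if days_between_accesses >= 30 and retention_days >= 30:
        return "Cool"     # infrequently accessed, kept at least 30 days
    return "Hot"          # frequently accessed data stays hot

print(suggest_blob_tier(7, 365))    # Hot
print(suggest_blob_tier(60, 365))   # Cool
print(suggest_blob_tier(400, 400))  # Archive
```

Note how data that is rarely accessed but only retained for 90 days still lands in Cool, because moving it to Archive would violate the 180-day minimum.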

What is Azure File Sync?

Azure Files has a service called Azure File Sync which enables an on-premises Windows Server to do cloud tiering to file storage in the cloud, not object storage. 

Azure File Sync acts as a gateway that caches data locally and puts cold files in Azure Files cloud storage. When enabled, Azure File Sync stores hot files on the local Windows server, while cool or cold files are split into namespace (file and folder structure) and file content. The namespace is stored locally, and the file content is stored in an Azure file share in the cloud. Azure will automatically tier cold data based on volume or age thresholds. See Microsoft Cloud Tiering overview.
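A volume-threshold tiering policy like the one described can be sketched as tiering the coldest files until a free-space target is met (a simplified model for illustration, not Azure File Sync's actual algorithm):

```python
def files_to_tier(volume_gb, used_gb, target_free_pct, files):
    """files: (path, size_gb, last_access_year) tuples. Tier coldest-first
    until the volume meets its free-space target."""
    need_free = volume_gb * target_free_pct / 100
    free = volume_gb - used_gb
    tiered = []
    for path, size, _ in sorted(files, key=lambda f: f[2]):  # oldest first
        if free >= need_free:
            break
        tiered.append(path)
        free += size
    return tiered

files = [("/data/new.mp4", 5, 2023),
         ("/data/old1.iso", 10, 2019),
         ("/data/old2.bak", 10, 2021)]
# 100 GB volume, 95 GB used, policy wants 20% free
print(files_to_tier(100, 95, 20, files))
# ['/data/old1.iso', '/data/old2.bak']
```

The newest file stays local because the free-space target is satisfied before the policy reaches it.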

Considerations for Microsoft Azure Cloud Tiering

Cloud tiering can save organizations up to 70% on on-premises storage costs when done correctly. But there are several limitations of Azure Cloud Tiering that you need to consider:

Azure File Sync only tiers to Azure Files and leads to higher cloud costs.

Azure Files is a file service in Azure, and it is almost double the cost of the Azure Blob Cool tier. Since file storage does not include built-in data protection, data on Azure Files commonly needs replication, snapshots and backups, leading to higher data management costs. An ideal cloud tiering solution should tier files from your NAS to an object storage environment to maximize savings. Otherwise, you are paying higher costs in the cloud.

Azure File Sync only tiers blocks of data to the cloud and leads to 75% higher cloud egress costs.

This means you cannot directly access your files in Azure; you have to go through the on-premises Windows Server to get your data. This leads to 75% higher cloud egress costs, and it limits the use of your data in the cloud. To learn more about the differences between block tiering and file tiering, read our block-level tiering vs. file-level tiering white paper. For an analysis of the cloud egress costs of solutions like Azure File Sync Cloud Tiering, read the Cloud Tiering white paper.

Azure File Sync is only available on Windows Server environments.

Most organizations today have multiple file server and NAS environments. Using a different tiering strategy for each environment is tedious, error prone, and difficult to manage. Consider an unstructured data management solution that works across your multiple storage vendor environments and transparently tiers and archives data.

Komprise enables enterprise IT organizations to quickly analyze data and make smart decisions on where data should live based on age, usage and other requirements. Komprise works across your multi-vendor NAS and object environments and clouds via standard protocols such as NFS, SMB and object. By using Komprise for cloud tiering to Azure, you can save not only on your on-premises storage but also on your cloud costs, since you do not have to tier to Azure Files; you can tier directly to Azure Blob. Users get transparent access to the files moved by Komprise from the original location, and because Komprise moves data in native format, you can give users direct, cloud-native access to data in Azure while eliminating egress costs and data rehydration hassles.

Learn more about your Cloud Tiering choices 

Learn more about Komprise for Microsoft Azure

Komprise Smart Data Migration for Azure. Smarter. Faster. Proven.


Backup

Backup (also see Data Backup) is the process of creating copies of data to protect against loss or damage. It involves making duplicate copies of important files, databases, applications, or entire systems, which can be used to restore the data in the event of a disaster, hardware failure, human error, or other unforeseen circumstances.


Key points about backup storage:

  • Data Protection: The primary purpose of backups is to safeguard data and ensure its availability even in the face of data loss incidents. Backups serve as a safety net, allowing organizations and individuals to recover lost or corrupted data and resume normal operations.
  • Backup Frequency: The frequency of backups depends on various factors, such as the criticality of the data, the rate of data change, and the desired recovery point objective (RPO). RPO determines the maximum acceptable amount of data loss in the event of a failure. Organizations may choose to perform backups daily, weekly, or in more frequent intervals based on their needs.
  • Full and Incremental Backups: Different backup strategies can be employed, such as full and incremental backups. A full backup involves copying all data from the source to the backup storage. Incremental backups only copy the changes made since the last backup, resulting in smaller backup sizes and faster backups. A combination of full and incremental backups can provide a balance between data protection and storage efficiency.
  • Backup Storage: Backups are stored on separate storage devices or media from the original data. This ensures that if the primary storage fails or becomes inaccessible, the backups remain unaffected. Common backup storage options include external hard drives, network-attached storage (NAS), tape drives, cloud storage, or off-site backup facilities.
  • Data Recovery: When data loss occurs, backups are used to restore the lost or corrupted data. The recovery process involves retrieving the backup data and copying it back to the original or alternative locations. Depending on the backup strategy employed, recovery may involve restoring the latest full backup followed by incremental backups or directly restoring the most recent backup.
  • Testing and Verification: It is important to regularly test backups and verify their integrity to ensure they are usable when needed. Regular restore tests help identify any issues or discrepancies in the backup data or the recovery process. Verification involves performing integrity checks on the backup files to ensure they are not corrupted or damaged.
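The full vs. incremental distinction above can be sketched with a simple modification-time comparison (a minimal model; production backup tools also track deletions, permissions and block-level changes):

```python
def incremental_backup(current, last_backup):
    """Return files that are new or changed since the last backup.
    Both arguments map path -> modification time."""
    return {path for path, mtime in current.items()
            if path not in last_backup or last_backup[path] != mtime}

last = {"a.txt": 100, "b.txt": 200}
now = {"a.txt": 100, "b.txt": 250, "c.txt": 300}

# Only the changed file and the new file need to be copied
print(sorted(incremental_backup(now, last)))  # ['b.txt', 'c.txt']
```

Restoring then means replaying the latest full backup followed by each incremental set, which is the trade-off noted above: smaller, faster backups in exchange for a multi-step recovery.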

Backup practices will vary depending on the scale of data, business requirements, and compliance regulations. Be sure to follow best practices, including having multiple copies of backups, storing backups off-site or in the cloud for disaster recovery, and regularly reviewing and updating backup strategies to align with changing data needs and technologies.


Many backup vendors talk about data management for the data they are backing up. Komprise is a data-agnostic unstructured data management solution. Komprise partners with backup vendors and allows customers to know first, move smart and take control of file and object data with an analytics-driven Intelligent Data Management platform as a service.


BlueXP

NetApp BlueXP is a management console designed to unify a disparate set of hybrid cloud NetApp products. The BlueXP interface provides a set of data management capabilities that are licensed separately.


BlueXP is ideally suited for organizations that have multiple NetApp tools and want to simplify management with a unified console, sometimes called a NetApp control plane. It is important to note, however, that BlueXP is NetApp- and storage-centric and has key limitations and restrictions that impact the applicability and simplicity of the solution. With the huge influx of unstructured data, customers want a centralized solution they can use across their storage platforms and in multi-cloud and hybrid environments. It is imperative that broader unstructured data management requirements and the benefits of a storage-agnostic solution are considered before embarking on this effort.

Read the 5 requirements of unified data control plane for unstructured data management.

NetApp BlueXP Tiering

Data tiering is one component of BlueXP.

First of all, it’s important to understand the differences and benefits of file-level tiering vs. block-level tiering used by NetApp. Read the white paper. Secondly, it’s important to understand the tiering requirements. As Komprise co-founder and CEO Kumar Goswami wrote in this post, block-level tiering has benefits for tiering snapshots, certain log files and other data that is proprietary and deleted in short order. But block-level tiering has significant shortcomings, including:

  • Limited policies result in more data being accessed, increasing cloud egress costs.
  • Defragmentation of blocks leads to higher cloud storage costs.
  • Sequential reads lead to higher cloud costs and lower performance.
  • Data tiered to the cloud cannot be accessed from the cloud without licensing a storage file system.
  • Tiering blocks impacts performance of the storage array.
  • Data access results in rehydration, thereby reducing potential cost savings.
  • Block tiering does not reduce backup costs or the backup window.
  • Block tiering locks you into your storage vendor and requires rehydration of all data you tier when switching to a new system. Read: 5 Mistakes to Avoid in a Data Storage Refresh
  • Proprietary lock-in and cloud file storage licensing costs.

NetApp BlueXP Tiering Feature Comparison with Komprise Intelligent Data Management

Komprise is a storage-agnostic control plane across your hybrid data estate that optimizes data storage costs and puts enterprise IT organizations in control of their data at all times with no lock-in. Komprise uses a superior file-based tiering approach. Below are questions you should ask, along with a table comparing NetApp BlueXP functionality with Komprise Intelligent Data Management:

  • Can you tier data that is more than 183 days old?
  • Can you tier directly to Amazon S3 IA or Azure Blob Cool?
  • Do you require a cooling period on rehydration?
  • Do you have flexible data management policies at the share, directory and file levels?
  • Can you access tiered files without additional licensing?
  • Can you migrate data or move to another system without rehydrating everything you’ve tiered?
  • Do you tier files or blocks?


Learn more about Komprise Data Management for NetApp.

Webinar: NetApp + Komprise – Right Data, Right Place, Right Time

Watch a demo of Komprise Storage Insights.


Bucket Sprawl

Bucket sprawl refers to the problem of having a large number of data storage buckets (object storage buckets), often in cloud data storage environments, that are created and then left unused or forgotten over time. This can happen when individuals or teams create buckets for specific projects or tasks, but fail to properly manage and delete them once they are no longer needed.

What is a Cloud Bucket?

A cloud bucket is a container for storing data objects in cloud storage services such as Amazon S3, Google Cloud Storage, or Microsoft Azure Storage. Cloud buckets can hold a variety of data types including images, videos, documents, and other files.

Cloud buckets are typically accessed and managed through an API or web-based interface provided by the cloud storage provider. They offer a scalable and cost-effective way to store and retrieve large amounts of data, and can be used for a variety of applications including backup and disaster recovery, content delivery, and web hosting.

Cloud buckets provide a number of benefits over traditional on-premises data storage solutions, including ease of use, cost-effectiveness, scalability, and availability. However, it is important to properly manage and secure cloud buckets to ensure that sensitive data is protected and costs are kept under control.

The Problem with Cloud Bucket Sprawl

Cloud bucket sprawl can lead to a number of issues, including increased data storage costs, decreased efficiency in accessing necessary data, and potential security risks if sensitive information is stored in forgotten or unsecured buckets. To avoid bucket sprawl, it is important to have a system in place for regularly reviewing and managing storage buckets, including identifying and deleting those that are no longer necessary.
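A basic review process like the one suggested above can be sketched as flagging buckets idle past a threshold (illustrative only; `stale_buckets` is a hypothetical helper, and a real review should also check ownership, contents and compliance holds before deleting anything):

```python
from datetime import date

def stale_buckets(buckets, max_idle_days, today):
    """buckets maps name -> last access date; return review candidates."""
    return sorted(name for name, last in buckets.items()
                  if (today - last).days > max_idle_days)

buckets = {
    "prod-assets": date(2024, 5, 1),
    "poc-2021-demo": date(2021, 3, 15),  # forgotten project bucket
}
print(stale_buckets(buckets, 365, today=date(2024, 6, 1)))
# ['poc-2021-demo']
```

In practice, last-access data would come from the provider's access logs or storage analytics rather than being maintained by hand.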

Cloud Data Management for Bucket Sprawl

In the blog post: Making Smarter Moves in a Multicloud World, Komprise CEO and cofounder Kumar Goswami introduced Komprise cloud data management capabilities this way:

It gives customers a better way to manage their cloud data as it grows, (combat “bucket sprawl”), gives visibility into their cloud costs, and provides a simple way to manage data both on premises and in the cloud. Komprise now provides enterprises with actionable analytics to not only understand their cloud data costs but also optimize them with data lifecycle management.

Learn more about Komprise cloud data management.

Infographic: How to Maximize Cloud Cost Savings



Capacity Planning

Capacity planning is the estimation of the space, hardware, software, and connection infrastructure resources that will be needed over a period of time. In the enterprise environment, the common concern is whether there will be enough resources in place to handle an increasing number of users or interactions. The purpose of capacity planning is to have enough resources available to meet anticipated need, at the right time, without accumulating unused resources. The goal is to match resource availability to forecasted need in the most cost-efficient manner for maximum data storage cost savings.

True data capacity planning means being able to look into the future and estimate future IT needs and efficiently plan where data is stored and how it is managed based on the SLA of the data. Not only must you meet the future business needs of fast-growing unstructured data, you must also stay within the organization’s tight IT budgets. And, as organizations are looking to reduce operational costs with the cloud (see cloud cost optimization), deciding what data can migrate to the cloud, and how to leverage the cloud without disrupting existing file-based users and applications becomes critical.

Data storage never shrinks, it just relentlessly gets bigger. Regardless of industry, organization size, or “software-defined” ecosystem, it is a constant stress-inducing challenge to stay ahead of the storage consumption rate. That challenge is not made any easier considering that typically organizations waste a staggering amount of data storage capacity, much of which can be attributed to improper capacity management.
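A first-order capacity forecast assumes compound monthly growth (a deliberately simple model; real planning should also factor in tiering, deletions and workload changes):

```python
def months_until_full(capacity_tb, used_tb, monthly_growth_pct):
    """Months before storage fills, assuming compound monthly growth."""
    months = 0
    while used_tb < capacity_tb:
        used_tb *= 1 + monthly_growth_pct / 100
        months += 1
        if months > 600:  # guard: growth too small to ever fill the array
            return None
    return months

# 100 TB array, 60 TB used, data growing 3% per month
print(months_until_full(100, 60, 3))  # 18
```

Eighteen months of runway on a 100 TB array is exactly the kind of estimate that tells you whether to buy more storage or tier cold data off the primary tier first.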

Are you making capacity planning decisions without insight?

Komprise enables you to intelligently plan storage capacity, offset additional purchase of expensive storage, and extend the life of your existing data storage by providing visibility across your storage with key analytics on how data is growing and being used, and interactive what-if analysis on the ROI of using different data management objectives. Komprise moves data based on your objectives to secondary storage, object storage or cloud storage, of your choice while providing a file gateway for users and applications to transparently access the data exactly as before.


With an analytics-first approach, Komprise provides visibility into how data is growing and being used across storage silos. Storage administrators and IT leaders no longer have to make storage capacity planning decisions without insight. With Komprise Intelligent Data Management, you’ll understand how much more storage will be needed, when and how to streamline purchases during planning.



Chargeback

What is Chargeback?

Chargeback is a cost allocation strategy used by enterprise IT organizations to charge business units or departments for the IT resources / services they consume. This strategy allows organizations to assign costs to the departments that are responsible for them, which can help to improve accountability, cost management and cost optimization.

Under a chargeback model, IT resources such as hardware, software, and services are assigned a cost and allocated to the business units or departments that use them. The costs may be based on factors such as usage, capacity, or complexity. The business units or departments are then billed for the IT resources they consume based on these costs.

The chargeback model can provide several benefits for organizations. It can help to promote transparency and accountability, as departments are charged for the IT resources they use. This can help to encourage departments to use IT resources more efficiently and reduce overall costs. Chargeback can also help to align IT spending with business goals, as departments are more likely to prioritize spending on IT resources that directly support their business objectives.

Implementing an IT chargeback model requires careful planning and communication to ensure that it is implemented effectively. It is important to establish clear policies and guidelines for how IT resources are assigned costs and billed to business units or departments, and to provide regular reporting and analysis to help departments understand their IT costs and usage.
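A usage-based chargeback allocation can be sketched as a proportional split of the bill (a minimal model; real chargeback may also weight performance tiers, backup services and support consumed):

```python
def chargeback(total_cost, usage_gb_by_dept):
    """Split a storage bill across departments in proportion to usage."""
    total_gb = sum(usage_gb_by_dept.values())
    return {dept: round(total_cost * gb / total_gb, 2)
            for dept, gb in usage_gb_by_dept.items()}

# $10,000 monthly bill split across three departments by GB stored
print(chargeback(10_000, {"Engineering": 600, "Marketing": 250, "Finance": 150}))
# {'Engineering': 6000.0, 'Marketing': 2500.0, 'Finance': 1500.0}
```

The same calculation run without billing anyone is effectively showback: departments see their share of the cost without being charged for it.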

Showback and Storage as a Service

Many enterprises have adopted a Storage-as-a-Service (STaaS) approach to centralize IT’s efforts for each department. But convincing department heads to care about storage savings is a tough task without the right tools. Storage-agnostic data management, tiering and archiving are viewed by users as an extraneous hassle and potential disruption that fails to answer “What’s in it for me?”

This white paper explains how to make STaaS successful by telling a compelling data story department heads can’t ignore. This coupled with transparent data tiering techniques that do not change the user experience are critical to successful systematic archiving and significant savings.

Learn how using analytics-driven showback can help secure the buy-in needed to archive more data more often. Once they understand their data—how much is cold and how much they could be saving—the conversation quickly changes.

Read the blog post: How Storage Teams Use Deep Analytics.


Cloud Cost Optimization

Cloud cost optimization is a process to reduce operating costs in the cloud while maintaining or improving the quality of cloud services. It involves identifying and addressing areas to reduce the use of cloud resources, select more cost-effective cloud services, or deploy better management practices, including data management.

The cloud is highly flexible and scalable, but it also involves ongoing and sometimes hidden costs, including usage fees, egress fees, storage costs, and network fees. If not managed properly, these costs can quickly become a significant burden for organizations.

In one of our 2023 data management predictions posts, we noted:

Managing the cost and complexity of cloud infrastructure will be Job No. 1 for enterprise IT in 2023. Cloud spending will continue, although at perhaps a more measured pace during uncertain economic times. What will be paramount is to have the best data possible on cloud assets to make sound decisions on where to move data and how to manage it for cost efficiency, performance, and analytics projects. Data insights will also be important for migration planning, spend management (FinOps), and to meet governance requirements for unstructured data management. These are the trends we’re tracking for cloud data management, which will give IT directors precise guidance to maximize data value and minimize cloud waste.

Source: ITPro-Today

Steps to Optimize Cloud Costs

To optimize cloud costs, organizations can take several steps, including:

  • Right-sizing: Choose the correct size and configuration of cloud resources to meet the needs of the application, avoiding overprovisioning or underprovisioning.
  • Resource utilization: Monitor the use of cloud resources to reduce waste and improve cost efficiency.
  • Cost allocation: Implement cost allocation and tracking practices to better understand cloud costs and improve accountability.
  • Reserved instances: Use reserved instances to reduce costs by committing to a certain level of usage for a longer term.
  • Cost optimization tools: These tools identify areas for savings and help manage cloud expenses.
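To make the reserved-instances trade-off concrete, here is a minimal sketch in Python of the break-even utilization above which a one-year commitment beats pay-as-you-go. The hourly rates are hypothetical placeholders; real rates vary by provider, region, and instance type.

```python
HOURS_PER_YEAR = 24 * 365

# Hypothetical hourly rates for one instance type; real provider rates differ.
ON_DEMAND_RATE = 0.10   # $/hour, billed only for hours actually used
RESERVED_RATE = 0.062   # $/hour, billed for every hour of the term

def on_demand_cost(utilization: float) -> float:
    """Yearly pay-as-you-go cost at a given fraction of hours used."""
    return ON_DEMAND_RATE * HOURS_PER_YEAR * utilization

def reserved_cost() -> float:
    """Yearly reserved cost: the commitment bills even when the instance is idle."""
    return RESERVED_RATE * HOURS_PER_YEAR

# Reserving wins only when utilization exceeds the rate ratio.
break_even_utilization = RESERVED_RATE / ON_DEMAND_RATE
```

With these placeholder rates, a workload busy more than 62% of the time is cheaper reserved, while a bursty workload below that threshold is cheaper on demand.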

The Challenge of Managing Cloud Data

Managing cloud data costs takes significant manual effort, multiple tools, and constant monitoring. As a result, companies use fewer than 20% of the cloud cost-saving options available to them. “Bucket sprawl” makes matters worse, as users easily create accounts and buckets and fill them with data—some of which is never accessed again.

When trying to optimize cloud data, cloud administrators contend with poor visibility and complexity of data management:

  • How can you know your cloud data?
  • How fast is cloud data growing and who’s using it?
  • How much is active vs. how much is cold?
  • How can you dig deeper to optimize across object sizes and storage classes?

How can you make managing data and costs manageable?

  • Complicated cost structures are hard to decipher.
  • More information is needed to manage data better, e.g., when was an object last accessed?
  • Factoring in multiple billable dimensions and costs is extremely complex: storage, access, retrievals, API, transitions, initial transfer, and minimal storage-time costs.
  • There are unexpected costs in moving data across different storage classes (e.g., Amazon S3 Standard to S3 Glacier). If access isn’t continually monitored, and data is not moved back up when it gets hot, you will face expensive retrieval fees.
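Those billable dimensions combine into a single bill roughly as sketched below. The unit prices are placeholders for illustration only, not any provider's actual price sheet.

```python
# Hypothetical unit prices for illustration only; real cloud price sheets differ.
PRICES = {
    "storage_gb_month": 0.023,   # $ per GB-month stored
    "put_per_1k":       0.005,   # $ per 1,000 PUT/COPY/POST requests
    "get_per_1k":       0.0004,  # $ per 1,000 GET requests
    "retrieval_gb":     0.01,    # $ per GB retrieved from a colder class
    "egress_gb":        0.09,    # $ per GB transferred out to the internet
}

def monthly_cost(stored_gb, puts, gets, retrieved_gb, egress_gb):
    """Sum the separately billed dimensions into one monthly figure."""
    return (stored_gb * PRICES["storage_gb_month"]
            + puts / 1000 * PRICES["put_per_1k"]
            + gets / 1000 * PRICES["get_per_1k"]
            + retrieved_gb * PRICES["retrieval_gb"]
            + egress_gb * PRICES["egress_gb"])
```

Note that with these placeholder prices, 50 GB of egress costs nearly twice as much as storing 100 GB for the month, which is why retrieval and egress fees so often dominate unplanned bills.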

These issues are further compounded as enterprises move toward a multicloud approach and require a single set
of tools, policies, and workflow to optimize and manage data residing within and across clouds.

Komprise Cloud Data Management

Reduce cloud storage costs by more than 50% with Komprise.

Cloud providers offer a range of storage services. Generally, there are storage classes with higher performance and costs for hot and warm data, such as Amazon S3 Standard and S3 Standard-IA, and storage classes with much lower performance and costs that are appropriate for cold data, such as S3 Glacier and S3 Glacier Deep Archive. Data access and retrieval fees for the lower-cost storage classes are much higher than those of the higher-performance, higher-cost classes. To maximize savings, you need an automated unstructured data management solution that takes data access patterns into account to move data dynamically and cost-optimally across storage classes (e.g., Amazon S3 Standard to S3 Standard-IA, or S3 Standard-IA to S3 Glacier) and across multi-vendor storage services (e.g., NetApp Cloud Volumes ONTAP to Amazon S3 Standard to S3 Standard-IA to S3 Glacier to S3 Glacier Deep Archive). While cloud providers offer some limited manual data movement through Object Lifecycle Management policies based on modified times, as well as intelligent tiering, these approaches offer limited savings and involve hidden costs.
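The access-pattern-driven movement across storage classes described above can be sketched as a simple policy function. The thresholds and class names below are illustrative assumptions, not a prescribed policy.

```python
from datetime import datetime, timedelta

# Hypothetical tiering thresholds; tune to your own access patterns and price sheet.
TIERS = [
    (30,  "STANDARD"),      # accessed within 30 days: keep hot
    (90,  "STANDARD_IA"),   # 30-90 days since last access: infrequent access
    (365, "GLACIER"),       # 90 days to 1 year: archive
]
COLDEST = "DEEP_ARCHIVE"    # untouched for over a year

def pick_tier(last_access: datetime, now: datetime) -> str:
    """Choose a storage class from the time elapsed since last access."""
    age_days = (now - last_access).days
    for threshold, tier in TIERS:
        if age_days <= threshold:
            return tier
    return COLDEST
```

A real solution would also weigh retrieval fees and minimum storage durations before transitioning an object, and would move data back up a tier when it turns hot again.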

Komprise automates full lifecycle management across multi-vendor cloud storage classes, using intelligence from data usage patterns to maximize your savings without heavy lifting. Read the white paper to see how you can save 50% or more on cloud storage costs.

Watch the video: How to save costs and manage your multi-cloud storage


Cloud Costs

Cloud costs, or cloud computing costs, will vary based on cloud service provider, the specific cloud services and cloud resources used, usage patterns, and pricing models. See Cloud Cost Optimization.

Gartner forecasts that cloud spend will approach $600B in 2023, and in an increasingly hybrid enterprise IT infrastructure, cloud repatriation is making headlines: cloud repatriation and the death of cloud only.

Why are my cloud costs so high?

A number of factors can influence your cloud costs. Examples include:

  • Compute Resources: Cloud providers offer various compute options, such as virtual machines (VMs), containers, or serverless functions. The cost of compute resources depends on factors like the instance type, CPU and memory specifications, duration of usage, and the pricing model (e.g., on-demand, reserved instances, or spot instances).
  • Cloud Storage: Cloud storage costs can vary based on the type of storage used, such as object storage, block storage, or file storage. The factors affecting storage costs include the amount of data stored, data transfer in and out of the storage, storage duration, and any additional features like data replication or redundancy. See the white paper: Block-level versus file-level tiering.
  • Networking: Cloud providers charge for network egress and data transfer between different regions, availability zones, or across cloud services. The cloud cost can depend on the volume of data transferred, the distance between data centers, and the bandwidth used.
  • Database Services: Cloud databases, such as relational databases (RDS), NoSQL databases (DynamoDB, Firestore), or managed database services, have their own pricing models. The cost can be based on factors like database size, read/write operations, storage capacity, and backup and replication requirements.
  • Data Transfer and CDN: Cloud providers typically charge for data transfer between their services and the internet, as well as for content delivery network (CDN) services that accelerate content delivery. Costs can vary based on data volume, data center locations, and regional traffic patterns.
  • Cloud Services: Cloud providers offer a range of additional cloud services, such as analytics, AI/ML, monitoring, logging, security, and management tools. The cost of these services is usually based on usage, the number of requests, data processed, or specific feature tiers.
  • Pricing Models: Cloud providers offer different pricing models, including on-demand (pay-as-you-go), reserved instances (pre-purchased capacity for longer-term usage), spot instances (bid-based pricing for unused capacity), or savings plans (commitments for discounted rates). Choosing the appropriate pricing model can impact overall cloud costs.

To estimate and manage cloud costs effectively, enterprise IT, engineering and all consumers of cloud services need to monitor resource usage, optimize resource allocation, leverage cost management tools provided by the cloud provider and independent solution providers, and regularly review and adjust resource utilization based on actual requirements. Each cloud provider has detailed pricing documentation and cost calculators on their websites that can help estimate costs based on specific usage patterns and service selections. In an increasingly hybrid, multi-cloud environment, looking to technologies that can analyze and manage cloud costs independent from cloud service providers is gaining popularity.



Cloud Data Management


What is Cloud Data Management?

Cloud data management is a way to manage data across cloud platforms, either with or instead of on-premises storage. A popular form of data storage management, its goal is to curb rising cloud data storage costs. This can be a complicated pursuit, which is why most businesses employ an external cloud data management service, with cloud cost optimization as the primary goal.

Cloud data management is emerging as an alternative to data management using traditional on-premises software. Instead of buying on-premises data storage resources and managing them, resources are bought on demand in the cloud. This services model for cloud data storage allows organizations to receive dedicated data management resources on an as-needed basis. Cloud data management also involves finding the right data in on-premises storage and moving it to the cloud through data archiving, data tiering, data replication and data protection, or data migration.

Advantages of Cloud Data Management

How do you manage cloud storage? According to two 2023 surveys (here and here), 94% of respondents say they’re wasting money in the cloud, 69% say that data storage accounts for over one quarter of their company’s cloud costs, and 94% say that cloud storage costs are rising. Optimal unstructured data management in the cloud provides four key capabilities that help you manage cloud storage and reduce cloud data storage costs:

  1. Gain Accurate Visibility Across Cloud Accounts into Actual Usage
  2. Forecast Savings and Plan Data Management Strategies for Cloud Cost Optimization
  3. Cloud Tiering and Archiving Based on Actual Data Usage to Avoid Surprises
    • For example, using last-accessed time rather than last-modified time gives a more reliable prediction of which objects will be accessed in the future, which avoids costly archiving errors.
  4. Radically Simplify Cloud Migrations
    • Easily pick your source and destination
    • Run dozens or hundreds of migrations in parallel
    • Reduce the babysitting
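The last-accessed vs. last-modified distinction above can be illustrated with a small sketch. The 90-day threshold is a hypothetical policy value; a file written once long ago but still read daily would wrongly be archived if the check used modification time alone.

```python
import os
import time

COLD_AFTER_DAYS = 90  # hypothetical policy threshold

def is_cold(path, now=None):
    """Treat a file as cold only if it hasn't been *used* recently.

    Checking st_mtime alone would archive files that were written once
    long ago but are still read every day, triggering retrieval fees.
    """
    now = now or time.time()
    st = os.stat(path)
    last_use = max(st.st_atime, st.st_mtime)  # mtime as a floor: a write implies use
    return (now - last_use) > COLD_AFTER_DAYS * 86400
```

Note that some filesystems mount with `noatime` and stop updating access times, which is one reason storage-independent analytics are needed for trustworthy tiering decisions.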


The benefits of cloud data management services include faster technology deployment, lower system maintenance costs, and increased flexibility to meet changing business requirements.

Challenges Faced with Enterprise Cloud Data Management

But, like other cloud computing technologies, enterprise cloud data management services can introduce challenges – for example, data security concerns related to sending sensitive business data outside the corporate firewall for storage. Another challenge is disruption to existing users and applications that may rely on file-based access on-premises, since the cloud is predominantly object-based.

Cloud data management service solutions should provide you with options to eliminate this disruption by transparently moving and managing data across common formats such as file and object.

Komprise Intelligent Data Management

Features of a Cloud Data Management Services Platform

Some common features and capabilities cloud data management solutions should deliver:

  • Data Analytics: Can you get a view of all your cloud data, how it’s being used, and how much it’s costing you? Can you get visibility into on-premises data that you wish to migrate to the cloud? Can you understand where your costs are so you know what to do about them?
  • Planning and Forecasting: Can you set policies for how data should be moved, either from one cloud storage class to another or from on-premises storage to the cloud? Can you project your savings? Does the projection account for hidden fees like retrieval and egress costs?
  • Policy-Based Data Archiving, Data Replication, and Data Management: How much babysitting do you have to do to move and manage data? Do you have to tell the system every time something needs to be moved, or does it have policy-based intelligent automation?
  • Fast, Reliable Cloud Data Migration: Does the system support migrating on-premises data to the cloud? Does it handle going over a Wide Area Network? Does it preserve permissions, access controls, and data security both while the data is moving and once it’s in the cloud?
  • Intelligent Cloud Archiving, Intelligent Tiering and Data Lifecycle Management: Does the solution enable you to manage the ongoing data lifecycle in the cloud? Does it support the different cloud storage classes (e.g., high-performance options like file and cloud NAS and cost-efficient options like Amazon S3 and S3 Glacier)?

In practice, the design and architecture of a cloud varies among cloud providers. Service Level Agreements (SLA) represent the contract which captures the agreed upon guarantees between a service provider and its customers.

It is important to consider that cloud administrators are responsible for factoring:

  • Multiple billable dimensions and costs: storage, access, retrievals, API, transitions, initial transfer, and minimal storage-time costs
  • Unexpected costs of moving data across different storage classes. Unless access is continually monitored and data is moved back up when it gets hot, you’ll face expensive retrieval fees.

This complexity is why fewer than 20% of organizations leverage the cost-saving options available to them in the cloud.

How do Cloud Data Management Services Tools work?

As more enterprise data runs on public cloud infrastructure, many different types of tools and approaches to cloud data management have emerged. The initial focus has been on migrating and managing structured data in the cloud. Cloud data integration, ETL (extraction, transformation and loading), and iPaaS (integration platform as a service) tools are designed to move and manage enterprise applications and databases in the cloud. These tools typically move and manage bulk or batch data or real time data.

Cloud-based analytics and cloud data warehousing have emerged for analyzing and managing hybrid and multi-cloud structured and semi-structured data, such as Snowflake and Databricks.

In the world of unstructured data storage and backup technologies, cloud data management has been driven by the need for cost visibility, cost reduction, cloud cost optimization and optimizing cloud data. As file-level tiering has emerged as a critical component of an intelligent data management strategy and more file data migrates to the cloud, cloud data management is evolving from cost management to automation and orchestration, governance and compliance, performance monitoring, and security. Even so, spend management continues to be a top priority for any enterprise IT organization migrating application and data workloads to the cloud.

What are the challenges faced with Cloud Data Management security?

Most cloud data management security concerns are related to the general cloud computing security questions organizations face. It’s important to evaluate the strengths and security certifications of your cloud data management vendor as part of your overall cloud strategy.

Is adoption of Cloud Data Management services growing?

As enterprise IT organizations are increasingly running hybrid, multi-cloud, and edge computing infrastructure, cloud data management services have emerged as a critical requirement. Look for solutions that are open, cross-platform, and ensure you always have native access to your data. Visibility across silos has become a critical need in the enterprise, but it’s equally important to ensure data does not get locked into a proprietary solution that will disrupt users, applications, and customers. The need for cloud native data access and data mobility should not be underestimated. In addition to visibility and access, cloud data management services must enable organizations to take the right action in order to move data to the right place and the right time. The right cloud data management solution will reduce storage, backup and cloud costs as well as ensure a maximum return on the potential value from all enterprise data.

How is Enterprise Cloud Data Management Different from Consumer Systems?

While consumers need to manage cloud storage, it is usually a matter of capacity across personal storage and devices. Enterprise cloud data management involves IT organizations working closely with departments to build strategies and plans that will ensure unstructured data growth is managed and data is accessible and available to the right people at the right time.

Enterprise IT organizations are increasingly adopting cloud data management solutions to understand how cloud (typically multi-cloud) data is growing and manage its lifecycle efficiently across all of their cloud file and object storage options.

Analyzing and Managing Cloud Storage with Komprise

  • Get accurate analytics across clouds with a single view across all your users’ cloud accounts and buckets and save on storage costs with an analytics-driven approach.
  • Forecast cloud cost optimization by setting different data lifecycle policies based on your own cloud costs.
  • Establish policy-based multi-cloud lifecycle management by continuously moving objects by policy across storage classes transparently (e.g., Amazon S3 Standard, S3 Standard-IA, S3 Glacier, S3 Glacier Deep Archive).
  • Accelerate cloud data migrations with fast, efficient data migrations across clouds (e.g., AWS, Azure, Google and Wasabi) and even on-premises (ECS, IBM COS, Pure FlashBlade).
  • Deliver powerful cloud-to-cloud data replication by running, monitoring, and managing hundreds of migrations faster than ever at a fraction of the cost with Elastic Data Migration.
  • Keep your users happy with no retrieval-fee surprises and no disruption to users and applications from poor data movement decisions based on when the data was created rather than when it was last accessed.

An analytics-driven cloud data management platform like Komprise, named a Gartner Peer Insights Awards leader, can help you save 50% or more on your cloud storage costs.


Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.

What is Cloud Data Management?

Cloud Data Management is a way to analyze, manage, secure, monitor and move data across public clouds. It works either with, or instead of on-premises applications, databases, and data storage and typically offers a run-anywhere platform.

Cloud Data Management Services

Cloud data management is typically overseen by a vendor that specializes in data integration, database, data warehouse or data storage technologies. Ideally the cloud data management solution is data agnostic, meaning it is independent from the data sources and targets it is monitoring, managing and moving. Benefits of an enterprise cloud data management solution include ensuring security, large savings, backup and disaster recovery, data quality, automated updates and a strategic approach to analyzing, managing and migrating data.

Cloud Data Management platform

Cloud data management platforms are cloud-based hubs that analyze and offer visibility and insights into an enterprise’s data, whether the data is structured, semi-structured, or unstructured.


Cloud Data Migration

What is Cloud Data Migration?

Cloud data migration is the process of relocating all or part of an enterprise’s data to a cloud infrastructure. It is often the most difficult and time-consuming part of an overall cloud migration project; other elements include application migration and workflow migration. A “smart data migration” strategy for enterprise file data means an analytics-first approach, ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. Komprise Elastic Data Migration makes cloud data migrations simple, fast and reliable, with continuous data visibility and optimization.

The Komprise Smart Data Migration Strategy

Learn more about Komprise Smart Data Migration for file and object data.

Read the blog post: Smart Data Migration for File and Object Data Workloads

Cost, Complexity and Time:
Why Cloud Data Migrations are Difficult

Cloud data migrations are usually the most laborious and time-consuming part of a cloud migration initiative. Why? Data is heavy – data footprints are often in hundreds of terabytes to petabytes and can involve billions of files and objects. Some key reasons why cloud data migrations fail include:

  • Lack of Proper Planning: Cloud data migrations are often done in an ad hoc fashion, without proper analytics on the data set and without planning.
  • Improper Choice of Cloud Storage Destination: Most public clouds offer many different classes and tiers of storage – each with their own costs and performance metrics. Also, many of the cloud storage classes have retrieval and egress costs, so picking the right cloud storage class for a data migration involves not just finding the right performance and price to store the data but also the right access costs. Intelligent tiering and Intelligent archiving techniques that span both cloud file and object storage classes are important to ensure the right data is in the right place at the right time.
  • Ensuring Data Integrity: Data migrations involve migrating the data along with migrating metadata. For a cloud data migration to succeed, not only should all the data be moved over with full fidelity, but all the access controls, permissions, and metadata should also move over. Often, this is not just about moving data but mapping these from one storage environment to another.
  • Downtime Impact: Cloud data migrations can often take weeks to months to complete. Clearly, you don’t want users to lose access to the data they need for this entire time. Minimizing downtime, even during a cutover, is very important to reduce productivity impact.
  • Slow Networks, Failures: Often cloud data migrations are done over a Wide Area Network (WAN), which can have other data moving on it and hence deliver intermittent performance. Plus, there may be times when the network is down or the storage at either end is unavailable. Handling all these edge conditions is extremely important – you don’t want to be halfway through a month-long cloud data migration only to encounter a network failure and have to start all over again.
  • Time Consuming: Since cloud data migrations involve moving large amounts of data, they can involve a lot of manual effort to manage. This is laborious, tedious and time consuming.
  • Sunk Costs: Cloud data migrations are often time-bound projects – once the data is migrated, the project is complete. So, if you invest in tools to address cloud data migrations, you may have sunk costs once the cloud data migration is complete.


Cloud data migrations can involve Network Attached Storage (NAS)/file data, object data, or block data. Of these, migrations of file data and object data are particularly difficult and time-consuming because file and object data are much larger in volume.

  • To learn more about the seven reasons why cloud data migrations are dreaded, watch the webinar.
  • Learn more about why Komprise is the fast, no lock-in approach to unstructured cloud data migrations: Path to the cloud.

Cloud Data Migration Strategies

Different cloud data migration strategies are used depending on whether file data or object data need to be migrated. Common methods for moving these two types of data through cloud migration solutions are described in further detail below.

Cloud Data Migration for File Data aka NAS Cloud Data Migrations


File data is often stored on Network Attached Storage and is typically accessed over NFS and SMB protocols. File data can be particularly difficult to migrate because of its size, volume, and richness. It often involves a mix of large and small files; data migration techniques tend to do well with large files but struggle with many small files, so a good solution must handle both efficiently. File data is also voluminous, often involving billions of files, and a reliable cloud data migration solution must handle such volumes efficiently.

File data is also very rich: it carries metadata, access control permissions, and directory hierarchies. A good file data migration solution should preserve all of these. Often, migrating file data involves mapping this information from one file storage format to another. Sometimes file data may need to be migrated to an object store; in these situations, the file metadata needs to be preserved in the object store so the data can be restored as files at a later date. Techniques such as MD5 checksums are important to ensure the data integrity of file data migrations to the cloud.
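An MD5-based integrity check of the kind mentioned above can be sketched as follows. This is a simplified illustration; production migration tools typically parallelize hashing across files and also compare metadata and permissions, not just content.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-gigabyte files never sit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_migration(source, destination):
    """A migrated file's content is intact only if both copies hash identically."""
    return md5_of(source) == md5_of(destination)
```

In practice the source checksum is computed before or during the copy and compared against the destination after cutover, so any corruption introduced in transit is caught before the source is decommissioned.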

Cloud Data Migration for Object Data (S3 Data Migrations or Object-to-Cloud Data Migrations or Cloud-to-Cloud Data Migrations)

Cloud data migration of object data is relatively new but quickly gaining momentum as the majority of enterprises move to a multi-cloud architecture. The Amazon Simple Storage Service (S3) protocol has become a de facto standard for object stores and public cloud providers, so most cloud data migrations of object data are S3-based.

3 common use cases for cloud object data migrations:
  • Data migrations from an on-premises object store to the public cloud: Many enterprises have adopted on-premises object storage. Most of these object storage solutions follow the S3 protocol. Customers are now looking to analyze data on their on-premises object storage and migrate some or all of that data to a public cloud storage option such as Amazon S3 or Microsoft Azure Blob.
  • Cloud-to-cloud data migrations and cloud-to-cloud data replications: Enterprises looking to switch public cloud providers need to migrate data from one cloud to another. Sometimes, it may also be cost-effective to replicate across clouds as opposed to replicating within a cloud. This also improves data resiliency and provides enterprises with a multi-cloud strategy. Cloud-to-cloud data replication differs from cloud data migration because it is ongoing – as data changes on one cloud, it is copied or replicated to the second cloud.
  • S3 data migrations: This is a generic term for any object or cloud data migration done using the S3 protocol. Because the Amazon Simple Storage Service (S3) protocol has become a de facto standard, any object-to-cloud, cloud-to-cloud, or cloud-to-object migration can typically be classified as an S3 data migration.


Secure Cloud Data Migration Tools

Cloud data migrations can be performed by using free tools that require extensive manual involvement or commercial data migration solutions. Sometimes Cloud Storage Gateways are used to move data to the cloud, but these require heavy hardware and infrastructure setup. Cloud data management solutions offer a streamlined, cost-effective, software-based approach to manage cloud data migrations without requiring expensive hardware infrastructure and without creating data lock-in. Look for elastic data migration solutions that can dynamically scale to handle data migration workloads and adjust to your demands.

7 Tips for a Clean Cloud Data Migration:
  1. Define Sources and Targets
  2. Know the Rules & Regulations
  3. Proper Data Discovery
  4. Define Your Path
  5. Test, Test, Test
  6. Free Tools vs. Enterprise
  7. Establish a Communication Plan

Watch the webinar: Preparing for a Cloud File Data Migration

What is a Smart Data Migration?

Know your cloud data migration choices for file and object data migration.



Cloud Data Storage

Cloud data storage is a service for individuals or organizations to store data through a cloud computing provider such as AWS, Azure, Google Cloud, IBM or Wasabi. Storing data in a cloud service eliminates the need to purchase and maintain data storage infrastructure, since infrastructure resides within the data centers of the cloud IaaS provider and is owned/managed by the provider. Many organizations are increasing data storage investments in the cloud for a variety of purposes including: backup, data replication and data protection, data tiering and archiving, data lakes for artificial intelligence (AI) and business intelligence (BI) projects, and to reduce their physical data center footprint. As with on-premises storage, you have different levels of data storage available in the cloud. You can segment data based on access tiers: for instance, hot and cold data storage.


Types of Cloud Data Storage

Cloud data storage can either be designed for personal data and collaboration or for enterprise data storage in the cloud. Examples of personal data cloud storage are Google Drive, Box and DropBox.

Increasingly, corporate data storage in the cloud is gaining prominence – particularly around taking enterprise file data that was traditionally stored on Network Attached Storage (NAS) and moving that to the cloud.

Cloud file storage and object storage are gaining adoption as they can store petabytes of unstructured data for enterprises cost-effectively.

Enterprise Cloud Data Storage for Unstructured Data

(Cloud File Data Storage and Cloud Object Data Storage)

Enterprise unstructured data growth is exploding – whether it’s genomics data, video and media content, log files, or IoT data. Unstructured data can be stored as files on file data storage or as objects on cost-efficient object storage. Cloud storage providers now offer a variety of file and object storage classes at different price points to accommodate unstructured data. Amazon EFS, Amazon FSx, and Azure Files are examples of cloud data storage for enterprise file data; Amazon S3, Azure Blob, and Amazon S3 Glacier are examples of object storage.

Advantages of Cloud Data Storage

There are many benefits of investing in cloud data storage, particularly for unstructured data in the enterprise. Organizations gain access to unlimited resources, so they can scale data volumes as needed and decommission instances at the end of a project or when data is deleted or moved to another storage resource. Enterprise IT teams can also reduce dependence on hardware and have a more predictable storage budget. However, without proper cloud data management, cloud egress costs and other cloud costs are often cited as challenges.

In summary, cloud data storage allows organizations to:
  • Reduce capital expenses (CAPEX) on data center hardware, along with savings in energy, facility space, and staff hours spent maintaining and installing hardware.
  • Deliver vastly improved agility and scalability to support rapidly changing business needs and initiatives.
  • Develop an enterprise-wide data lake strategy that would otherwise be unaffordable.
  • Lower the risks of storing important data on aging physical hardware.
  • Leverage cheaper cloud storage for archiving and tiering purposes, which can also reduce backup costs.

Challenges and Considerations

  • Cloud data storage can be costly if you need to frequently access the data for use outside of the cloud, due to egress fees charged by cloud storage providers.
  • Using cloud tiering methodologies from on-premises storage vendors may result in unexpected costs, due to the need for restoring data back to the storage appliance prior to use. Read the white paper Cloud Tiering: Storage-Based vs. Gateways vs. File-Based
  • Moving data between clouds is often difficult, because of data translation and data mobility issues with file objects. Each cloud provider uses different standards and formats for data storage.
  • Security can be a concern, especially in some highly regulated sectors such as healthcare, financial services and e-commerce. IT organizations will need to fully understand the risks and methods of storing and protecting data in the cloud.
  • The cloud creates another data silo for enterprise IT. When adding cloud storage to an organization’s storage ecosystem, IT will need to determine how to attain a central, holistic view of all storage and data assets.

For these reasons, cloud optimization and cloud data management are essential components of an enterprise cloud data storage and overall data storage cost savings strategy. Komprise has strategic alliance partnerships with leading hybrid and cloud data storage technology vendors.

Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.

Getting Started with Komprise:

Want To Learn More?

Cloud File Storage

What is Cloud File Storage?

Cloud File Storage, also known as Cloud NAS, is a method for storing data in the cloud that provides servers and applications access to data through file system protocols such as NFS and SMB. Cloud file storage allows customers to move file-based workloads to the cloud without code changes.

Popular choices for cloud file storage are AWS FSx for Windows, AWS FSx ONTAP, AWS FSx ZFS, Microsoft Azure Files, Google Filestore, and Qumulo.

In late 2021, Komprise COO Krishna Subramanian predicted that cloud file storage will accelerate.

She wrote:

First, it was cloud-native applications, then block workloads, but now it’s time for file workloads to move to the cloud. Explosive growth in unstructured file data has led to data centers bursting at the seams. Covid-19 has accelerated the shift to cloud for file workloads.

Data management solutions are also enabling smart file migrations so that hot data is placed in cloud file storage and cold data is transparently and efficiently tiered at the file level to object storage. This means that customers can use data from both the file and object tiers. Another approach many vendors are taking is to provide cloud-like economics and pricing while the infrastructure remains on-premises — HPE Greenlake and Pure as a Service are examples of this trend.

Cloud Migration

Cloud migration refers to the movement of data, processes, and applications from on-premises data storage or legacy infrastructure to cloud-based infrastructure for storage, application processing, data archiving and ongoing data lifecycle management. Komprise offers an analytics-driven cloud migration software solution – Elastic Data Migration – that integrates with most leading cloud service providers, such as AWS, Microsoft Azure, Google Cloud, Wasabi, IBM Cloud and more.

Benefits of Cloud Migration

Migrating to the cloud can offer many advantages – lower operational costs, greater elasticity, and flexibility. Migrating data to the cloud in a native format also ensures you can leverage the computational capabilities of the cloud and not just use it as a cheap storage tier. When migrating to the cloud, you need to consider both the application as well as its data. While application footprints are generally small and relatively easier to migrate, cloud file data migrations need careful planning and execution as data footprints can be large. Cloud migration of file data workloads with Komprise allows you to:

  • Plan a data migration strategy using analytics before migration. A pre-migration analysis helps you identify which files need to be migrated and plan how to organize the data to maximize the efficiency of the migration process. It’s important to know how data is used and to determine how large and how old files are throughout the storage system. Since data footprints often reach billions of files, planning a migration is critical.
  • Improve scalability with Elastic Data Migration. Data migrations can be time consuming as they involve moving hundreds of terabytes to petabytes of data. Since the storage that data is migrating from is usually still in use during the migration, the data migration solution needs to move data as fast as possible without slowing down user access to the source storage. This requires a scalable architecture that can leverage the inherent parallelism of the data sets to migrate multiple data streams in parallel without overburdening any single source storage. Komprise uses a patented elastic data migration architecture that maximizes parallelism while throttling back as needed to preserve source data storage performance.
  • Shrink cloud migration time. When compared to generic tools used across heterogeneous cloud and physical storage, Komprise cloud data migration is nearly 30x faster. Performance is maximized at every level with the auto parallelize feature, minimizing network usage and making migration over WAN more efficient.

  • Reduce ongoing cloud data storage costs with smart migration, intelligent tiering and data lifecycle management in the cloud. Migrating to the cloud can reduce the amount spent on IT needs, storage maintenance, and hardware upgrades as these are typically handled by the cloud provider. Most clouds provide multiple storage classes at different price points – Komprise intelligently moves data to the right storage class in the cloud based on your policy and performs ongoing data lifecycle management in the cloud to reduce storage cost.  For example, for AWS, unlike cloud intelligent tiering classes, Komprise tiers across both S3 and Glacier storage classes so you get the best cost savings.
  • Simplify storage management. With a Komprise cloud migration, you can use a single solution across your multivendor storage and multicloud architectures. All you have to do is connect via open standards – pick the SMB, NFS, and S3 sources along with the appropriate destinations and Komprise handles the rest. You also get a dashboard to monitor and manage all of your migrations from one place. No more sunk costs of point migration tools because Komprise provides ongoing data lifecycle management beyond the data migration.
  • Increase resource availability. Moving your data to the cloud allows it to be accessed from wherever users may be, making it easier for international businesses to store and access their data from around the world. Komprise delivers native data access so you can directly access objects and files in the cloud without getting locked in to your NAS vendor—or even to Komprise.
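The parallel-migration idea above can be sketched in a few lines of Python. This is a generic illustration using only the standard library, not Komprise's patented architecture (which also throttles per-source load, retries failures and preserves full metadata); the function name is hypothetical.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def migrate_tree(src_root: Path, dst_root: Path, max_workers: int = 8) -> int:
    """Copy every file under src_root to dst_root in parallel.

    A generic sketch only: a production mover would also throttle load
    on the source, retry failures, and preserve ACLs and ownership.
    Returns the number of files copied.
    """
    files = [p for p in src_root.rglob("*") if p.is_file()]

    def copy_one(src: Path) -> None:
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(copy_one, files))  # drain the iterator to surface errors
    return len(files)
```

Because each file copy is independent, the work parallelizes naturally; the real engineering lies in throttling so the still-in-use source storage is not overwhelmed.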

Cloud Migration Process

The cloud data migration process can differ widely based on a company’s storage needs, business model, environment of current storage, and goals for the new cloud-based system. Below are the main steps involved in migrating to the cloud.

Step 1 – Analyze Current Storage Environment and Create Migration Strategy

A smooth migration to the cloud requires proper planning to ensure that all bases are covered before the migration begins. It’s important to understand why the move is beneficial and how to get the most out of the new cloud-based features before the process continues.

Step 2 – Choose Your Cloud Deployment Environment

After taking a thorough look at the current resource requirements across your storage system, you can choose your cloud storage provider(s). At this stage you also decide which type of infrastructure the system will use, whether it will be a single-cloud or multi-cloud solution, and whether the cloud will be public or private.

Step 3 – Migrate Data and Applications to the Cloud

Application workload migration to the cloud can be done with generic tools. However, since data migration involves moving petabytes of data and billions of files, you need a data management software solution that can migrate data efficiently over a variety of connections, including the public internet or a private connection (LAN or WAN).

Step 4 – Validate Data After Migration

Once the migration is complete, the data within the cloud can be validated and production access to the storage system can be swapped from on-premises to the cloud. Data validation often requires an MD5 checksum on every file to ensure the integrity of the data is intact after the migration.
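As a concrete illustration, a post-migration validation pass can stream each file through MD5 and compare source and destination digests. The function names here are hypothetical, not part of any vendor tool:

```python
import hashlib
from pathlib import Path

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through MD5 so large files never load fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_migration(src: Path, dst: Path) -> bool:
    """Compare source and destination checksums after a migration."""
    return md5_of(src) == md5_of(dst)
```

Streaming in chunks matters at migration scale: a multi-gigabyte file is hashed in constant memory.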

Komprise Cloud Data Migration

With Elastic Data Migration from Komprise, you can affordably run and manage hundreds of migrations across many different platforms simultaneously. Gain access to a full suite of high-speed cloud migration tools from a single dashboard that takes on the heavy lifting of migrations, and moves your data nearly 30x faster than traditionally available services—all without any access disruption to users or apps.

Our team of cloud migration professionals, with over two decades of experience developing efficient IT solutions, has helped businesses around the world achieve faster and smoother data migrations with total confidence and none of the headaches. Contact us to learn more about our cloud data migration solution or sign up for a free trial to see the benefits beyond data migration with our analytics-driven Intelligent Data Management solution.

Learn more about your options for migrating file workloads to the cloud: The Easy, Fast, No Lock-In Path to the Cloud.

Cloud NAS

What is Cloud NAS?

Cloud NAS is a relatively new term – it refers to a cloud-based storage solution to store and manage files. Cloud NAS or cloud file storage is gaining prominence and several vendors have now released cloud NAS offerings.

What is NAS?

Network Attached Storage (NAS) refers to data storage that can be accessed from different devices over a network. NAS environments have gained prominence for file-based workloads because they provide a hierarchical structure of directories and folders that makes it easier to organize and find files. Many enterprise applications today are file-based, and use files stored in a NAS as their data repositories.

Access Protocols

Cloud NAS storage is accessed via the Server Message Block (SMB) and Network File System (NFS) protocols. On-premises NAS environments are also accessed via SMB and NFS.

Why is Cloud NAS gaining in importance?

While the cloud was initially used by DevOps teams for new cloud-native applications that were largely object-based, the cloud is now seen as a major destination for core enterprise applications. These enterprise workloads are largely file-based, and so moving them to the cloud without rewriting the application means file-based workloads need to be able to run in the cloud.

To address this need, both cloud vendors and third-party storage providers are now creating cloud-based NAS offerings, such as those listed above under Cloud File Storage.

Cloud NAS Tiers

Cloud NAS storage is often designed for high-performance file workloads, and its high-performance flash tier can be very expensive.

Many cloud NAS offerings, such as AWS EFS and NetApp Cloud Volumes ONTAP, do offer less expensive file tiers – but putting data in these lower tiers requires a data management solution. As an example, the standard tier of AWS EFS is roughly 10 times more expensive than the standard tier of AWS S3. Furthermore, when you use a cloud NAS, you may also have to replicate and back up the data, which can often make it three times more expensive. As this data becomes inactive and cold, it is very important to manage the data lifecycle on cloud NAS to ensure you are only paying for what you use and not for dormant cold data on expensive tiers.
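The economics can be sketched with rough numbers. The per-GB prices below are illustrative assumptions chosen to reflect the ~10x file-vs-object gap and the 3x replication/backup multiplier mentioned above; they are not current list prices:

```python
# Illustrative per-GB-month prices (assumptions, not current list prices)
EFS_STANDARD = 0.30   # cloud NAS (file) tier, ~10x the object tier
S3_STANDARD = 0.03    # object tier
COPIES = 3            # primary + replica + backup, per the 3x note above

def monthly_cost(tb_total: float, cold_fraction: float) -> float:
    """Monthly cost when the cold fraction is tiered to object storage."""
    gb = tb_total * 1024
    hot = gb * (1 - cold_fraction) * EFS_STANDARD * COPIES
    cold = gb * cold_fraction * S3_STANDARD  # single tiered copy
    return hot + cold

before = monthly_cost(100, 0.0)   # 100 TB, everything on the file tier
after = monthly_cost(100, 0.7)    # 70% tiered off to object storage
```

Under these assumptions, tiering 70% of the data cuts the monthly bill by roughly two-thirds, because the cold copy escapes both the file-tier price and the 3x replication multiplier.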

Intelligent Data Archiving and Intelligent Data Tiering for Cloud NAS

An analytics-driven unstructured data management solution can help you get the right data onto your cloud NAS and keep your cloud NAS costs low by managing the data lifecycle with intelligent archiving and intelligent tiering.

As an example, Komprise Intelligent Data Management for multi-cloud does the following:

  • Analyzes your on-premises NAS data so you can pick the data sets you want to migrate to the cloud
  • Migrates on-premises NAS data to your cloud NAS with speed, reliability and efficiency
  • Analyzes data on your cloud NAS to show you how data is getting cold and inactive
  • Enables policy-based automation so you can decide when data should be archived and tiered from expensive Cloud NAS tiers to lower cost file or object classes
  • Monitors ongoing costs to ensure you avoid expensive retrieval fees when cold data becomes hot again
  • Eliminates expensive backup and DR costs of cold data on cloud NAS

Cloud NAS Migration

There are many potential advantages to migrating your NAS data to the cloud. But the right approach to cloud data migration is essential. Some of the common cloud NAS migration challenges are outlined in this post: Eliminating the Roadblocks of Cloud Data Migrations for File and NAS Data. Avoid unstructured data migration challenges and pitfalls with an analytics-first approach to cloud data migration and unstructured data management. With Komprise Elastic Data Migration you will:

  • Know before you migrate – analytics drive the most cost-effective plans
  • Preserve data integrity – maintain metadata, run MD5 checksums
  • Save time and costs – multi-level parallelism provides elastic scaling
  • Be worry-free – built for petabyte-scale that ensures reliability
  • Migrate NFS data 27x faster and SMB data 25x faster – forget slow, free tools that need babysitting

Get the fast, no lock-in path to the cloud with a unified platform for unstructured data migration.

Cloud Object Storage

What is Cloud Object Storage?

Cloud object storage is a type of cloud data storage that is designed to store and manage large amounts of unstructured data in the cloud. Unlike file-based storage systems, cloud object storage services are based on a simple key-value model that allows data to be stored and retrieved via the unique identifier (or key) associated with each piece of data.

Also see Object Storage.

Cloud object storage is ideal for storing documents, images, videos, and other unstructured data types that don’t fit neatly into a structured (relational) database. Cloud object storage systems are designed to be highly scalable and can store large data sets, making them well-suited for big data applications and use cases such as backup and archiving, content distribution, and data analytics.
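The key-value model can be illustrated with a toy in-memory store. Real services such as Amazon S3 or Azure Blob Storage expose the same basic verbs (put, get, list-by-prefix) over HTTP, but the class below is a sketch, not any vendor's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ObjectStore:
    """Toy in-memory model of an object store's key-value interface.

    There is no real hierarchy in an object store: "folders" are just
    a key-naming convention, and listing by prefix simulates them.
    """
    _objects: dict = field(default_factory=dict)

    def put(self, key: str, body: bytes, metadata: Optional[dict] = None) -> None:
        self._objects[key] = (body, metadata or {})

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def list(self, prefix: str = "") -> list:
        return sorted(k for k in self._objects if k.startswith(prefix))
```

The flat keyspace is what makes object storage scale so well: there is no directory tree to traverse or lock, only keys to look up.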

Examples of Cloud Object Storage

Some examples of cloud object storage include Amazon S3, Microsoft Azure Blob Storage, Google Cloud Storage, and IBM Cloud Object Storage Services. These cloud object storage services offer a range of features such as data durability and availability, built-in encryption, and flexible data access controls, as well as APIs and integrations for developers to easily incorporate object storage into their applications.

Komprise TMT: Cloud File and Object Duality

One of the core components of the Komprise Intelligent Data Management Platform is the patented Transparent Move Technology. When Komprise tiers files to a new target, typically object storage like AWS S3 or Azure Blob, moved files remain in native form, which means when a file becomes an object, a user sees it as a file. In addition to no end user disruption, preserving duality of file and object data across silos enables native cloud services on the data and ensures your data is not locked into a proprietary storage vendor format. This approach also ensures that hot data at the original source is handled by that storage vendor for optimal performance.

In an interview, CEO and co-founder Kumar Goswami put it this way:

Without using any agents, you can tier the data to the cloud and still access it from the original source as if it had never moved AND access it as a native object in the cloud to leverage cloud services like AI/ML cloud applications. This file to object duality, without agents, without getting in front of hot, mission-critical data is something no one else can tout.

Komprise partners with cloud object storage vendors to deliver data-storage agnostic unstructured data management as a service.

Cloud Tiering

What is Cloud Tiering?

Cloud tiering is increasingly a critical capability in managing enterprise file workloads across the hybrid cloud. Cloud tiering (also referred to as cloud archiving, or archiving to the cloud) refers to techniques that offload less frequently used data, also known as cold data, from expensive on-premises file storage or Network Attached Storage (NAS) to cheaper tiers of storage in the cloud, typically object storage classes such as Amazon S3. Cloud tiering is a variant of data tiering. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded to mean tiering or archiving data from a storage system to other clouds and storage systems.

Cloud Tiering Transparently Extends Enterprise File Storage to the Cloud

Enterprises today are increasingly trying to move core file workloads to the cloud. Since file data can be voluminous, involving billions of files, migrating file data to the cloud can take months and create disruption.

A simple solution to this is to gradually offload files to the cloud (cloud tiering) without changing the end user experience. Cloud tiering (or archiving to specific cloud tiers) enables this by moving infrequently used cold data to a cheaper cloud storage tier, while the data continues to remain accessible from the original location. This enables users to transparently extend on-premises capacity with the cloud.

Cloud Tiering Can Yield Significant Savings If Done Correctly

Cloud object storage is cost-efficient if used correctly. Most cloud providers charge not only for the storage, but also to retrieve data, and they charge egress fees if the data has to leave the cloud. Cloud retrieval fees are usually in the form of charges for “get” and “put” API calls and cloud egress costs are charged by the amount of data that is read from anywhere outside the cloud. So, to keep enterprise storage costs low, infrequently accessed data such as snapshots, logs, backups and cold data are best suited for tiering to the cloud.

By tiering cold data to the cloud, the on-premises storage array needs to only keep hot data and the most recent logs and snapshots. Across Komprise customers, we have found that typically 60% to 80% of their actual data has not been accessed in over a year. By cloud tiering the cold data as well as older log files and snapshots, the capacity of the storage array, mirrored storage array (if mirroring/replication is being used) and backup storage is reduced dramatically. This is why tiering cold data can reduce the overall storage cost by as much as 70% to 80%.
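Identifying cold data is essentially a metadata scan. A minimal sketch, assuming access times are reliable (they often aren't on volumes mounted with noatime, and real analytics also weigh modify times, owners and shares), might look like this:

```python
import time
from pathlib import Path

YEAR = 365 * 24 * 3600  # seconds

def cold_files(root: Path, max_age: float = YEAR) -> list:
    """Flag files whose last access time is older than max_age seconds.

    Simplified stand-in for an analytics-driven scan; treat the atime
    check as illustrative only.
    """
    now = time.time()
    return [p for p in root.rglob("*")
            if p.is_file() and now - p.stat().st_atime > max_age]
```

Everything this scan flags is a candidate for tiering to a cheaper storage class, while hot files stay on the performance tier.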

The many advantages of cloud tiering of cold data include:

  • Reduced storage acquisition costs. Flash storage, used for fast access to hot data, is expensive. By tiering off infrequently used data you can purchase a much smaller amount of flash storage, thereby reducing acquisition costs.
  • Cut backup footprint and costs. By continuously tiering off cold data that is not being accessed you can reduce your backup footprint, backup license costs, and backup storage costs if the cold data is placed in robust storage (such as that provided by the major CSPs).
  • Increase disaster recovery speeds and lower disaster recovery (DR) costs. As with backup, by tiering off the cold data, the amount of data mirrored/replicated is dramatically reduced as well.
  • Improved storage performance. By running storage at a lower capacity and by moving cold data to another storage device or service, you can increase the performance of your storage array.
  • Leverage the cloud to run AI, ML, compliance checks and other applications on cold data. With cold data in the cloud, you can access, search and process your cold data without putting any load on your storage array. The cold data that is tiered off has value. Being able to process and feed your cold data into your AI/ML/BI engines is critical to staying competitive. By tiering you can extract value from your cold data without burdening your storage array. This also helps to extend the life of your storage array.

Clearly, if cloud tiering is implemented correctly at the file level, it will provide all of the above benefits, whereas block tiering to the cloud will not. But not all cloud tiering choices are the same.

To learn more about the differences between cloud tiering at the file level vs the block level, and why so-called cloud pools such as NetApp FabricPool or Dell EMC Isilon CloudPools are not the right approach for cloud tiering, read “What you need to know before jumping into the cloud tiering pool”.

Also download the white paper: Cloud Tiering: Storage-Based vs Gateways vs. File-Based.

Data Archiving

What is Data Archiving?

Data Archiving, often referred to as Data Tiering, protects older data that is not needed for everyday operations of an organization. A data archiving strategy reduces primary storage and allows an organization to maintain data that may be required for regulatory or other needs.

Benefits of a Data Archiving Solution

Data archiving protects older information that is not needed for everyday operations but which users may occasionally access. Data archiving tools deliver the most value by reducing primary storage costs, rather than acting as a data recovery tool. Unstructured data archive tools are in high demand because they can drastically reduce overall storage costs; most data is unstructured and resides on expensive, high-performance storage devices. Archive data storage, meanwhile, is typically a low-performance, low-cost, high-capacity data storage medium.

Types of Data Archiving

Some data archiving products only allow read-only access to protect data from modification, while other data tiering and archiving products allow users to make changes.

Data archiving takes a few different forms:

  • Options include online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.
  • Another archival system uses offline data storage, where data archiving software writes the data to tape or other removable media. Tape consumes less power than disk systems, translating to lower costs.
  • A third option is cloud data storage, offered by Amazon, Azure and other cloud providers. Cloud object storage is a smart choice for cloud tiering and data archiving because of its low-cost, immutable nature. This option is inexpensive but requires ongoing investment.

New requirements for secure data archiving have resulted from more sophisticated cybersecurity and ransomware threats. Encryption of sensitive archives and multi-factor authentication for access and object lock storage (such as AWS S3) are a few ways to protect archival data from modification, corruption and theft.

The data archiving process typically uses automated software, which will automatically move cold data via policies set by an administrator. A popular approach is to make the archive “transparent” so that users and applications can access archived data from the same location as if it had never moved. (See Native Access.)

Learn more about Komprise Transparent Move Technology (TMT).

Data Classification

Data classification is the process of organizing data into tiers or categories so it can be managed, secured and retrieved more efficiently.

Data classification is essential to make data easy to find and retrieve so that your organization can optimize risk management, compliance, and legal requirements. Written guidelines are essential in order to define the categories and criteria to classify your organization’s data. It is also important to define the roles and responsibilities of employees in the data organization structure.

When data classification procedures are established, security standards should also be established to address data lifecycle requirements. Classification should be simple so employees can easily comply with the standard.

Examples of types of data classifications:

  • 1st Classification: Data that is free to share with the public
  • 2nd Classification: Internal data not intended for the public
  • 3rd Classification: Sensitive internal data that would negatively impact the organization if disclosed
  • 4th Classification: Highly sensitive data that could put an organization at risk

Data classification is a complex process, but automated systems can help streamline this process. The enterprise must create the criteria for classification, outline the roles and responsibilities of employees to maintain the protocols, and implement proper security standards. Properly executed, data classification will provide a framework for the data storage, transmission and retrieval of data.

Automation simplifies data classification by enabling you to dynamically set different filters and classification criteria when viewing data across your storage. For instance, if you wanted to classify all data belonging to users who are no longer at the company as “zombie data,” the Komprise Intelligent Data Management solution will aggregate files that fit into the zombie data criterion to help you quickly classify your data.
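A zombie-data filter of this kind reduces to a simple predicate over file metadata records. The record shape below is a hypothetical stand-in for the index an unstructured data management tool would build, not a Komprise API:

```python
def zombie_files(records, active_users):
    """Return metadata records whose owner is no longer an active user.

    `records` are dicts like {"path": ..., "owner": ...} -- a
    hypothetical index format. The same pattern works for any
    metadata-driven classification filter (age, extension, share).
    """
    return [r for r in records if r["owner"] not in active_users]
```

Once classified, the matching set can be fed into a policy action such as tiering or archiving.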

Data Classification and Komprise Deep Analytics

Komprise Deep Analytics gives data storage administrators and line of business users granular, flexible search capabilities and indexes data creating a Global File Index across file, object and cloud data storage spanning petabytes of unstructured data. Komprise Deep Analytics Actions uses these virtual datasets (see virtual data lake) for systematic, policy-driven data management actions that can feed your data pipelines.

Data Governance

What is data governance?

Data governance refers to the management of the availability, security, usability, and integrity of data used in an enterprise. Data governance in an organization typically includes a governing council, a defined set of procedures, and a plan to execute those procedures.

Data governance is not about allowing access to a few privileged users; instead, it should allow broad groups of users access with appropriate controls. Business and IT users have different needs; business users need secure access to shared data and IT needs to set policies around security and business practices. When done right, data governance allows any user access to data anytime, so the organization can run more efficiently, and users can manage their workload in a self-service manner.

3 things to consider when developing a data governance strategy:

Selecting a Data Governance Team
  • Balance IT and business leaders to get a broad view of the data and service needs
  • Start small – choose a small group to review existing data analytics
Data Quality Strategy
  • Audit existing data to discover data types and how they are used
  • Define a process for new data sources to ensure quality and availability standards are met
Data Security
  • Make sure data is classified so data requiring protection for legal or regulatory reasons meets those requirements
  • Implement policies that allow for different levels of access based on user privileges

Komprise is not a data governance solution but we are part of an overall governance strategy as it relates to unstructured data management. With the Deep Analytics user profile, you can provide secure data access to specific users to search and tag file and object data so that it can then be incorporated into smart data migration and data mobility use cases, including Smart Data Workflows.

Data Lake

A data lake is data stored in its natural state. The term typically refers to unstructured data that is sitting on different storage environments and clouds. The data lake supports data of all types – for example, you may have videos, blogs, log files, seismic files and genomics data in a single data lake. You can think of each of your Network Attached Storage (NAS) devices as a data lake.

One big challenge with data lakes is to comb through them and find the relevant data you need. With unstructured data, you may have billions of files strewn across different data lakes, and finding data that fits specific criteria can be like finding a needle in a haystack.

A virtual data lake is a collection of data that fits certain criteria – and as the name implies, it is virtual because the data is not moved. The data continues to reside in its original location, but the virtual data lake gives a discrete handle to manipulate that entire data set. The Komprise Global File Index can be considered to be a virtual data lake for file and object metadata.

Some key aspects of data lakes – both physical and virtual:

  • Data Lakes Support a Variety of Data Formats: Data lakes are not restricted to data of any particular type.
  • Data Lakes Retain All Data: Even if you do a search and find some data that does not fit your criteria, the data is not deleted from the data lake. A virtual data lake provides a discrete handle to the subset of data across different storage silos that fits specific criteria, but nothing is moved or deleted.
  • Virtual Data Lakes Do Not Physically Move Data: Virtual data lakes do not physically move the data, but provide a virtual aggregation of all data that fits certain criteria. Deep Analytics can be used to specify criteria.
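A virtual data lake is, in effect, a query over a metadata index. The sketch below uses an in-memory SQLite table as a toy global file index; the schema is illustrative, not Komprise's actual design:

```python
import sqlite3

# Toy "global file index": one metadata row per file. The files stay
# in place on their original storage; a query defines a virtual data
# set without moving a byte.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE file_index
               (path TEXT, share TEXT, ext TEXT, size_mb REAL, atime TEXT)""")
con.executemany("INSERT INTO file_index VALUES (?, ?, ?, ?, ?)", [
    ("/nas1/video/a.mp4", "nas1", "mp4",   900.0, "2020-03-01"),
    ("/nas2/logs/x.log",  "nas2", "log",     2.5, "2024-11-20"),
    ("/nas1/seq/g.fastq", "nas1", "fastq", 4200.0, "2019-07-14"),
])

# Virtual data set: cold genomics files across every silo
cold_genomics = con.execute(
    "SELECT path FROM file_index WHERE ext = 'fastq' AND atime < '2023-01-01'"
).fetchall()
```

The result set is a discrete handle on data spread across silos, which can then drive a policy action, while the underlying files never move.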

Data Lakehouse

The term builds on “data lake,” coined by Pentaho co-founder and then-CTO James Dixon. And while both Amazon and Snowflake had already started using the term “lakehouse,” it wasn’t until Databricks endorsed it in a January 30, 2020 blog post entitled “What is a Data Lakehouse?” that it received more mainstream attention (amongst data practitioners, at least).

You’ve heard of a Data Lake. You’ve heard of a Data Warehouse. Enter the Data Lakehouse.

A data lakehouse is a modern data architecture that combines the benefits of data lakes and data warehouses. A data lake is a centralized repository that stores vast amounts of raw, unstructured, and semi-structured data, making it ideal for big data analytics and machine learning. A data warehouse, on the other hand, is designed to store structured data that has been organized for querying and analysis.

A data lakehouse builds on key elements of these two approaches by providing a centralized platform for storing and processing large volumes of structured and unstructured data, while supporting real-time data analytics. It allows organizations to store all of their data in one place and perform interactive and ad-hoc analysis at scale, making it easier to derive insights from complex data sets. A data lakehouse typically uses modern (and often open source) technologies such as Apache Spark and Apache Arrow to provide high-performance, scalable data processing.

Who are the data lakehouse vendors?

There are several vendors that offer data lakehouse solutions, including:

  • Amazon Web Services (AWS) with Amazon Lake Formation
  • Microsoft with Azure Synapse Analytics
  • Google with Google BigQuery Omni
  • Snowflake
  • Databricks
  • Cloudera with Cloudera Data Platform
  • Oracle with Oracle Autonomous Data Warehouse Cloud
  • IBM with IBM Cloud Pak for Data

These vendors provide a range of services, from cloud-based data lakehouse solutions to on-premises solutions that can be deployed in an organization’s own data center. The choice of vendor will depend on the specific needs and requirements of the organization, such as: the size of the data sets, the required performance and scalability, the level of security and compliance needed and the overall budget.

Komprise Smart Data Workflows is an automated process for all the steps required to find the right unstructured data across your data storage assets, tag and enrich the data, and send it to external tools such as a data lakehouse for analysis. Komprise makes it easier to find and prepare the right file and object data for analytics, AI and ML projects.


Data Management Policy

What is a Data Management Policy?

A data management policy defines how an organization manages and governs its data assets, and is a cornerstone of governing enterprise data. The policy should be managed by a team within the organization that determines how the policy is accessed and used, who enforces it, and how it is communicated to employees.

An effective data management policy team should include top executives so that governance and accountability can be enforced. In many organizations, the Chief Information Officer (CIO) and other senior management can demonstrate their understanding of the importance of data management by either authoring or supporting directives that will be used to govern and enforce data standards.

Considerations for a data management policy

  • Enterprise data is not owned by any individual or business unit, but is owned by the enterprise
  • Enterprise data must be kept secure
  • Enterprise data must be accessible to individuals within the organization
  • Metadata should be developed and utilized for all structured and unstructured data
  • Data owners should be accountable for enterprise data
  • Data should be accessible to users no matter where it resides; users should not have to worry about where data lives

Ultimately, a data management policy should guide your organization’s philosophy toward managing data as a valued enterprise asset. Watch the video: Intelligent Data Management: Policy-Based Automation

Developing an unstructured data management policy

It is important to develop enterprise-wide data management policies using a flexible governance framework that can adapt to unique business scenarios and requirements. Identify the right technologies following a proof of concept approach that supports specific risk management and compliance use cases. Tool proliferation is always a problem so look to consolidate and set standards that address end-to-end scenarios. Unstructured data management policies must address data storage, data migration, data tiering, data replication, data archiving and data lifecycle management of unstructured data (block, file, and object data stores) in addition to the semi-structured and structured data lakes, data warehouses and other so-called big-data repositories.
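One practical way to make such a policy enforceable is to express its lifecycle rules as policy-as-code that tooling can evaluate against file metadata. The sketch below is purely illustrative: the rule names, thresholds, and metadata fields are hypothetical, not any product's schema.

```python
# Hypothetical policy-as-code: lifecycle rules keyed by action name.
POLICY = {
    "tier_to_object": {"last_access_over_days": 365, "target": "s3://archive"},
    "delete":         {"last_access_over_days": 2555, "requires_approval": True},
    "replicate":      {"tags": {"classification": "critical"}, "copies": 2},
}

def applicable_actions(file_meta, policy=POLICY):
    """Return the policy actions that apply to one file's metadata."""
    actions = []
    age = file_meta.get("days_since_access", 0)
    for action, rule in policy.items():
        if age >= rule.get("last_access_over_days", float("inf")):
            actions.append(action)                 # age-based rule matched
        elif "tags" in rule and all(
                file_meta.get("tags", {}).get(k) == v
                for k, v in rule["tags"].items()):
            actions.append(action)                 # tag-based rule matched
    return actions
```

Encoding the rules this way makes the policy auditable and testable: a governance team can review one data structure, and automation applies it uniformly across storage silos.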


Read the VentureBeat article: How to create data management policies for unstructured data.
What is a Data Management Policy?

A data management policy governs the management of an organization’s data assets. It should contain all the guidelines and information necessary for governing enterprise data assets and should address the management of structured, semi-structured and unstructured data.

What does a Data Management Policy contain?

A comprehensive Data Management Policy should contain the following:

  • An inventory of the organization’s data assets
  • A strategy of effective management of the organization’s data assets
  • An appropriate level of security and protection for the data, including details of which roles can access which data elements
  • Categorization of the different sensitivity and confidentiality levels of the data
  • The objectives for measuring expectations and success
  • Details of the laws and regulations that must be adhered to regarding the data program
Data management policy and procedures

First, the business must select who should be part of the policy-making process. This should include legal, compliance and risk executives, security and IT leaders, business unit heads, and the chief data officer or relevant alternative. Once the committee is selected, it should identify the risks associated with the organization’s data and create a data management policy.


Data Migration

Data migration means many different things and there are many types of data migrations in the enterprise world. At its core, data migration is the process of selecting and moving data from one location to another. For this Glossary, we’re focused on unstructured data migration, specifically of file and object data. IT organizations use data migration tools to move data across different data storage systems and across different formats and protocols (SMB, NFS, S3, etc.).

Data migrations often occur in the context of retiring a system and moving to a new system, or in the context of a cloud migration, or in the context of a modernization or upgrade strategy.

When it comes to unstructured data migrations and migrating enterprise file data workloads to the cloud, data migrations can be laborious, error prone, manual, and time consuming. Migrating data may involve finding and moving billions of files (large and small), which can succumb to storage and network slowdowns or outages. Also, different file systems do not often preserve metadata in exactly the same way, so migrating data to a cloud environment without loss of fidelity and integrity can be a challenge.

Two Data Migration Approaches

Lift-and-Shift

Many organizations start here, thinking they’ll just migrate entire file shares and directories to the cloud. If this is your data migration plan, it’s important to use analytics to plan the migration, reduce errors, and ensure alignment and multi-storage visibility while minimizing cutover time. With Komprise Elastic Data Migration, you can readily migrate from one primary storage vendor to another without rehydrating archived data, so migrations are cheaper and faster.

Cloud Data Tiering as a First Step: Smart Data Migration

Since a large percentage of file data is cold and has not been used in a year or more, tiering and archiving cold data is a smart first step – especially if you use Transparent Move Technology so users can access the files exactly as before. You can follow this up by migrating the remaining hot data to a performance cloud tier.

Data Migration Questions

Here are some questions that will help you determine the best file and object data migration strategy:

  • What data storage do we have and where?​ (primary storage, secondary storage)
  • What data sets are accessed most frequently (hot) and less frequently (cold)?​
  • What types of data and files do we have and which are taking up the most storage (image files, video, audio files, sensor data, etc.)?​
  • What is the cost of storing these different file types today? How does this align with the budget and projected growth?​
  • Which types of files should be stored at a higher security level? (PII or IP data? Mission-critical projects?)​
  • Are we complying with regulations and internal policies with our unstructured data management practices?
  • What constraints do my network and environment pose and how do I avoid surprises during migrations?
  • Do we have the best possible strategy in place for WAN acceleration, such as Komprise Hypertransfer for Elastic Data Migration?



Data Migration Software

There are many data migration software tools available to facilitate the process of moving data from one system or platform to another. The choice of a specific data migration software tool depends on factors such as the type of data, the scale of data migration, source and target systems, and other specific requirements.

For semi-structured and structured data sources, extraction, transformation and load (ETL or ELT) tools are often used for data migrations. For unstructured data migrations, where the data lacks a predefined data model or structure, the challenges are different from migrating structured data. Unstructured data can include text documents, images, videos, audio files, and other content that doesn’t fit neatly into a relational database.

Cloud Migration Software Options

Cloud migrations of file data can be complex, labor-intensive, costly, and time-consuming. Enterprises typically consider the following options:

  • Free Tools: These tools require a lot of custom development, are less reliable and resilient, and generally aren’t built to migrate massive volumes of data. It’s important to look at broader unstructured data management requirements, not just one-off data migration requirements.
  • Point Data Migration Solutions: These data migration tools typically have complex legacy architectures that were not built for the modern scale of data, which can create ongoing data migration and data management challenges.
  • Komprise Elastic Data Migration: Designed to make cloud data migrations simple, fast and reliable, and to eliminate sunk costs since you continue to use Komprise after the migration. Komprise gives you the option to cut cloud storage costs by 70%+ by placing cold data in object storage classes while maintaining file metadata, so the data can be promoted back to files in the cloud when needed. Learn more about Smart Data Migration.


Data Migration Warm Cutover

A warm cutover, in the context of data migration, is a strategy that transitions from the old data system to the new one with limited downtime or service interruption. It is a phased approach that allows both the old and new data systems to coexist for a specific period. Warm cutover strategies are often employed when it’s essential to maintain data availability and minimize disruption to ongoing operations.


Key steps and considerations in a warm cutover for data migration

Preparation Phase

  • Planning: Define the scope, objectives, and timeline for the data migration. Identify the specific data sets, systems, or databases that need to be migrated.
  • Data Assessment: Assess the quality, completeness, and structure of the data in the source system. Clean and prepare the data as needed.
  • Infrastructure Readiness: Ensure that the infrastructure for the new data system is set up and configured, including the hardware, software, and network components.

Parallel Operation:

  • Data Replication: Set up mechanisms for data replication or synchronization between the old and new data systems. This ensures that data changes made in one system are mirrored in the other in near real-time.
  • Testing: Perform thorough testing of the new data system while it operates in parallel with the old system. Verify data integrity, performance, and functionality.
  • User Training: Train end-users, administrators, and support teams on how to use the new data system effectively.

Data Transition:

  • Gradual Migration: Begin migrating data from the old system to the new one in stages. This can be done by migrating specific data sets, databases, or tables incrementally.
  • Validation: Validate the migrated data to ensure that it matches the source data in terms of accuracy and completeness. Data reconciliation and verification are crucial at this stage.

Monitoring and Verification:

  • Monitoring: Continuously monitor the health and performance of both the old and new data systems during the transition period.
  • User Acceptance Testing (UAT): Involve end-users in user acceptance testing to ensure that the new data system meets their requirements and expectations.

Final Transition:

  • Data Synchronization: Once the new data system is confirmed to be stable and accurate, perform a final data synchronization to ensure that both systems have the same data.
  • Switch Over: Redirect users and applications to the new data system while minimizing downtime. Ensure that all data transactions are processed in the new system.

Post-Cutover Activities:

  • Validation: Conduct post-cutover validation to confirm that data remains consistent and accessible in the new system.
  • Monitoring and Support: Continue monitoring the new data system and provide support as needed to address any post-migration issues.
  • Documentation: Update documentation and procedures to reflect the new data system and its operational requirements.
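The phases above can be sketched as a toy model, with the two systems reduced to dictionaries: bulk-migrate, mirror ongoing writes while both systems run in parallel, reconcile, then switch. This is a hedged illustration of the sequence only, not a real replication mechanism:

```python
def warm_cutover(old_sys, new_sys, changes):
    """Toy model of a warm cutover: gradual migration, parallel
    operation with replication, validation, then switch over."""
    new_sys.update(old_sys)            # gradual migration (bulk copy)
    for key, value in changes:         # parallel operation phase
        old_sys[key] = value           # writes still land on the old system...
        new_sys[key] = value           # ...and are mirrored to the new one
    assert new_sys == old_sys          # validation / data reconciliation
    return new_sys                     # switch over: new system is now active
```

In practice each comment line corresponds to substantial machinery (snapshot copies, change-data capture, reconciliation reports), but the ordering, copy, mirror, verify, switch, is the essence of keeping downtime warm rather than cold.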

With the release of Komprise Intelligent Data Management 5.0, Komprise Elastic Data Migration supports warm cutover. Warm cutover strategies are particularly suitable for data migration scenarios where organizations cannot afford extended downtime or where data continuity is critical, such as in healthcare, financial services, and online commerce. Careful planning, rigorous testing, and meticulous data validation are essential to ensure a smooth transition from the old data system to the new one while maintaining data integrity and availability.



Data Retrieval

Data retrieval refers to the process of accessing and retrieving data from a database or data storage system. Data retrieval is possible using various techniques and tools, such as database querying, data mining, and data warehousing. The specific techniques and tools used will depend on the type of data being retrieved, along with the requirements and goals of the organization.

Some benefits of effective data retrieval include:

  • Improved data access: By providing quick and easy access to data, organizations can improve their overall data management processes and make better use of their existing data.
  • Better decision making: By providing access to up-to-date and accurate information, data retrieval can help organizations to make better decisions and improve their overall performance.
  • Better customer insights: By retrieving and analyzing customer data, organizations can gain valuable insights into customer behavior and preferences, so they can improve customer relationships and drive business growth.

Cloud Data Retrieval

There are several challenges associated with retrieving data from the cloud, including:

  • Network Latency: Retrieving data from a remote server can result in significant latency, especially if the data is large or the network is congested.
  • Bandwidth Limitations: Bandwidth limitations can limit the speed at which data can be retrieved from the cloud.
  • Data Security: Ensuring the security and privacy of data stored in the cloud can be challenging, especially for sensitive data.
  • Data Compliance: Organizations must ensure that their data retrieval practices comply with relevant regulations and standards, such as data privacy laws and industry standards.
  • Data Availability: In some cases, cloud data may not be available due to network outages, server downtime, or other technical issues.
  • Cloud Costs: Retrieving large amounts of data from the cloud can be expensive, especially if the data is stored in a high-performance tier.
  • Complexity: Interacting with cloud data storage systems can be complex and requires a certain level of technical expertise.

Cloud Data Retrieval and Egress Costs

Egress fees refer to the costs associated with transferring data from a cloud storage service to an external location or to another cloud provider. Many cloud service providers charge fees for data egress, as transferring large amounts of data can put a strain on their network and infrastructure. The cost of egress is usually based on the amount of data transferred, the distance of the transfer, and the speed of the transfer.

It is important for organizations to understand their cloud service provider’s data egress policies and fees, as well as their data transfer needs, to avoid unexpected costs. Organizations can minimize egress costs by compressing data, reducing the amount of data transferred, or storing data in the same geographic region as their computing resources.
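Because egress billing is mostly a per-GB calculation, a back-of-the-envelope estimate is easy to automate. The rates below are illustrative assumptions, not any provider's actual price list, and real bills add tiers, regions, and request fees:

```python
def egress_cost(gb_transferred, price_per_gb=0.09, free_tier_gb=100):
    """Estimate monthly egress charges for moving data out of the cloud.
    price_per_gb and free_tier_gb are illustrative assumptions."""
    billable_gb = max(gb_transferred - free_tier_gb, 0)
    return round(billable_gb * price_per_gb, 2)

# e.g. retrieving 5 TB in a month at $0.09/GB with 100 GB free:
# egress_cost(5 * 1024)  ->  451.8
```

Running the numbers like this before a retrieval-heavy project starts is often what tips the decision toward keeping compute in the same region as the data.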

The Benefits of Smart File Data Migration

A smart data migration strategy for enterprise file data means an analytics-first approach ensuring you know which data can migrate, to which class and tier, and which data should stay on-premises in your hybrid cloud storage infrastructure. With Komprise, you always have native data access, which not only removes end-user disruption, but also reduces egress costs and the need for rehydration and accelerates innovation in the cloud.



Data Sprawl

What is Data Sprawl?

Data sprawl describes the staggering amount of unstructured data produced by enterprises worldwide every day. With new devices, including enterprise and mobile applications, added to networks, data sprawl is estimated to grow 40% year over year into the next decade.

Given this growth in data sprawl, data security is imperative, as sprawl can lead to enormous problems for organizations, as well as their employees and customers. In today’s fast-paced world, organizations must carefully consider how best to manage the precious information they hold.

Organizations experiencing unstructured data sprawl need to secure all of their endpoints. Security is critical. Addressing data security as well as remote physical devices ensures organizations are in compliance with internal and external regulations.

As security threats mount, it is critical that data sprawl is addressed. Taking the right steps to control data sprawl, via policies and procedures within an organization, means safeguarding not only internal data but also critical customer data.

Organizations should develop solid practices that may have been dismissed in the past. Left unchecked, unstructured data sprawl will continue to manifest itself in hidden costs and limited options. With a little evaluation and planning, it is an aspect of your environment that can be improved significantly and will pay off long term.

Analyzing and Managing Unstructured Data: Getting Sprawl (and Costs) Under Control

According to this Geekwire article, Gartner estimates that unstructured data represents an astounding 80 to 90% of all new enterprise data, and it’s growing 3X faster than structured data. Komprise Intelligent Data Management rapidly analyzes file and object unstructured data in-place across multi-vendor storage to provide aggregate analytics (e.g., how much data, how much is hot, how much is cold, what types, top users, etc.) as well as a Global File Index across cloud and on-prem environments. The Komprise Global File Index is highly efficient and scalable, handling billions of files and exabytes of data without the scalability issues of a central database or other centralized architectures. Customers can build queries using Komprise Deep Analytics to find the precise subset of data they need through any combination of metadata and tags, and then move, copy and tier that data using Deep Analytics Actions. Komprise combines in-place analytics with data movement and ongoing data management to provide a closed-loop system that is intelligent and adapts to a customer’s unique needs. The functionality is also available via API.

Tackling Data Sprawl with Komprise Analysis


Komprise Analysis provides consistent, unified insights into unstructured data across many vendors’ storage and cloud platforms. Key metrics include data volume, data growth rates, where data is stored, top owners, top file types/sizes, and time of last access. Komprise can also create cost models based on different storage targets and tiering plans.


Data Storage

What is Data Storage?

Data storage refers both to the methods of transferring digital information from the source (users, applications, sensors) via protocols or APIs, and to the destination: physical storage media such as magnetic or solid-state disks, tape, or optical media, or cloud-based file and object storage. Data storage is pervasive, implemented in enterprise data centers, cloud providers, and consumer technology such as laptops and phones.

From genomics and medical imaging to streaming video, electric cars, IoT at the edge and user-generated data, unstructured data growth is exploding. Enterprise IT organizations are looking to new cloud and hybrid cloud strategies to manage costs, and investing in unstructured data management, cloud data migration and cloud data management technologies and strategies to reduce data storage costs while maximizing data value.


What are the different types of data storage protocols?

File Data Storage: File storage records data to files that are organized in folders, and the folders are organized under a hierarchy of directories and subdirectories. For example, a text file stored to your home directory on your laptop. File data is typically used for collaboration and shared access.

Examples of File Storage

Network Attached Storage (NAS), Network File System (NFS), and Server Message Block (SMB)

File Storage Vendor Solutions

NetApp ONTAP, Dell/EMC PowerScale (Isilon), Qumulo, Microsoft Windows Server, Pure FlashBlade, Amazon FSx, Azure Files

Block Data Storage

Typically used in servers and workstations where data is being written directly to physical media (HDD or SSD) in chunks or blocks. In contrast to file, block data is typically dedicated for access by a single application. Block storage is often used for the most performance intensive applications.

  • Examples of Block Data Storage: Direct-Attached Storage (DAS), Storage Area Network (SAN), iSCSI, NVMe
  • Block Storage Vendor Solutions: Pure FlashArray, Dell/EMC VMAX, NetApp ONTAP and E-Series, HDS

Object Storage

Also known as object-based storage or cloud storage, is a way of addressing and manipulating data storage as objects. In contrast to file storage, object data is stored in a flat namespace. Object storage was designed for use in massive repositories and is accessed over the HTTP protocol as a REST API.

  • Examples of Object Storage: AWS S3, Azure Blob, Google Cloud Storage, Cloud Data Management Interface (CDMI)
  • Object Storage Vendors: AWS, Azure, Google, Wasabi, Cloudian, NetApp, Dell/EMC, Scality
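Because object storage addresses data as a flat key rather than a directory path, an object is reachable as a simple HTTP resource. The hypothetical helper below only builds the request URL to show the flat bucket-plus-key addressing; real access also requires authentication (for S3, Signature Version 4 headers):

```python
from urllib.parse import quote

def object_url(endpoint, bucket, key):
    """Hypothetical helper: address an object by bucket + key over HTTP.
    Note the key is one flat string; any '/' in it is naming convention,
    not a real directory hierarchy."""
    return "https://%s/%s/%s" % (endpoint, bucket, quote(key))
```

For example, `object_url("s3.amazonaws.com", "my-bucket", "logs/app.log")` yields `https://s3.amazonaws.com/my-bucket/logs/app.log`; the REST API then uses standard HTTP verbs (GET to read, PUT to write, DELETE to remove) against that URL.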

NDMP (Network Data Management Protocol)

Storage protocol that allows file servers and backup applications to communicate directly to a network-attached tape device for backup or recovery operations.

What are types of physical storage media?

  • Hard Disk Drive (HDD): Disk based storage, used for high density data storage. Data is written to a magnetic layer of spinning disk.
  • Solid State Drive (SSD): Also known as flash. Solid-state silicon memory replaces the spinning disk component of the HDD to achieve higher performance and a smaller form factor.
  • Tape: Data is written to a ribbon of magnetic material in a cartridge. Used strictly for backup and archive, tape’s slow performance is offset by low cost, high density, and the ability to be stored offline.
  • Optical Storage: In contrast to magnetic storage, data is recorded optically to media such as CD and DVD disks. Optical storage is used for durable, long-term, offline, archival storage.

What is Primary Storage?

Primary storage is used for active read and write data sets where high performance is critical. SSD or flash media with the highest level of performance is the ideal media for primary storage. While less typical, HDD is also used as primary storage where lower cost and storage density are the key factors.

What is Secondary Storage?

Also referred to as active archive, secondary storage is used for less frequently accessed data sets. While any protocol and media can be used for secondary storage, HDD with NAS and object storage are the most common choices. Use cases for secondary storage include data tiering and backup/data protection applications.

Read the white paper: Block-Level vs. File Level Tiering

What is Data Storage?

Data storage refers both to the methods of transferring digital information from the source (users, applications, sensors) via protocols or APIs and to the destination; physical storage media such as magnetic or solid-state disks, tape, or optical.

What is Block Level Data Storage?

Mainly used in servers and workstations where data is written directly to physical media (HDD or SSD) in chunks or blocks. As opposed to file-level data storage, block-level data storage is mostly dedicated to access by a single application. Block storage uses either direct-attached storage (DAS), or the data transfer protocols Fibre Channel (FC) or iSCSI (Internet Small Computer Systems Interface) via a storage area network (SAN).

What is Data Lake Storage in Azure?

Data Lake Storage in Azure from Microsoft is a fully managed scalable system based on a secure cloud platform that provides industry-standard, cost-effective storage for big data analytics.


Data Storage Costs

Data storage costs are the expenses associated with storing and maintaining data in various forms of storage media, such as hard drives, solid-state drives (SSDs), cloud storage, and tape storage. These costs can be influenced by a variety of factors, including the size of the data, the type of storage media used, the frequency of data access, and the level of redundancy required. As the amount of unstructured data generated continues to grow, the cost of storing it remains a significant consideration for many organizations. In fact, according to the Komprise 2023 State of Unstructured Data Management Report, the majority of enterprise IT organizations are spending over 30% of their budget on data storage, backups and disaster recovery. This is why shifting from storage management to storage-agnostic data management continues to be a topic of conversation for enterprise IT leaders.


Cloud Data Storage Costs

Cloud data storage costs refer to the expenses incurred for storing data on cloud storage platforms provided by companies like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). In addition to the factors above (amount of data stored and frequency of data access), the required levels of durability and availability also drive cloud storage costs. Cloud data storage providers typically charge based on the amount of data stored per unit of time, and additional fees may be incurred for data retrieval, data transfer, and data processing. Many cloud storage providers offer different storage tiers with varying levels of performance and cost, allowing customers to choose the option that best fits their budget and performance needs. With the right cloud data management strategy, cloud storage can be more cost-effective than traditional hardware-centric on-premises storage, especially for organizations with large amounts of data and high storage needs.

Managing Data Storage Costs

Managing data storage costs involves making informed decisions (and the right investment strategies) about how to store, access, and use data in a cost-effective manner. Here are some strategies for managing data storage costs:

  • Data archiving: Archiving infrequently accessed data to lower cost storage options, such as object storage or tape, can help reduce storage costs.
  • Data tiering: Using different storage tiers for different types of data based on their access frequency and importance can help optimize costs.
  • Compression and deduplication: A well known data storage technique, compressing data and deduplicating redundant data can help reduce the amount of storage needed and lower costs.
  • Cloud file storage: Using cloud storage can be more cost-effective than traditional on-premises storage, especially for organizations with large amounts of data and high storage needs.
  • Data lifecycle management (aka Information Lifecycle Management): Regularly reviewing and purging unneeded data can help control storage costs over time.
  • Cost monitoring and optimization (see cloud cost optimization): Regularly monitoring and analyzing data storage costs and usage patterns can help identify opportunities for cost optimization.

By using a combination of these strategies, organizations can effectively manage their data storage costs and ensure that they are using their data storage resources efficiently. Additionally, organizations can negotiate with data storage providers to secure better pricing and take advantage of cost-saving opportunities like bulk purchasing or long-term contracts.
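The tiering strategy above lends itself to a quick back-of-the-envelope model: how much does monthly storage cost if a given fraction of the data is cold and moved to a cheaper tier? The per-TB prices here are illustrative assumptions, not quotes from any vendor:

```python
def tiered_cost(total_tb, pct_cold, hot_per_tb=300.0, cold_per_tb=40.0):
    """Estimate monthly storage cost with tiering. hot_per_tb and
    cold_per_tb are illustrative prices: hot data stays on primary
    storage, cold data moves to a cheaper object tier."""
    cold_tb = total_tb * pct_cold
    return (total_tb - cold_tb) * hot_per_tb + cold_tb * cold_per_tb

# 100 TB all on primary storage vs. 70% tiered to object storage:
# tiered_cost(100, 0.0)  ->  30000.0 per month
# tiered_cost(100, 0.7)  ->  11800.0 per month
```

At these assumed rates, tiering 70% of a 100 TB estate cuts the monthly bill by roughly 60%, which is in the same ballpark as the savings figures cited elsewhere in this glossary, and before counting the knock-on reductions in backup and DR footprint.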

Stop Overspending on Data Storage with Komprise

The blog post How Storage Teams Use Komprise Deep Analytics summarizes a number of strategies storage teams use with Komprise Intelligent Data Management to deliver greater data storage cost savings and unstructured data value to the business, including:

  • Business unit metrics with interactive dashboards
  • Business-unit data tiering, retention and deletion
  • Identifying and deleting duplicates
  • Mobilizing specific data sets for third-party tools
  • Using data tags from on-premises sources in the cloud

In the blog post Quantifying the Business Value of Komprise Intelligent Data Management, we review a storage cost savings analysis that saves customers an average of 57% of overall data storage costs and over $2.6M annually. In addition to cost savings, benefits include:

Plan Future Data Storage Purchases with Visibility and Insight

With an analytics-first approach, Komprise delivers visibility into how data is growing and being used across a customer’s data storage silos – on-premises and in the cloud. Data storage administrators no longer have to make critical storage capacity planning decisions in the dark and now can understand how much more storage will be needed, when and how to streamline purchases during planning.

Optimize Data Storage, Backup, and DR Footprint

Komprise reduces the amount of data stored on Tier 1 NAS, as well as the amount of actively managed data—so customers can shrink backups, reduce backup licensing costs, and reduce DR costs.

Faster Cloud Data Migrations

Auto parallelize at every level to maximize performance, minimize network usage to migrate efficiently over WANs, and migrate more than 25 times faster than generic tools across heterogeneous cloud and storage with Elastic Data Migration.


Reduced Datacenter Footprint

Komprise moves and copies data to secondary storage to help reduce on-premises data center costs, based on customizable data management policies.

Risk Mitigation

Since Komprise works across storage vendors and technologies to provide native access without lock-in, organizations reduce the risk of reliance on any one storage vendor.


Getting Started with Komprise:

Want To Learn More?

Data Tagging

What is data tagging?

Data tagging is the process of adding metadata to your file data in the form of key-value pairs. These values give context to your data so that others can easily find it in search and execute actions on it, such as moving it to confinement or to a cloud-based data lake. Data tagging is valuable for research queries and analytics projects, or to comply with regulations and policies.

How does Komprise data tagging work?

Users, such as data owners, can apply tags to groups of files, and tags can also be applied programmatically by analytics applications via API. In the Komprise Deep Analytics interface, users can query the Global File Index and find the data for tagging. This is done by creating a Komprise Plan that will invoke the text search function to inspect and tag the selected files. The ability to use Komprise Intelligent Data Management to search, find, apply tags and then take action makes it possible for customers to get faster value from enriched data sets.
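As a simple illustration of the key-value tagging concept described above (this is a hypothetical sketch, not the Komprise API), tags can be modeled as key-value pairs attached to file paths, with a query returning every file whose tags match:

```python
# Illustrative sketch of key-value data tagging (hypothetical, not the Komprise API).
tags = {}  # path -> {key: value}

def tag_files(paths, **kv):
    """Attach key-value tags to a group of files."""
    for p in paths:
        tags.setdefault(p, {}).update(kv)

def find_by_tag(**kv):
    """Return paths whose tags contain all requested key-value pairs."""
    return [p for p, t in tags.items() if all(t.get(k) == v for k, v in kv.items())]

tag_files(["/proj/a/run1.csv", "/proj/a/run2.csv"], project="genomics", status="raw")
tag_files(["/proj/b/old.log"], project="ops")
```

A downstream tool could then call `find_by_tag(project="genomics")` to select only the enriched data set for analytics or data movement.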

Tagging and Smart Data Workflows


Komprise Smart Data Workflows automate unstructured data discovery, data mobility and the delivery of data services.

  • Define a custom query to find a specific data set
  • Analyze and tag data sets with additional metadata
  • Move only the tagged data for analytics, AI/ML, etc.
  • Move data to a lower-cost storage tier after analysis



Getting Started with Komprise:

Want To Learn More?

Data Tiering

Data Tiering refers to a technique of moving less frequently used data, also known as cold data, to cheaper levels of storage or tiers. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded now to mean tiering or archiving data from a storage system to other clouds and storage systems. See also cloud tiering and choices for cloud data tiering.


Data Tiering Cuts Costs Because 70%+ of Data is Cold

As data grows, storage costs escalate. It is easy to think the solution is more efficient storage, but the real culprit is poor data management. Over 70% of data is cold and has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data. As a result, data storage costs rise, backups slow down, recovery becomes unreliable, and the sheer bulk of this data makes it difficult to leverage new options like flash and cloud.

Data Tiering Was Initially Used within a Storage Array

Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.

Typical storage tiers within a storage array include:
  • Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
  • SAS Disks: Usually the workhorse of a storage system, they are moderately good at performance but more expensive than SATA disks.
  • SATA Disks: Usually the lowest price-point for disks but not as performant as SAS disks.
  • Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.
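A tiering policy maps a file's activity to one of the storage classes above. As a hypothetical sketch (the thresholds here are illustrative, not Komprise defaults), a policy might pick a tier from the number of days since a file was last accessed:

```python
# Hypothetical tiering policy: choose a storage tier from last-access age in days.
# Thresholds are purely illustrative, not Komprise defaults.
def pick_tier(days_since_access):
    if days_since_access < 30:
        return "flash"   # hot: highest performance, highest cost
    if days_since_access < 180:
        return "sas"     # warm: workhorse disk tier
    if days_since_access < 365:
        return "sata"    # cool: cheapest disk
    return "object"      # cold: capacity/object storage
```

Running such a policy across a file index is what lets cold data (the 70%+ mentioned above) drain to cheaper tiers automatically.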


Cloud Data Tiering is now Popular

Increasingly, customers are looking at another option – tiering or archiving data to a public cloud.

  • Public Cloud Storage: Public clouds currently have a mix of object and file storage options. The object storage classes such as Amazon S3 and Azure Blob (Azure Storage) provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.

Tiering and archiving less frequently used or cold data to public cloud storage classes is increasingly popular. Customers can use the lower-cost storage classes in the cloud to keep cold data and promote it to higher-cost, higher-performance storage classes when needed. For example, data can be archived or tiered from on-premises NAS to Amazon S3 Infrequent Access or Amazon Glacier for low ongoing costs, then promoted to Amazon EFS or Amazon FSx when you want to operate on it and need performance.

But in order to get this level of flexibility, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires file-tiering, not block-tiering.

Block Tiering Creates Unnecessary Costs and Lock-In

Block-level tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SAS disks as well as cheaper SATA disks.

Block tiering breaks a file into various blocks – metadata blocks that contain information about the file, and data blocks that are chunks of the original file. Block-tiering or Block-level tiering moves less used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained in the higher, faster, and more expensive storage tiers.

Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.

But, since block tiering (often called CloudPools – examples are NetApp FabricPool and Dell EMC Isilon CloudPools) is done inside the storage operating system as a proprietary solution, it has several limitations when it comes to efficiency of reuse and efficiency of storage savings. Firstly, with block tiering, the proprietary storage filesystem must be involved in all data access since it retains the metadata and has the “map” to putting the file together from the various blocks. This also means that the cold blocks that are moved to a lower tier or the cloud cannot be directly accessed from the new location without involving the proprietary filesystem because the cloud does not have the metadata map and the other data blocks and the file context and attributes to put the file together. So, block tiering is a proprietary approach that often results in unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.

The only way to access data in the cloud is to run the proprietary storage filesystem in the cloud which adds to costs. Also, many third-party applications such as backup software that operate at a file level require the cold blocks to be brought back or rehydrated, which defeats the purpose of tiering to a lower cost storage and erodes the potential savings. For more details, read the white paper: Block vs. File-Level Tiering and Archiving.

Know Your Cloud Tiering Choices


File Tiering Maximizes Savings and Eliminates Lock-In

File tiering is a modern technique that uses standard protocols to move the entire file, along with its metadata, in a non-proprietary fashion to the secondary tier or cloud. File tiering is harder to build but better for customers because it eliminates vendor lock-in and maximizes savings. Whether files have POSIX-based Access Control Lists (ACLs) or NTFS extended attributes, all this metadata is fully tiered or archived with the file itself to the secondary tier and stored in a non-proprietary format. This ensures the entire file can be brought back when needed. Because file tiering moves the attributes and security permissions along with the file, it maintains full file fidelity even when moving a file to a different storage architecture such as object storage or cloud. Applications and users can continue to use the moved file from its original location, and they can also open it natively in the secondary location or cloud without requiring any third-party software or storage operating system.
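The core idea, moving the whole file to a secondary tier while leaving a standard construct behind at the original path, can be sketched in a few lines. This is a minimal conceptual illustration only; Komprise's actual TMT adds dynamic link resolution, high availability, and full metadata handling:

```python
# Minimal sketch of file-level tiering: move the whole file (data + timestamps/permissions)
# to a secondary tier and leave a standard symbolic link behind at the original path.
# Conceptual illustration only; not the Komprise implementation.
import os
import pathlib
import shutil
import tempfile

def tier_file(src, secondary_dir):
    dest = os.path.join(secondary_dir, os.path.basename(src))
    shutil.copy2(src, dest)   # copy2 preserves timestamps and permission bits
    os.remove(src)
    os.symlink(dest, src)     # users keep accessing the original path
    return dest

primary = tempfile.mkdtemp()
secondary = tempfile.mkdtemp()
f = os.path.join(primary, "report.txt")
pathlib.Path(f).write_text("cold data")
tier_file(f, secondary)
```

After tiering, reads through the original path still return the file's contents, and the copy on the secondary tier is a plain file that any tool can open natively.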

Since file tiering maintains full file fidelity and native access based on standards at every tier, third-party applications can access the moved data without requiring any agents or proprietary software. This maximizes savings, since backup software and other third-party applications can access moved data without rehydrating it or bringing files back to the original location. It also means the cloud can be used to run valuable applications such as compliance search or big data analytics on the trove of tiered and archived data without requiring any third-party software or additional costs.

File-tiering is an advanced technique for archiving and cloud tiering that maximizes savings and breaks vendor lock-in.

Data Tiering Can Cut 70%+ Storage and Backup Costs When Done Right

In summary, data tiering is an efficient solution to cut storage and backup costs because it tiers or archives cold, unused files to a lower-cost storage class, either on-premises or in the cloud. However, to maximize the savings, data tiering needs to be done at the file level, not block level. Block-level tiering creates lock-in and erodes much of the cost savings because it requires unnecessary rehydration of the data. File tiering maximizes savings and preserves flexibility by enabling data to be used directly in the cloud without lock-in.

Why Komprise is the easy, fast, no lock-in path to the cloud for file and object data.


Getting Started with Komprise:

Want To Learn More?

Deep Analytics

What is Deep Analytics?

Deep analytics is the process of applying data mining and data processing techniques to analyze and find large amounts of data in a form that is useful and beneficial for new applications. Deep analytics can apply to both structured and unstructured data.

In the context of unstructured data and unstructured data management, Komprise Deep Analytics is the process of examining file and object metadata (both standard and extended) across billions of files to find data that fits specific criteria. A petabyte of unstructured data can be a few billion files. Analyzing petabytes of data typically involves analyzing tens to hundreds of billions of files. Because analysis of such large workloads can require distribution over a farm of processing units, deep analytics is often associated with scale-out distributed computing, cloud computing, distributed search, and metadata analytics.

Deep analytics of unstructured file and object data requires efficient indexing and search of files and objects across a distributed farm. Financial services, genomics, research and exploration, biomedical, and pharmaceutical are some of the early adopters of Komprise Deep Analytics, which is powered by the Global File Index metadata catalog. In recent years, enterprises have started to show interest in deep analytics as the amount of corporate unstructured data has increased, and with it, the desire to extract value from the data.
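To make the idea concrete, a metadata catalog can be thought of as a table of per-file records (path, size, extension, last-access time) that is queried without touching file contents. The sketch below uses hypothetical data and field names for illustration; it is not the Global File Index implementation:

```python
# Toy sketch of metadata-based deep analytics: query (path, size, ext, atime)
# records without reading file contents. Data and fields are hypothetical.
from datetime import datetime

index = [
    {"path": "/lab/seq1.bam", "size": 7_000_000_000, "ext": ".bam", "atime": datetime(2020, 1, 5)},
    {"path": "/lab/notes.txt", "size": 4_096, "ext": ".txt", "atime": datetime(2024, 6, 1)},
    {"path": "/lab/seq2.bam", "size": 9_000_000_000, "ext": ".bam", "atime": datetime(2019, 3, 2)},
]

def query(ext=None, min_size=0, older_than=None):
    """Return paths matching extension, minimum size, and last-access cutoff."""
    hits = []
    for rec in index:
        if ext and rec["ext"] != ext:
            continue
        if rec["size"] < min_size:
            continue
        if older_than and rec["atime"] >= older_than:
            continue
        hits.append(rec["path"])
    return hits

# e.g. large genomics files untouched since before 2021:
cold_bams = query(ext=".bam", min_size=1_000_000_000, older_than=datetime(2021, 1, 1))
```

At petabyte scale the same kind of query runs against billions of records, which is why deep analytics is associated with scale-out distributed indexing rather than a single-node scan.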


Deep analytics enables additional use cases such as Big Data Analytics, Artificial Intelligence and Machine Learning.

When the result of a deep analytics query is a virtual data lake, which we call the Global File Index, data does not have to be moved or disrupted from its original location to enable reuse. This is an ideal way to rapidly leverage deep analytics without disruption, since large data sets are costly and slow to move.

Learn more about Komprise Deep Analytics.

Learn more about Deep Analytics with Actions.


Read the blog post: How Storage Teams Use Deep Analytics

Getting Started with Komprise:

Want To Learn More?

Dell PowerScale

Dell PowerScale is the name of Dell Technologies' scale-out network-attached storage (NAS) solution. According to Dell, PowerScale is designed to provide high-performance storage for unstructured data workloads and is well suited for demanding file and object storage requirements. In 2020, Dell rebranded many of the acquired EMC technologies, such as EMC Isilon, as PowerScale.

PowerScale is used in a variety of industries, including media and entertainment, healthcare, research, and financial services, where large-scale data storage, high performance, and data-intensive workloads are critical.

Whether your use case is cloud tiering, cloud data migration, or optimizing performance and reducing storage costs, Komprise works with Dell PowerScale technologies.

Learn more about Komprise for Dell EMC.

Learn more about Smart Migration from PowerScale Isilon.

Getting Started with Komprise:

Want To Learn More?

Department Showback

Department showback is a financial management practice that involves tracking and reporting on the costs associated with specific departments or business units within an organization. Also see Showback. It is a way to allocate and show the IT or operational costs incurred by various departments or units to help them understand their resource consumption and budget utilization. Department showback is often used as a transparency and accountability tool to foster cost-awareness and responsible resource usage.

Key aspects of department showback

Cost Attribution: Department showback allocates or attributes the costs of IT services, infrastructure, or other shared resources to individual departments or business units based on their actual usage or consumption. This helps departments understand their financial responsibilities.

Reporting and Visualization: The results of department showback are typically presented in reports or dashboards that clearly outline the costs incurred by each department. Visualization tools can make it easier for department heads and executives to understand the cost breakdown.

Transparency: By providing departments with detailed information on their costs, department showback promotes transparency and accountability in resource consumption. It allows departments to see the financial impact of their decisions.

Budgeting and Planning: Armed with cost data, departments can better plan and budget for their future resource needs. They can make more informed decisions about IT or operational expenditures.
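The cost-attribution step above is, at its simplest, a proportional split of a shared bill by each department's consumption. The figures below are hypothetical, used only to sketch the calculation:

```python
# Illustrative showback calculation: attribute a shared storage bill to departments
# by their share of consumed capacity. All figures are hypothetical.
usage_tb = {"engineering": 400, "marketing": 50, "research": 550}
monthly_storage_cost = 20_000.0  # total bill for the shared tier, in dollars

def showback(usage, total_cost):
    total_tb = sum(usage.values())
    return {dept: round(total_cost * tb / total_tb, 2) for dept, tb in usage.items()}

report = showback(usage_tb, monthly_storage_cost)
```

Each department sees its attributed share in the report; under a chargeback model the same numbers would instead become actual internal bills.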

Chargeback vs. Showback

Department showback is different from chargeback. In chargeback, departments are billed for the actual costs they incur. In showback, departments are informed of their costs, but no actual billing takes place. Showback is often used for educational and cost-awareness purposes, while chargeback is a financial transaction. Both models are popular with data storage and becoming more popular with broader adoption of storage-agnostic unstructured data management software.

  • Cost Optimization: Armed with cost information, departments can identify opportunities for cost optimization. This might involve reducing unnecessary resource usage or finding more cost-effective alternatives.
  • Resource Allocation: Departments can use the cost data to justify resource allocation requests, ensuring that they have the resources needed to meet their objectives.
  • Data-Driven Decision-Making: Department showback promotes data-driven decision-making by providing departments with financial data that can guide their choices and strategies.
  • Benchmarking: Comparing the costs of similar departments or units can help identify best practices and opportunities for improvement.

Department showback is particularly valuable in organizations with complex IT infrastructures, cloud services, or shared resources. It helps ensure that resources are used efficiently, aligns costs with departmental priorities, and fosters a culture of financial responsibility and accountability.

It’s important to note that department showback should be implemented with clear communication and collaboration between the finance department, IT, and department heads to ensure that cost allocation methods are fair and accurate. Additionally, the success of department showback depends on the organization’s commitment to using cost data to inform decision-making and drive cost optimization efforts.


Getting Started with Komprise:

Want To Learn More?

Digital Pathology Data Management

According to the Digital Pathology Association:

 “Digital pathology is a dynamic, image-based environment that enables the acquisition, management and interpretation of pathology information generated from a digitized glass slide.”

Healthcare organizations have shifted to digital media for medical imaging. Digital pathology, digital PACS and VNA systems are all generating and storing petabytes of medical imaging data—lab slides, X-rays, MRIs, CT scans and more. These ever-expanding datasets are pushing the limits of data storage systems and challenging IT departments' ability to effectively manage data. With increasing regulations, healthcare providers typically must retain medical imaging files for many years. In addition to compliance requirements, clinical researchers may need access to the data indefinitely, and often immediately. The potential future value of this ever-expanding data repository must be weighed against the growing financial and overall unstructured data management costs.

The Digital Pathology Data Management Challenge

Data center storage for large image files is expensive – typically costing millions a year for some organizations on expensive NAS devices. Not only is NAS expensive, but its data must also be secured, replicated and backed up, which typically triples the costs. Meanwhile, in most cases, imaging data is rarely accessed after a few days or weeks. To get greater flexibility and manage data storage costs, healthcare organizations are adopting unstructured data management software to tier cold medical imaging data out of expensive storage to cost-effective environments such as the cloud. Data management decisions can be difficult internally with politics, vendor relationships and long-standing institutional perspectives. Health systems are handling sensitive patient information and tolerance for downtime is usually quite low.

There are many benefits from augmenting medical imaging solutions with data management software that transparently tiers cold data from your data storage and backups.

Komprise has many customers in the healthcare industry dealing with multiple petabytes of file and object data.

Learn more.

Getting Started with Komprise:

Want To Learn More?

Dynamic Links

Komprise takes the native advantages of symbolic links and innovated further, dynamically binding them to the file at runtime, akin to a DNS router. This makes the links themselves disposable – if a link is accidentally deleted, Komprise can restore it. When Komprise tiers data from a file system, it replaces the original file with a Dynamic Link address: resilient, always available and flexible. There are several benefits to the Dynamic Link approach:

  1. The file can be moved again through its data lifecycle and the link is unchanged.
  2. Komprise stays out of the hot data and metadata paths because it uses standard file system constructs.
  3. The link is resilient when coupled with the high-availability architecture of Komprise and has no single point of failure.

Here is a summary of these advantages in an unstructured data migration use case.

Once Komprise moves a file and replaces it with a Dynamic Link, if the file is moved again – say, for example, after the first archive of a file to an object store and another later to the cloud – the Dynamic Link address does not need to be changed. This eliminates the challenges of managing links. Users and applications continue to access the moved data transparently from the original location even as the data is moved throughout its lifecycle, without any changes. Learn More about Komprise TMT.

Before and After Migration: SMB (Windows) Systems
Before and After Migration: NFS (Linux) Systems

By leveraging a standard protocol construct whenever possible (in more than 95% of all cases), Komprise is able to deliver non-proprietary, open, transparent data access without getting in front of the hot data or metadata. If a user accidentally deletes the links on the source, Komprise can repopulate the links since the link itself does not contain the context of the moved file. Data can be moved from one destination to another (e.g. for ongoing unstructured data management) and there are no changes to the link.
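The reason a deleted link can be repopulated is that the link holds only a stable address that is resolved at runtime; it contains no information about where the file currently lives. The sketch below is a hypothetical model of that idea, not the Komprise implementation:

```python
# Conceptual model of dynamic links: the link stores only a stable address,
# resolved at runtime, so the target can move and a lost link can be recreated.
# Hypothetical sketch, not the Komprise implementation.
resolver = {}  # stable link address -> current location of the moved file

def create_link(address, location):
    resolver[address] = location
    return f"link://{address}"        # what gets left behind on the source

def move_target(address, new_location):
    resolver[address] = new_location  # the link itself never changes

def resolve(link):
    return resolver[link.split("://", 1)[1]]

link = create_link("file-42", "s3://bucket/tier1/file-42")
move_target("file-42", "s3://glacier/file-42")  # second move: same link, new target
```

Because the link carries no file context, recreating it after an accidental delete is just re-emitting the same stable address, and resolution still lands on the file's current location.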

To the user, this means no disruption. Storage teams won't be bothered with help desk tickets from employees unable to find their data, and applications retain access to their data. Users and applications that rely on data moved by Komprise are unaffected.

Stubs versus Dynamic Symbolic Links at a Glance

Comparing Stubs and Komprise Dynamic Links

While symbolic links offer greater resilience and flexibility than static stubs, stubs are still occasionally useful to Komprise. Most file servers support symbolic links, but for the few that do not, Komprise uses stubs that are dynamic. Dynamic stubs point to Komprise, which redirects requests to the actual files on the target. This ensures that even if a stub is lost, the corresponding file on the target can still be accessed via Komprise and the stub can be restored. Komprise's dynamic stubs can be made similar in size and appearance to the original file.

Watch a TMT Chalk Talk presentation with Komprise CTO Mike Peercy.

Getting Started with Komprise:

Want To Learn More?

Egress Costs

Egress costs are the network fees most cloud providers charge to move your data out of the cloud. Most allow you to move your data into the cloud for free (ingress). It’s important to understand ingress and egress fees when moving data to the cloud. If you have moved data to cold storage in the cloud for archiving purposes but users recall it more than expected, you may incur hefty egress costs. Egress fees also happen when data is pulled out of cloud storage for use in analytics applications and to transfer data to another cloud region or cloud service.
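A quick back-of-the-envelope estimate shows why unplanned recalls get expensive. The per-GB rate below is an assumption for illustration; actual pricing varies by provider, region, and storage tier:

```python
# Back-of-the-envelope egress cost estimator. The default per-GB rate is an
# illustrative assumption; real cloud pricing varies by provider, region, tier.
def egress_cost(gb_transferred, rate_per_gb=0.09):
    """Estimate the network fee for moving data out of a cloud."""
    return round(gb_transferred * rate_per_gb, 2)

# e.g. recalling 10 TB of "archived" data back on-premises:
cost = egress_cost(10 * 1024)
```

At the assumed rate, a single 10 TB recall runs to roughly nine hundred dollars, which is why recall patterns matter when planning what to tier to cloud cold storage.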

In the post 5 Tips to Optimize Your Unstructured Data, a key benefit of embracing open, standards-based unstructured data management is that organizations can do whatever they need to do with their file and object data data without paying licensing penalties and costs, such as for a third-party cloud file system or unnecessary cloud-egress fees. Komprise moves and manages unstructured data in native format in each tier, which means you can directly access the data and use all the cloud data services on your data without having to pay a data management or storage vendor. Avoiding these costs, including egress costs, is a priority for IT leaders surveyed by Komprise. Read the report: State of Unstructured Data Management.

To learn more about Egress Costs read the New Stack article: Why Data Egress in the Cloud is Expensive.

To learn more about right approach to cloud data migrations and data management visit: Smart Data Migration.

The Benefits of Cloud Native Access

Cloud native is a way to move data to the cloud without lock-in, which means that your data is no longer tied to the file system from which it was originally served.

In this webinar, Komprise leaders review the importance of cloud native data access and maximizing the potential of your data in terms of access, efficiency and data services. When you move data in cloud native format, your users should be able to access the data not only as a file, but also as a native object—which is necessary for leveraging cloud-native analytics and other services. Access to your data should not have to go through your file storage layer, as this incurs licensing fees and requires adequate capacity.

Read the blog post: Why Cloud Native Unstructured Data Access Matters

Getting Started with Komprise:

Want To Learn More?

Elastic Data Migration

What is Elastic Data Migration?

Data migration is the process of moving data (e.g., files, objects) from one storage environment to another. Elastic Data Migration is a high-performance migration solution from Komprise that uses a parallelized, multi-processing, multi-threaded approach to complete NAS-to-NAS and NAS-to-cloud migrations in a fraction of the traditional time and cost.

Standard Data Migration

  • NAS Data Migration – move files from a Network Attached Storage (NAS) to another NAS. The NAS environments may be on-premises or in the cloud (Cloud NAS)
  • S3 Data Migration – move objects from an object storage or cloud to another object storage or cloud

Data migrations can occur over a local network (LAN) or when going to the cloud over the internet (WAN). As a result, migrations can be impacted by network latencies and network outages.

Data migration software needs to address these issues to make data migrations efficient, reliable, and simple, especially when dealing with NAS and S3 data since these data sizes can be in petabytes and involve billions of files.


Elastic Data Migration

Elastic Data Migration is orders of magnitude faster than normal data migrations. It leverages parallelism at multiple levels to deliver 27 times faster performance than NFS alternatives and 25 times faster SMB protocol performance.

  • Parallelism of the Komprise scale-out architecture – Komprise distributes the data migration work across multiple Komprise Observer VMs so they run in parallel.
  • Parallelism of sources – When migrating multiple shares, Komprise breaks them up across multiple Observers to leverage the inherent parallelism of the sources.
  • Parallelism of the data set – Komprise exploits all the inherent parallelism available in the data set across multiple directories, folders, etc., to speed up data migrations.
  • Big files vs. small files – Komprise analyzes the data set before migrating it and learns from the nature of the data; if the data set has many small files, Komprise adjusts its migration approach to reduce the overhead of moving small files. This AI-driven approach delivers greater speeds without human intervention.
  • Protocol-level optimizations – Komprise optimizes data transfer at the protocol level (e.g., NFS, SMB) to minimize protocol chattiness.
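The two core ideas above, parallel transfer plus per-file integrity verification, can be sketched with a thread pool and MD5 checksums. This mimics the concepts only; it is not the Komprise implementation:

```python
# Simplified sketch of parallelized file migration with integrity checks:
# copy files concurrently and verify each with an MD5 checksum.
# Conceptual illustration only; not the Komprise implementation.
import hashlib
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def md5(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def migrate_file(src, dst_dir):
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.copyfile(src, dst)
    assert md5(src) == md5(dst), f"integrity check failed for {src}"
    return dst

src_dir, dst_dir = tempfile.mkdtemp(), tempfile.mkdtemp()
files = []
for i in range(4):
    p = os.path.join(src_dir, f"f{i}.dat")
    with open(p, "wb") as f:
        f.write(os.urandom(1024))
    files.append(p)

with ThreadPoolExecutor(max_workers=4) as pool:
    migrated = list(pool.map(lambda s: migrate_file(s, dst_dir), files))
```

A production mover would add retry on network errors, directory-level work division across machines, and attribute/ACL preservation, but the checksum-after-copy pattern is the same one described in the "High Fidelity" bullet below.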

All of these improvements deliver substantially higher performance than standard data migration. When an enterprise is looking to migrate large production data sets quickly, without errors, and without disruption to user productivity, Komprise Elastic Data Migration delivers a fast, reliable, and cost-efficient migration solution.


Komprise Elastic Data Migration Architecture

What Elastic Data Migration for NAS and Cloud provides

Komprise Elastic Data Migration provides high-performance data migration at scale, solving critical issues that IT professionals face with these migrations. Komprise makes it possible to easily run, monitor, and manage hundreds of migrations simultaneously. Unlike most other migration utilities, Komprise also provides analytics along with migration to provide insight into the data being migrated, which allows for better migration planning.


Fast, painless file and object migrations with parallelized, optimized data migration:

  • Parallelism at every level:
    • Leverages parallelism of storage, data hierarchy and files
    • High performance multi-threading and automatic division of a migration task across machines
  • Network efficient: Adjusts for high-latency networks by reducing round trips
  • Protocol efficient: optimized NFS handling to eliminate unnecessary protocol chatter
  • High Fidelity: Does MD5 checksums of each file to ensure full integrity of data transfer
  • Intuitive Dashboards and API: Manage hundreds of migrations seamlessly with intuitive UI and API
  • Greater speed and reliability
  • Analytics with migration for data insights
  • Ongoing value


Getting Started with Komprise:

Want To Learn More?

FabricPool

What is NetApp FabricPool?

FabricPool is a NetApp storage technology that enables automated tiering of data from an all-flash appliance to low-cost object storage tiers either on or off premises. This technology is a form of storage pools which are collections of storage volumes exported to a shared storage environment.

Read more about storage pools.

Read the blog post: What you need to know before jumping into the cloud tiering pool


Download the white paper: Cloud Tiering: Storage-Based vs Gateways vs File-Based: Which is Better and Why?

Learn more about the Komprise path to the cloud for file and object data.

Getting Started with Komprise:

Want To Learn More?

File Archiving

File archiving is the process of preserving digital files for long-term data storage and retrieval. The goal of file archiving is to retain important files and documents in a secure, easily accessible, and cost-effective manner, while freeing up space on primary storage systems.

Manual file data management, backup and restore solutions, and dedicated file archiving systems are three ways to archive files. Manual file management moves files to a secondary storage location, such as a network share or external hard drive. Backup and restore solutions preserve files by creating snapshots of the data at regular intervals; snapshots can restore data in the event of data loss or corruption. Dedicated file archiving systems are specialized software solutions that are designed specifically for file archiving and provide features such as indexing, searching, and data retention policies.

File Archiving Challenges

File archiving reduces the risk of data loss, improves regulatory compliance, and reduces the costs associated with primary storage. Yet file archiving can present several challenges, including:

  • Data Storage Costs: Storing large volumes of data for a long time can be expensive, especially if the data is stored on traditional storage solutions, such as tapes or hard disk drives.
  • Scalability: As data volumes continue to grow, archiving solutions must be able to meet the increasing demand for storage capacity.
  • Data Retrieval: Archived files are difficult to locate and retrieve if they are not properly indexed or if the index becomes corrupted.
  • Data Retention: Organizations must ensure that their archiving solutions meet regulatory requirements for data retention, including data privacy and security laws.
  • Data Integrity: Archived files must be preserved in their original format and remain readable over time, which requires proper data preservation and data migration strategies.
  • Data Migration: As archiving systems age or become obsolete, IT must migrate data to new systems, often via cloud data migration, which can be time-consuming and complex.
  • Integration with other systems: Archiving solutions must integrate with other systems, such as backup and restore solutions, to ensure streamlined access.

Standards-based Transparent Data Archiving

A true transparent data archiving solution creates no disruption, and that’s only achievable with a standards-based approach. Komprise Intelligent Data Management is the only standards-based transparent data archiving solution that uses Transparent Move Technology™ (TMT), which relies on symbolic links instead of proprietary stubs.

True transparency that users won’t notice

When a file is archived using TMT, it’s replaced by a symbolic link, a standard file system construct available in NFS and SMB file systems. The symbolic link, which retains the same attributes as the original file, points to the Komprise Cloud File System (KCFS). When a user clicks on it, the file system on the primary storage forwards the request to KCFS, which maps the file from the secondary storage where it actually resides. (An eye blink takes longer.) This approach seamlessly bridges file and object storage systems so files can be archived to highly cost-efficient object-based solutions without losing file access.

Learn more about Komprise TMT for File Archiving


File Data Management

File data management is the process of organizing, storing, and retrieving digital files in an efficient and secure manner. This can include tasks such as:

  • Naming files in a consistent and descriptive manner
  • Creating folders and sub-folders to categorize and store files
  • Regularly backing up important files to prevent data loss
  • Purging old or unnecessary files to free up storage space
  • Using appropriate software tools to manage, search and retrieve files

Effective file data management helps improve productivity and organization, and reduces the risk of data loss or corruption. It is a critical aspect of overall data management, especially in businesses and organizations where large amounts of data are generated and stored on a regular basis.

File Data Management Challenges

Because we’re talking about unstructured data, file data management can present a number of challenges, including:

  • Data Growth: As more and more data is generated and stored, it can become difficult to manage and organize effectively. The majority is unstructured data.
  • Data Duplication: Duplicate files can lead to confusion, waste storage space and make it harder to find the most up-to-date version of a file.
  • Data Security: Protecting sensitive information from unauthorized access or cyberattacks is a major concern in file data management. (Read about cyber resiliency and saving on ransomware protection.)
  • Data Loss: Accidentally deleting or losing files can result in significant data loss and potential productivity loss.
  • Compliance: Certain industries and organizations may have regulatory requirements for file data management, such as retention policies and data privacy laws.
  • Integration with Other Systems: Integrating file data management systems with other applications, such as email, CRM, and collaboration platforms, can be complex and time-consuming.
  • Scalability: As the amount of data grows, the file data management system must be able to scale to meet the demands of the organization.
  • Compatibility: Ensuring that files can be opened and used by multiple users and systems can be a challenge, especially with different file formats and software versions.

These challenges can be addressed through the use of appropriate software tools, best practices for file data management, and regular reviews and updates to the file data management policies.

Komprise File Data Management

Komprise Intelligent Data Management has been designed from the ground up to simplify file data management and put customers in control of unstructured data, no matter where data lives. With an analytics-first approach, Komprise works across file and object storage, across cloud and on-premises, and across data storage and data backup architectures to deliver a consistent way to manage data. With Komprise you get instant insight into all of your unstructured data—wherever it resides. See patterns, make decisions, make moves, and save money—all without compromising user access to any data. Komprise puts you in control of your data while simplifying file data management, creating a lightweight management plane across all your data storage silos without getting in the path of data access.

Block vs File Level Data Storage Tiering

A primary file data management technique is data tiering. Here is a summary of block-level versus file-level tiering and the impact. Also download the whitepaper and learn more about Komprise Transparent Move Technology (TMT).


File Data Ransomware

What is File Data Ransomware?

File data ransomware is a ransomware attack that targets unstructured file data.

File data can be generated from users as well as machines. From genomics and medical imaging, streaming video, electric car data, and IoT products, all industries are generating vast amounts of unstructured file data, and increasingly enterprises are migrating file workloads to the cloud. File data can be petabytes of data and billions of files, so migrating this much unstructured data to the cloud takes time and can be disruptive. Cloud data migrations require proper planning to ensure minimal disruption and unintended costs.

There is growing recognition of the importance of having a layered protection strategy in place against potential file data ransomware attacks. Upwards of 80% of data today is unstructured file data, so IT organizations cannot afford to leave file data unprotected from ransomware. Early detection of ransomware will deliver the best outcome, but ransomware attacks are constantly evolving and detection is not always foolproof. Investing in ways to recover data if you are attacked, and establishing an immutable copy of data in a location separate from primary data storage and backups, is the best way to recover data in the event of a ransomware attack.

But keeping multiple copies of data can get prohibitively expensive. Read the blog: How to Protect File Data from Ransomware at 80% Lower Cost

Learn more about Komprise for cyber resiliency, including optimizing your defenses against cyber incidents, system failure and file data loss.

Ransomware is an attack by malware that holds your data files hostage by encrypting your systems and making your data inaccessible to you. The majority of enterprise data is unstructured file data, which means organizations cannot afford to leave it unprotected from ransomware. While the primary target for ransomware is file data, as attacks grow more sophisticated hackers are seeking to defeat backups and snapshots as well.

How to recover your ransomware encrypted data files

The way to recover from a ransomware attack is to establish an immutable copy of your data in a separate location, ensuring it is separate from your data storage. Immutable storage can be physically “air gapped” with offline media such as tape or virtually air gapped with technologies such as AWS S3 object lock that prevent any modification of data even by administrators for a set retention period.

How long does it take to recover from a ransomware attack?

A critical component often overlooked is how long the ransomware recovery can take – if your business can’t resume until data is restored, every minute adds to the cost of the ransomware attack. Recovery from a ransomware attack is equivalent to a disaster where potentially 100% of your data must be restored. Having a tested recovery plan in place is essential to a successful recovery.

How do you protect file data from ransomware?

There are two components of ransomware protection: detection and recovery. Early detection of ransomware will deliver the best outcome, but detection is not foolproof and can be difficult. Organizations should also invest in data recovery strategies and create an immutable copy of data in a location separate from primary data storage and backups, so they can recover in the event of a ransomware attack. But keeping multiple copies of data can get prohibitively expensive. To protect file data from ransomware, the solution must:

  • Be cost-effective
  • Protect data even if backups and snapshots are infected
  • Provide simple recovery without significant upfront investment
  • Be verifiable


File Data Tiering

File data tiering is a data storage management technique that automatically moves files from one storage tier to another based on usage patterns and access frequency. The goal of file data tiering is to optimize storage utilization and reduce storage costs by placing frequently used files on high-performance storage and less frequently used files (cold data storage) on lower-performance storage.

Hardware-based tiering, software-based tiering, and cloud-based tiering are three methods of file data tiering. Hardware-based tiering moves files between different types of physical storage devices, such as solid-state drives (SSDs) and hard disk drives (HDDs), within a storage array. Software-based tiering moves files between different types of virtual storage volumes, such as high-performance and low-performance storage pools. Cloud-based tiering moves files between different storage classes within a cloud-based object storage service, such as Amazon S3.

As part of a broader file data management strategy, file data tiering can help organizations improve storage utilization, reduce storage costs, and increase storage performance by automatically placing the right data in the right place at the right time. However, it’s important for organizations to carefully consider their storage requirements and choose a file tiering solution that fits their needs, as not all tiering solutions are appropriate for all environments.

File-Level Tiering vs Block-Level Tiering

Learn the difference between storage-centric block tiering, which moves blocks that can no longer be directly accessed from their new location without vendor software (aka lock-in), and file data tiering, which is what Komprise uses to fully preserve file access at each tier by keeping the metadata and file attributes with the file—no matter where it lives. Know the difference to make the right cloud tiering choice for your data storage moves.


FinOps (or Cloud FinOps)

FinOps (or Cloud FinOps) refers to cloud financial operations, which include practices such as cost optimization, cost allocation, chargeback and showback, and cloud financial governance. Some of the key challenges that organizations face with regard to cloud costs include:

  • Cost visibility: Many organizations struggle to gain complete visibility into their cloud costs, which can make it difficult to ensure that they are not overspending on resources.
  • Cost optimization: Organizations need to optimize their cloud costs by reducing waste, optimizing resource utilization, and ensuring that they are only paying for what they need.
  • Cost allocation: Organizations need to allocate their cloud costs so that they are charged in a way that accurately reflects the resources that they are consuming.
  • Cloud financial governance: Governance processes and controls can ensure that cloud spending is aligned with their overall business goals and objectives.

Overall, FinOps is a critical aspect of modern cloud management, and is essential for organizations that want to effectively manage their cloud costs and ensure that they are maximizing value and ROI from their cloud investments.

There are several vendors that specialize in FinOps solutions for cloud cost management and cloud cost optimization, but increasingly FinOps is built into other applications and technology platforms:

  • Apptio
  • CloudHealth by VMware
  • RightScale (acquired by Flexera)
  • CloudCheckr
  • Azure Cost Management + Billing by Microsoft
  • AWS Cost Explorer by Amazon Web Services
  • Cloudability
  • ParkMyCloud

With the right Cloud FinOps strategy, organizations should focus on gaining the tools and expertise they need to manage their cloud costs and ensure that they are getting the most value from their cloud investments.

FinOps and Unstructured Data Management

How much does it cost to own your data?

Cost modeling in Komprise helps IT teams enter their actual data storage costs to determine upfront new projected costs and benefits before spending money on storage. (Know First)

Look at your current (and future) data storage platform(s). Does the company pay per GB (OPEX) or is it an owned technology (CAPEX)? For the latter, divide the cost to acquire the full system by the current total amount of actual usable data to attain cost/TB. For example, 1PB of physical storage may end up being just 500TB of actual usable capacity but hold only 300TB of actual usable data. Use the 300TB because that is representative of today’s data ownership cost.
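The arithmetic behind this cost model is straightforward; the sketch below uses an assumed acquisition cost of $150,000 (the text gives no dollar figure) against the 300TB of actual data from the example above.

```python
def cost_per_tb(acquisition_cost, usable_data_tb):
    """Cost per TB based on actual data stored, not raw capacity.

    Dividing by actual usable data (300TB here) rather than raw
    capacity (1PB) reflects what each stored terabyte really costs.
    """
    return acquisition_cost / usable_data_tb

# Hypothetical figures: a $150,000 system holding 300TB of actual
# data costs $500/TB, even though its raw capacity is 1PB.
print(cost_per_tb(150_000, 300))
```

The same function can compare scenarios, e.g., plugging in a cloud tier’s $/GB-month converted to an annualized $/TB.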

Data ownership should also include the cost of data protection (data backup, disaster recovery, etc.). The FinOps capabilities in Komprise Intelligent Data Management allow you to compare on-premises versus cloud models or factor in cloud tiering or migrating to a new NAS platform.

Komprise Cost Models

According to GigaOm’s 2022 Data Migration Radar Report, Komprise has “the best set of Financial Operations (FinOps) features to date.”

Stop overspending on cloud storage: Know First. Move Smart. Take Control with the right FinOps for cloud data storage and data management strategy.


Flash Storage

Flash storage is solid-state storage media that stores data electronically and can be electronically erased and reprogrammed. It also responds faster than a traditional spinning disk, increasing performance.

With the increasing volume of stored unstructured data from the growth of mobility and Internet of Things (IoT), organizations are challenged both to store data and to capitalize on the opportunities it brings. Disk drives can be too slow due to their mechanical speed limitations. For stored data to have real value, businesses must be able to quickly access and process that data to extract actionable information.

Flash storage has a number of advantages over alternative storage technologies:
  • Greater performance. This leads to agility, innovation, and an improved experience for users accessing the data, delivering real insight to an organization.
  • Reliability. With no moving parts, flash has higher uptime. A well-built all-flash array can last between 7-10 years.

While Flash storage can offer a great improvement for organizations, it is still too expensive as a place to store all data. Flash storage has been about twenty times more expensive per gigabyte than spinning disk storage over the past seven years. Many enterprises are looking at a tiered model with high-performance flash for hot data and cheap, deep object or cloud storage for cold data.


Global File Index

What is a Global File Index?

Komprise Deep Analytics enables precise unstructured data management at enterprise scale by creating a Global File Index: a metadata catalog that delivers the benefits of Global Namespace or Global File System data access without sitting in front of the hot data path. Spanning petabytes of file and object data sources, the Global File Index allows enterprise customers to find specific data sets and then create a data management plan to systematically take action on them. Unstructured data ends up in multiple silos, so an index needs to be global across different data centers, storage, backup and cloud infrastructure, and it must not sit in front of the hot data path to ensure there is no impact on data storage performance.

Once you connect Komprise to your file and object storage, your data is indexed and a Global File Index, which is a global metadata catalog across disparate file and object data, is created. You do not have to move the data anywhere; but you now have a single way to query and search across your file and object stores. Say you have some NetApp, some Isilon, some Windows servers, some Pure Storage at different sites and you have some cloud file storage on AWS, Azure, and Google. You get a single index via Komprise of all the data across all these environments and now you can search and find exactly the data you need with a single console and API.

Benefits of the Global File Index

  • Users only move the data they need, with the ability to create queries on countless file attributes and tags, such as: data related to a specific tag or project name, projects that are no longer active, file age, user/group IDs, path, file type (e.g., JPEG) and specific extensions, and data with unknown owners.
  • A global metadata catalog eliminates the manual effort of finding custom data sets and moving them separately from different storage silos since Komprise can create a virtual data set based on the query and systematically and continuously move data from multiple file and object silos to the target location.
  • Improves IT and business collaboration around data, as data owners/users can participate in data tiering. 

Watch the TechKrunch session: Deep Analytics Actions with One Global File Index

Search and Act on Unstructured Data Insights

Deep Analytics Actions provides a systematic way to find specific file and object data across hybrid cloud storage silos and move just the right subset of unstructured data for new uses such as AI/ML and cloud analytics. This gives IT and storage departments the ability to drive closer connections with end users by liberating the nuggets of useful data from petabytes of files, so that new value and customer-facing benefits can be discovered.

Smart Data Workflows take Deep Analytics Actions a step further by allowing IT users and/or storage admins to create automated workflows for all the steps required to find the right unstructured data across storage assets, tag and enrich the data and send it to external tools for analysis. This eliminates manual effort in unstructured data management and helps organizations speed time to value from cloud-native and other tools.


Global File System

A global file system, often referred to as a global distributed file system or a global namespace file system, is a type of file system that allows for the unified management and access of files and data across a distributed or networked environment. The goal is to abstract the physical location of files and provide a single, logical view of data regardless of where it is stored or how the storage is distributed. The concept of a global file system is commonly discussed in enterprise environments and cloud computing as a way to simplify data management, primarily unstructured data management, and improve accessibility.

Common features and characteristics of global file systems

  • Unified Namespace: A global file system provides a single, unified namespace that abstracts the underlying storage infrastructure. Users and applications access files and data using a consistent naming convention, irrespective of the physical storage location.
  • Distributed Data: Data within a global file system can be distributed across multiple storage devices, servers, data centers, or cloud services. This distribution can improve data availability, scalability, and fault tolerance.
  • Access Transparency: Users and applications can access files and data without needing to know the physical location or storage details. This access transparency simplifies data access and management.
  • Data Replication: Global file systems often support data replication to enhance data availability and redundancy. Copies of data can be stored in multiple locations for failover and disaster recovery purposes.
  • Scalability: These file systems are designed to scale horizontally, allowing for the addition of storage devices or nodes to accommodate growing data requirements. A key issue with most so-called global file system or global namespace solutions is that they sit in front of the hot data and become a data access and performance bottleneck.
  • Load Balancing: Load balancing mechanisms distribute data access requests across multiple servers or storage devices to optimize performance and prevent bottlenecks.
  • Security: Security features, such as access controls, encryption, and authentication, are typically implemented to protect data within the global file system.
  • Caching: Caching mechanisms can be employed to improve read and write performance by temporarily storing frequently accessed data in memory.
  • Metadata Management: Metadata about files, such as file attributes, permissions, and access control lists, is managed centrally to ensure consistency.
  • Versioning: Some global file systems support versioning, allowing users to access and restore previous versions of files.
  • File Locking: File locking mechanisms may be implemented to prevent conflicts when multiple users or applications access the same file simultaneously.
  • Compatibility: Global file systems are often designed to be compatible with various operating systems, file protocols, and APIs, making them versatile in heterogeneous environments.

Examples of global file systems and distributed file systems

  • NFS (Network File System): NFSv4 and NFSv4.1 support a global namespace, enabling clients to access files across a network as if they were on a local file system.
  • Ceph: Ceph is an open-source distributed storage platform that provides a global file system called CephFS, offering a unified namespace for object storage and block storage.
  • GlusterFS: GlusterFS is a distributed file system that creates a single global namespace from multiple underlying storage servers.
  • Amazon Elastic File System (EFS): EFS is a cloud-based global file system service provided by Amazon Web Services (AWS) that allows multiple Amazon EC2 instances to access shared file storage.

The promise of a global file system is to simplify data management in modern, distributed computing environments, making it easier for organizations to store, access, and manage their data resources efficiently and consistently across the network.

Global File System: Always in the Hot Data Path

A global file system provides a consistent way to access the data or metadata residing in that file system from many locations, and where multiple users in different locations may be working on copies of the same file. It also provides a consistent way to access, configure and administer the file system. The two types of global file systems are:

  • Storage-centric: Stores the data and provides access to it using a single mount that fronts all data requests and is always in the hot data path. By “fronts all data” we mean that all data and metadata requests are channeled through this mount. Some vendors extend this notion to keep the bulk of the data as proprietary blocks in the cloud. In this case of a “cloud storage” GFS, you need to recognize that access to your data always requires licensing the GFS even when the bulk of your data may be in the cloud, which may unnecessarily add costs. A storage-centric GFS does not provide a truly global namespace: it can only provide visibility into data residing on that vendor’s storage system.
  • Metadata-based: Also known as a virtual global file system, this approach fronts data sitting on other storage systems. All data and metadata access is channeled through this virtual global file system, which runs in front of existing storage file systems. The benefit of this approach is that it works across multiple storage vendors. However, there is a heavy price for this as all access must pass through the metadata-based controller, which slows down performance if it is implemented fully in software or increases costs substantially if it requires dedicated hardware. This is because it is in the hot data path and manages data access even though it is not storing any data blocks. A metadata-centric GFS can provide a global namespace across multi-vendor storage systems, but it must do so by fronting all data access, which will negatively impact performance and scalability.

A global file system enhances the inherent value of a storage solution when employees need to actively collaborate in use cases such as engineering collaboration and design. But since 80% of data is cold and not actively accessed, and since typically less than 5% of data requires active collaboration, for unstructured data management, data tiering and feeding data to AI/ML, a global namespace that is not in the hot data path gives truly heterogeneous visibility with the best performance.

Learn more about the Komprise Deep Analytics, the Global File Index and the Komprise Intelligent Data Management platform architecture.


Global Namespace

A global namespace is a concept used in many fields of computer science and IT to describe a unified and consistent naming system for resources that can be accessed from multiple locations or contexts within a distributed computing environment. While the idea makes sense, too often the technology solutions available on the market today sit directly in the hot data path, causing performance bottlenecks. It’s important to step back and assess the goals for a global namespace and choose your technology solution carefully. The primary purpose of a global namespace is to provide a way to access and manage resources, such as files, directories, objects, or services, in a manner that abstracts their physical location or distribution across a network. This abstraction should simplify resource management and allow for scalability and flexibility in distributed systems.

A Global Namespace does not need a Global File System. Read the whitepaper.

Learn more about the Komprise architecture and how the Intelligent Data Management platform never sits in front of the hot data path.

Key aspects of a global namespace:

Be clear on the objective before embarking on the journey towards a global namespace, also known as a universal file system. Also be sure to learn more about the Komprise Global File Index as an alternative that ensures there is no user or application disruption and does not sit in front of the hot data and impact performance. You can think of Komprise as a federated global namespace – visibility, mobility, value. Key aspects of the global namespace strategy have historically included:

  • Resource Abstraction: A global namespace abstracts the physical or logical location of resources, making them appear as if they are part of a single, unified namespace. This abstraction allows users and applications to access resources without needing to know where they are physically located. Learn more about Komprise Transparent Move Technology.
  • Scalability: Global namespaces must be able to scale as the number of resources and the size of the distributed system grow. New resources can be added to the namespace without disrupting existing operations.
  • Consistency: A global namespace enforces naming conventions and consistency across the distributed environment. This ensures that resources have unique names and that naming conflicts are minimized.
  • Access Transparency: Users and applications can access resources in the global namespace using a consistent naming convention, regardless of whether the resource is located on the local system or a remote system. This transparency simplifies resource access.
  • Location Transparency: Location transparency means that users and applications don’t need to be aware of the physical location of resources. The global namespace is meant to provide a level of indirection that allows the system to route requests to the appropriate location.
  • Distribution: Resources in a global namespace should be distributed across multiple servers, data centers, or cloud environments. The namespace management system handles resource distribution and location details.
  • Security: Global namespace systems often include access control and authentication mechanisms to ensure that only authorized users or applications can access resources.
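The resource-abstraction and location-transparency aspects above boil down to mapping logical names onto physical locations. A minimal sketch, with hypothetical names and longest-prefix matching standing in for a real namespace service (which would add replication, access control, and consistency guarantees):

```python
class GlobalNamespace:
    """Map logical paths to physical locations (location transparency).

    Clients use one consistent logical path; the namespace decides
    which storage system actually serves it.
    """
    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, physical_root):
        """Attach a storage location under a logical prefix."""
        self._mounts[prefix] = physical_root

    def resolve(self, logical_path):
        """Translate a logical path via longest-prefix match."""
        for prefix, root in sorted(self._mounts.items(),
                                   key=lambda kv: len(kv[0]), reverse=True):
            if logical_path.startswith(prefix):
                return root + logical_path[len(prefix):]
        raise KeyError(logical_path)
```

Re-mounting a prefix onto a new physical root relocates data without changing any logical path clients use, which is the essence of location transparency.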

Examples of global namespaces in different contexts:

  • File Systems: Distributed file systems and protocols such as Server Message Block (SMB)/Common Internet File System (CIFS) and the Network File System (NFS) provide a global namespace for accessing files and directories across a network.
  • Object Storage: Cloud-based object storage services like Amazon S3 and Azure Blob Storage offer a global namespace for storing and accessing objects (e.g., images, documents) via unique object keys.
  • Distributed Databases: Distributed databases may use global namespaces to abstract the location and naming of data tables and records across multiple database nodes.
  • Service Discovery: In microservices architectures, global namespaces can be used for service discovery, allowing applications to locate and communicate with services across a distributed environment.

Global namespaces are a core concept in the design of distributed and scalable computing systems. Their goal is to simplify resource management and access in complex distributed environments, making it easier for users and applications to interact with resources across the network seamlessly. Some of the known challenges of the concept of a global namespace include:

  • Scalability: As the number of resources and the size of the distributed system grow, managing a global namespace becomes increasingly complex. Ensuring that namespace operations remain efficient and do not become a bottleneck can be challenging.
  • Consistency: Maintaining consistency across a global namespace can be difficult, especially in distributed systems where multiple copies of data or resources may exist. Ensuring that all clients see a consistent view of the namespace, even in the presence of concurrent updates, is a challenge.
  • Concurrency Control: Dealing with concurrent access and updates to the global namespace can lead to conflicts and synchronization issues. Implementing effective concurrency control mechanisms is essential to prevent data corruption and maintain data integrity.
  • Security: Ensuring the security of resources and access control in a global namespace can be complex. Controlling who can access and modify resources, especially in a distributed and potentially untrusted environment, requires robust security measures.
  • Data Distribution: In distributed systems, resources may be distributed across various physical locations or data centers. Ensuring that data is distributed optimally for performance and availability while maintaining a consistent namespace view is a challenge.
  • Data Migration: Moving data or resources within a global namespace, especially in response to changes in the system’s topology or resource allocation, can be challenging. Data migration needs to be seamless and transparent to users and applications.
  • Fault Tolerance: Global namespaces must be designed to be fault-tolerant. When network failures, server crashes, or other issues occur, the namespace should continue to function correctly and without data loss.
  • Network Latency: In distributed environments, network latency can impact the performance of namespace operations. Minimizing the impact of latency on user experience is a challenge.
  • Naming Conflicts: Ensuring that resource names within a global namespace are unique and avoiding naming conflicts can be challenging, especially in large-scale distributed systems with many users and applications.
  • Versioning and Compatibility: Managing versioning and ensuring backward compatibility of the global namespace protocol as it evolves over time can be complex, particularly in heterogeneous environments where various protocol versions may coexist.
  • Monitoring and Diagnostics: Debugging issues in a global namespace, monitoring its health and performance, and diagnosing problems can be challenging due to the distributed and abstract nature of the namespace.
  • Compliance and Regulations: Ensuring that the global namespace complies with legal and regulatory requirements, such as data privacy and data retention policies, can be complex and may require specific features or controls.

To address these challenges, organizations often rely on advanced distributed file systems, object storage systems, namespace management solutions and unstructured data management solutions. These solutions must be designed to provide scalability, consistency, security, and fault tolerance in global namespaces, making them suitable for various use cases, including cloud storage, content delivery, and data sharing in distributed environments.
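The core idea behind all of these solutions is the same: a single logical path space mapped onto multiple physical silos. A minimal sketch of that mapping is below; the silo names, mount paths, and method names are illustrative assumptions, not any vendor's actual API.

```python
# Minimal sketch: a global namespace maps one logical path space onto
# multiple physical silos, resolving each logical path to a physical one.
class GlobalNamespace:
    def __init__(self):
        self._mounts = {}  # logical prefix -> (silo, physical prefix)

    def mount(self, logical: str, silo: str, physical: str) -> None:
        self._mounts[logical] = (silo, physical)

    def resolve(self, path: str):
        """Return (silo, physical_path) for a logical path, preferring
        the longest matching mount prefix."""
        for prefix, (silo, phys) in sorted(
                self._mounts.items(), key=lambda kv: -len(kv[0])):
            if path.startswith(prefix):
                return silo, phys + path[len(prefix):]
        raise FileNotFoundError(path)

ns = GlobalNamespace()
ns.mount("/projects", "nas-1", "/export/projects")
ns.mount("/archive", "s3://cold-bucket", "/data")
```

With this mapping, `/projects/genomics/run1` resolves to the NAS while `/archive/2019` resolves to object storage, even though users see one namespace.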

Benefits of a Global Namespace without a Global File System

As unstructured data continues to pile up across data center and cloud silos, it’s easy to see the appeal of a single way to access and manage data no matter where it lives. What if you had one place to gain visibility into data across all your silos, identify hot and cold data, and plan and execute data migrations and data tiering across all your storage and cloud locations? And what if this same system allowed your users to search for relevant data across storage silos, feed AI/ML pipelines and create automated data workflows?

These are the many advantages of a global namespace for enterprise data storage. As unstructured data volumes continue to expand exponentially, data silos proliferate and IT budgets remain relatively flat, many organizations are interested in the data management benefits of a global namespace. However, it’s important to note that a global namespace does not require a global file system (GFS), despite vendors often claiming this to be the case. A global file system sits in front of the data and serves the appropriate files, thus acting as a controller. While a GFS is useful in some collaboration scenarios, using it to achieve the management benefits of a global namespace creates unnecessary overhead that results in loss of data control, loss of flexibility, poor visibility, poor performance and high costs. It is important to understand the different approaches and goals and ideal use cases for each. Komprise provides the data visibility, access and cost management benefits of a global namespace – all without the overhead of a global file system.

Questions to Help You Determine What’s Best: Global Namespace and/or Global File System

1. Do you want to:

  • A) Replace your existing NAS with something new?
  • B) Leverage your existing investments and modernize your infrastructure?

If the answer is A), a storage-centric global file system might be the solution. Be sure to focus not only on switching costs but also on the long-term costs and implications of having a new storage technology platform sitting in front of (and hosting) all your data.

If the answer is B), Komprise can help with analytics-driven data migration, management and mobility.

2. Are you trying to:

  • A) Collaborate across teams and locations?
  • B) Improve the ability to view and manage data across systems?

If the answer is A), a metadata-based GFS might be the solution. Be sure to determine the importance of collaboration use cases and the ongoing costs of fronting all of your data storage.

If the answer is B), Komprise can help provide the benefits of a global namespace without sitting in the hot data path.

3. Do you anticipate needing multiple users to collaborate on large files across multiple locations, requiring local caching?

If the answer is YES, you will need a GFS. If the answer is NO, you want the visibility benefits of a global namespace and do not want or need the overhead of a global file system. Komprise Intelligent Data Management might be the right solution.

Learn more about the Komprise Global File Index – the benefits of a federated global namespace that never sits in front of the hot data path.

Getting Started with Komprise:

Want To Learn More?

Google Cloud Platform (GCP)

What is Google Cloud Platform?

Google Cloud Platform (GCP) is a suite of cloud computing services provided by Google. It offers a wide range of infrastructure and platform services, including computing, storage, networking, big data, machine learning, and security. Some of the key services offered by GCP include:

  • Compute Engine – Virtual Machines (VMs) that can be used to run applications and services.
  • App Engine – A platform for building and deploying web and mobile applications.
  • Kubernetes Engine – A managed service for deploying, scaling, and managing containerized applications.
  • Cloud Storage – A scalable and durable object storage service.
  • Cloud SQL – A managed relational database service.
  • BigQuery – A serverless, fully managed data warehouse for analytics.
  • Cloud Pub/Sub – A messaging and streaming service for real-time data processing.
  • Cloud AI Platform – A suite of machine learning services for building and deploying ML models.

GCP is designed to be highly scalable, reliable, and secure, and it is used by many organizations for a wide range of use cases, from small startups to large enterprises.

Komprise and Google Cloud

Getting Started with Komprise:

Want To Learn More?

Hierarchical Storage Management (HSM)

Hierarchical Storage Management (HSM) software, also known as tiered storage, was designed for distributed server environments to automate the process of identifying cold data sets and migrating them from primary disk to less expensive optical and tape storage devices. Going back to the era of the mainframe, HSM was also supposed to handle file recall requests automatically whenever a user clicked on a stub file.

Unfortunately, these early HSM products (see Wikipedia for a history) suffered from a number of deficiencies such as:

  • They were custom designed for specific proprietary storage systems, which limited hardware choices and resulted in vendor lock-in.
  • Many required file server agents that required substantial memory and compute resources, and operated in the direct data path, impacting performance.
  • They used static stub files left in place of the moved data. These static stub files could be corrupted, deleted, and orphaned making it difficult if not impossible to locate the original source file.
  • The early HSM solutions did not scale well. As file counts increased, HSM performance deteriorated significantly since they were traditional database-driven architectures.
  • The solutions would disrupt storage system performance, interrupting active usage.
  • File recalls could take a long time, especially if the requested file was stored on tape.

So bad were these deficiencies, that HSM became a “bad word” amongst IT professionals. Many of those IT pros believed that the only viable way to manage storage was to just keep adding more capacity to the primary tier.


The data center landscape has changed, and organizations now have a wide range of data storage options available. Flash memory devices have replaced high-performance physical disk drives as Tier-1 storage. High-performance and commodity physical hard disks now function as secondary and tertiary storage tiers. Cloud file storage and object storage options are available to handle bulk, long-term storage requirements. All of these options are needed to combat the unstructured data onslaught (and data sprawl and high data storage costs) that most organizations are facing. However, the main problem remains: how to automatically detect “warm” and “cold” data sets and continuously migrate them to the most cost-effective storage tier while also managing the entire file life cycle. As outlined in this early review of Komprise:

In short, we have more storage options than ever but less intelligence about how and when to move our increasing data to which storage platform.

In a 2022 Blocks and Files review, Komprise Intelligent Data Management is referred to as an HSM or Information Lifecycle Management solution. This category of software is now known as unstructured data management, or by the broader term data services.

Getting Started with Komprise:

Want To Learn More?

Hybrid Cloud File Data Services

In February 2023, Gartner industry analyst Julia Palmer published a research article: Modernize Your File Storage and Data Services for the Hybrid Cloud Future (Blocks & Files summary). According to Gartner, hybrid cloud file data services provide data access, data movement, life cycle management and data orchestration. Komprise is listed as a top vendor in this category. The other two categories included in the note are:

  • Next-generation file platforms: on-premises filers adding hybrid cloud capability and new software-only file services suppliers (VAST Data, NetApp, Qumulo)
  • Hybrid cloud file platforms: providing public cloud-based distributed file services (Ctera, Nasuni, Panzura)

Hybrid cloud file data services refer to the use of file-based storage solutions in a hybrid cloud environment, which combines on-premises infrastructure with cloud resources, allowing organizations to leverage the benefits of both private and public clouds. File data services, in this context, involve the access, management, movement and even storage and retrieval of files and data within a hybrid cloud setup.

In the article Unstructured Data Growth and AI Give Rise to Data Services, Komprise cofounder and COO Krishna Subramanian summarized the benefits of a data services approach as:

  • Holistic visibility and granular search across multiple storage systems and clouds;
  • Analytics and insights on data types and usage for more accurate storage decisions;
  • Automated, policy-driven actions based on that analysis;
  • Reduced security and compliance risks;
  • Full use of data wherever it is stored, especially in the cloud;
  • User self-service access to support departmental and research needs for data storage, management, and AI workflows;
  • Greater flexibility to adopt new storage, backup, and DR technologies because data is managed independently of any vendor technology.

The article concludes:

Above all, data management and storage infrastructure experts will need to shift their thinking and practices from managing storage technologies to understanding and managing data for a variety of purposes. A data storage and data management infrastructure that supports flexibility and agility to shift with organizational data needs will allow IT to make the shift faster and with better outcomes for all.

What are some of the components and features of hybrid cloud file data services?

Still an emerging category: the Gartner Top Trends in Enterprise Data Storage 2023 report (subscription required) notes that, “by 2027, 60% of Infrastructure and Operations leaders will implement hybrid cloud file deployments, up from 20% in early 2023.” Hybrid cloud file data services “provide data access and data management across edge, cloud and core data center locations through a single global namespace.” The report goes on to note: “Increasingly, enterprises are creating, ingesting and accessing data in edge locations, factories, field offices and retail locations. The data services to analyze or enhance data are typically present in the public cloud, but the workers who collaborate on the data are spread across many geographic locations, raising the demand for a single global namespace.”

Read the white paper: Global Namespace vs Global File System: What is the Difference and Why Does it Matter?

Some of the components and features of hybrid cloud file data services may include:

Data Storage and Management

  • On-Premises Storage: Traditional file servers or network-attached storage (NAS) devices located within an organization’s physical premises.
  • Cloud Storage: File storage services provided by public cloud providers (e.g., Amazon S3, Azure Blob Storage) for scalable and elastic storage options.

Data Synchronization and Sharing

  • Bidirectional Sync: Ensures that data remains consistent across on-premises and cloud environments, allowing users to seamlessly access and update files from either location.
  • Collaboration Tools: Integration with collaboration platforms to enable efficient sharing and collaboration on files among users in different locations.

In Gartner’s definition of Hybrid Cloud File Platforms, these features would require a Global File System.

Scalability and Flexibility

  • Elastic Scaling: The ability to scale file storage both on-premises and in the cloud based on changing storage requirements.
  • Multi-Cloud Support: Compatibility with multiple cloud providers, giving organizations flexibility in choosing the most suitable cloud services for their needs.

Data Security and Compliance

  • Encryption: Secure transmission and storage of files through encryption mechanisms, ensuring data confidentiality.
  • Compliance Features: Tools and features to help organizations comply with data protection regulations and industry-specific standards.

Data Access and Mobility

  • Global Access: Enable users to access files from any location, promoting a mobile and distributed workforce.
  • Data Mobility: Facilitate seamless movement of data between on-premises and cloud environments.

Learn more about Komprise Intelligent Data Management: Unified data control plane for file and object data analytics, mobility and management without creating a bottleneck and never being in the hot data path.

Backup and Disaster Recovery

  • Snapshot and Backup: Regularly capture snapshots and backups of file data to protect against data loss or corruption.
  • Disaster Recovery Planning: Implement strategies to quickly recover file data in the event of a disaster or data loss incident.

Integrated Management and Monitoring

  • Unified Dashboard: A centralized management interface for overseeing file data services across on-premises and cloud environments.
  • Monitoring Tools: Tools for tracking performance, usage, and potential issues in the hybrid cloud file storage infrastructure.

Implementing hybrid cloud file data services requires careful planning, integration, and management to ensure a seamless and efficient experience for users while maximizing the benefits of both on-premises and cloud-based data storage solutions.

Getting Started with Komprise:

Want To Learn More?

Immutable Storage

What is immutable storage?

Immutable storage is a feature of file storage, or more typically object storage, that protects data from modification or deletion for a set retention period. Immutable storage is often used in highly regulated industries such as finance and health care but is now gaining popularity across other industries as a defense against ransomware or insider threats.

Implementations of immutable storage, such as AWS S3 Object Lock, are certified by independent third parties to ensure they comply with government regulations.
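The retention semantics can be illustrated with a toy model: once an object is written, reads always succeed, but deletion fails until the retention period expires. This is a conceptual sketch in the spirit of WORM (write-once-read-many) storage such as S3 Object Lock, not a real storage API; the class and method names are invented for illustration.

```python
from datetime import datetime, timedelta

class ImmutableObject:
    """Toy model of WORM retention: readable always, deletable only
    after the retention period elapses. Illustrative, not a real API."""

    def __init__(self, data: bytes, retention: timedelta, now: datetime):
        self._data = data
        self.retain_until = now + retention

    def read(self) -> bytes:
        return self._data

    def delete(self, now: datetime) -> None:
        # Deletion is refused while the object is under retention.
        if now < self.retain_until:
            raise PermissionError("object is under retention; cannot delete")
        self._data = b""
```

Even a storage administrator (or an attacker with admin credentials) cannot remove the copy early, which is what makes an immutable copy a reliable last line of defense against ransomware.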

Read the blog post: How to Protect File Data from Ransomware at 80% Lower Cost

Since approximately 80% of data today is unstructured, organizations cannot afford to leave file data unprotected from ransomware attacks. Early ransomware detection can deliver the best outcome, but as ransomware attacks constantly evolve, detection is not foolproof and can be difficult. Investing in ways to recover data if you are attacked by ransomware is essential. An immutable copy of data in a location separate from your data storage and data backups gives you a way to recover in the event of a potentially devastating ransomware attack. But keeping multiple copies of data can get prohibitively expensive.

Getting Started with Komprise:

Want To Learn More?

Information Lifecycle Management

Information Lifecycle Management (ILM) is a data management strategy that focuses on managing the flow of data from creation to deletion. The goal of ILM is to optimize the use of storage resources and improve data management efficiency and cost-effectiveness.

Gartner defines ILM this way:

Information Lifecycle Management (ILM) is an approach to data and storage management that recognizes that the value of information changes over time and that it must be managed accordingly. ILM seeks to classify data according to its business value and establish policies to migrate and store data on the appropriate storage tier and, ultimately, remove it altogether. ILM has evolved to include upfront initiatives like master data management and compliance.

Source

TechTarget Defines ILM this way:

Information lifecycle management (ILM) is a comprehensive approach to managing an organization’s data and associated metadata, starting with its creation and acquisition through when it becomes obsolete and is deleted.

Source

ILM involves a series of activities that are performed at different stages of the data lifecycle, such as data creation, data storage, data protection, data archiving, and data deletion. At each stage, the data is managed and stored according to its value, importance, and frequency of use.

ILM typically involves the use of data classification, data retention, data archiving policies, and data management tools and technologies. These policies and technologies help to manage the flow of data throughout its lifecycle and ensure that it is stored in the most appropriate location and format for its current needs.

Benefits of implementing ILM

Improved storage utilization and cost savings

By managing data throughout its lifecycle, ILM helps organizations ensure that the most valuable and important data is stored on high-performance storage systems, while less important data is stored on lower-cost storage systems.

Increased data protection and security

By managing the flow of data and applying appropriate data protection and security measures, ILM helps reduce the risk of data loss or corruption.

Better compliance

ILM helps organizations meet regulatory and compliance requirements by ensuring that data is managed and stored in accordance with the organization’s policies and best practices.

Overall, Information Lifecycle Management is an essential aspect of modern data management and is critical to effectively manage and store data securely and with cost savings in mind.

ILM Challenges

  • Complexity: In organizations with large and complex data environments it can be difficult to effectively manage and store data throughout its lifecycle. This can lead to data sprawl, increased data storage costs, and increased security and compliance risks.
  • Cost: Implementing ILM requires investment in the right data management tools and technologies, and structured and unstructured data management policies and processes. This can be a significant cost for organizations, especially those with limited budgets.
  • Data protection and security: ILM can introduce new security and privacy risks, especially if sensitive data is stored on low-cost or low-security storage systems. Organizations should ensure that they have appropriate data protection and security measures in place to mitigate these risks.

By carefully planning and executing ILM strategies, organizations can manage and store data throughout its lifecycle, cutting costs while ensuring that data is protected, secure, and compliant with regulatory requirements.

On-going Unstructured Data Management as part of an ILM Strategy

As we noted when we launched Smart Data Workflows, with billions of files and objects, analytics plus continuous mobilization is essential because data has a lifecycle and data management is not a one-time thing. Whether the use case is data analytics, data migration, data tiering, data replication, data search or anything related to the data lifecycle, it is important to look for an unstructured data management solution that delivers on-going data management. Learn more about Komprise Intelligent Data Management.

Getting Started with Komprise:

Want To Learn More?

IOPS

IOPS stands for Input/Output Operations Per Second. It is a commonly used metric to measure the performance or throughput of storage devices, such as hard disk drives (HDDs), solid-state drives (SSDs), or data storage systems.

IOPS represents the number of read and write operations a storage device or system can perform in one second. It is an important metric for determining the responsiveness and efficiency of storage solutions, especially in high-performance or latency-sensitive environments.

The IOPS value can vary significantly depending on factors such as the storage technology, disk capacity, disk speed, queue depth, block size, and workload characteristics.

Key points about IOPS:

  • Random IOPS: Random IOPS refers to the number of random read or write operations a storage device can handle per second. It is a measure of how quickly the storage device can handle small, random data access patterns typically seen in databases or virtualized environments.
  • Sequential IOPS: Sequential IOPS represents the number of sequential read or write operations a storage device can perform per second. It measures the storage device’s ability to handle large, sequential data access patterns, which are common in tasks such as streaming or large file transfers.
  • Queue Depth: The queue depth represents the number of I/O requests that can be queued or outstanding at a given time. A higher queue depth allows for more simultaneous I/O operations, which can increase IOPS performance.
  • Block Size: The block size refers to the size of the data transferred in each I/O operation. Smaller block sizes typically result in higher IOPS values, as more operations can be performed in a given time period. However, larger block sizes can improve throughput and efficiency for certain workloads.
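The trade-off between IOPS and block size follows directly from the identity throughput = IOPS × block size. A quick sketch (the specific IOPS figures are illustrative, not measurements of any particular device):

```python
def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Approximate throughput in MB/s implied by an IOPS rate and block size."""
    return iops * block_size_kb / 1024

# The same device can look very different depending on block size:
small = throughput_mb_s(20_000, 4)      # 20K IOPS at 4 KB blocks -> ~78 MB/s
large = throughput_mb_s(2_000, 1024)    # 2K IOPS at 1 MB blocks  -> 2000 MB/s
```

This is why a high IOPS number alone says little: a drive doing 20,000 small-block IOPS moves far less data per second than one doing a tenth as many large-block operations.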

IOPS is just one metric to consider when evaluating storage performance. Other factors like latency, bandwidth, and throughput also play a significant role. Workload characteristics, including read-to-write ratios, access patterns, and the number of concurrent users or applications, should be taken into account to determine the appropriate storage solution for specific use cases.

When comparing storage devices or systems, it is recommended to consider multiple performance metrics, including IOPS, to gain a comprehensive understanding of their capabilities and suitability for a given workload.

Historically, data storage was measured with hardware-oriented metrics, including:

  • Latency, IOPS and network throughput
  • Uptime and downtime per year
  • RPO: Recovery point objective (a time-based measure of the maximum amount of data loss that is tolerable to an organization)
  • RTO: Recovery time objective (the time to restore services after downtime)
  • Backup window: Average time to perform a backup

Read more about the file metrics that matter.

What are the top reports or metrics that data storage people need today to help keep up with these trends? Read: The Critical Role of Reporting in Trimming Storage Costs.

Getting Started with Komprise:

Want To Learn More?

Isilon CloudPools (Dell EMC)

What are Isilon CloudPools?


Dell EMC PowerScale (formerly Isilon) CloudPools software provides policy-based automated tiering that allows for an additional storage tier for the Isilon cluster at your data center. This technology is a form of storage pools which are collections of storage volumes that often blend different tiers of storage into a logical pool or shared storage environment.

CloudPools supports tiering data from Dell PowerScale Isilon to public, private or hybrid cloud options. This technology moves archived files to the destination storage in a proprietary format and then references the moved files via stubs. File data access from the object storage is not possible, eliminating the use of cloud-based functions such as AI/ML. Functions such as backup by external application or migration to new storage array require full rehydration of data leading to egress fees from cloud storage and the need to retain on-prem storage capacity.

Learn more about CloudPools.

Read the blog post: What you need to know before jumping into the cloud tiering pool


Read the white paper: Cloud Tiering: Storage-Based vs Gateways vs File-Based: Which is Better and Why?

Learn how to save on storage with Dell EMC and Komprise.

Getting Started with Komprise:

Want To Learn More?

Isilon Tiering

The Isilon Tiering solution from Dell EMC is called PowerScale CloudPools.
Dell EMC PowerScale Isilon CloudPools software provides policy-based automated tiering that allows for an additional storage tier for the Isilon cluster at your data center. CloudPools supports tiering data from Dell PowerScale Isilon to public, private or hybrid cloud options. This technology is a form of storage pools, which are collections of storage volumes exported to a shared storage environment.
Cloud tiering and data tiering (or archiving) can deliver significant cost savings as part of a cloud data strategy by offloading unused cold data to more cost-efficient cloud storage. The approach you take to Isilon tiering can either create an easy path to the cloud with native access and full use of data in the cloud, or it can create costly cloud egress and lock-in. Array block-level tiering is a mismatch for the cloud: CloudPools tiers blocks rather than entire files, which has the following ramifications:
  • Limited policies result in more data access from the cloud.
  • Defragmentation of blocks leads to higher cloud costs.
  • Sequential reads lead to higher cloud costs and lower performance.
  • Tiering blocks impacts performance of the storage array.

Read the blog post: What you need to know before jumping into the cloud tiering pool

PowerScale Isilon Tiering Choices

When it comes to considering PowerScale Isilon data tiering and PowerScale Isilon cloud tiering, it’s important to understand your cloud tiering choices. Cloud tiering and archiving can save you millions by offloading infrequently accessed cold data to cost-efficient cloud data storage. But, the approach you take can either create an easy path to the cloud for file data with full use of data in the cloud or it can create costly cloud egress and lock-in.

Smart Migration from PowerScale Isilon with Komprise: Analyze your data first, tier off cold data, deliver 25x faster cloud data migrations and deliver transparency / no disruption to your users and native data access / no storage-vendor lock-in for your file and object data.

Learn more about cloud tiering and your cloud tiering choices.
Learn more about Komprise for Dell EMC.

Getting Started with Komprise:

Want To Learn More?

Komprise Analysis

Komprise Analysis provides strategic insights into unstructured file and object data across your on-premises and cloud enterprise IT infrastructure:

  • Analyze across all your NAS (NFS, SMB, and dual shares) as well as cloud storage.
  • See how much data you have, how fast it is growing, what is hot/cold.
  • Quickly understand file data types, top users, top groups, top directories.
  • Perform cost/benefit modeling and capacity planning for tiering and data management.

With Komprise Analysis, you quickly gain visibility across storage silos and the cloud to make data-driven decisions. Plan what to migrate, what to tier, and understand the financial impact with an analytics-driven approach to data management and mobility. What if you could significantly reduce your data costs by transparently moving/tiering infrequently used data to less expensive storage? What if you could tier data without disrupting users or applications and feed select data to AI and ML analysis tools to help generate revenue? With Komprise you can know first, move smart, extract value and take control of your unstructured data growth and costs. That’s the power of Intelligent Data Management.

Komprise Analysis is available as a standalone SaaS solution and is included with Komprise Elastic Data Migration and the full Komprise Intelligent Data Management Platform.


Getting Started with Komprise:

Want To Learn More?

Komprise Deep Analytics

Komprise Deep Analytics delivers granular, flexible search and indexes data in-place across file, object and cloud data storage to build a comprehensive Global File Index (GFI) spanning petabytes of unstructured data.

Komprise Deep Analytics Actions: Add Deep Analytics queries to a plan and operationalize your ability to search and find what you need and when you need it.

Smart Data Workflows: Leverage the GFI metadata catalog for systematic, policy-driven data management actions that can feed your data pipelines.


Getting Started with Komprise:

Want To Learn More?

Komprise Intelligent Data Management

Komprise Intelligent Data Management is the full platform suite from Komprise, which delivers instant insight into data across NAS and object data storage silos—from on-prem to the edge and across multi-cloud data storage. Identify savings, systematically move cold data transparently without any disruption to optimize costs, get an easier, faster path to the cloud and deliver greater unstructured data value. With Komprise Intelligent Data Management as a service you can analyze, migrate, tier, archive, replicate, and manage data at scale simply and reliably.

Read the solution brief: Why Komprise Intelligent Data Management.

Komprise Intelligent Data Management includes Komprise Analysis, Elastic Data Migration and Deep Analytics. Many enterprise organizations start with Komprise Analysis to know first and then determine the right data mobility and ongoing data management strategy.


Getting Started with Komprise:

Want To Learn More?

Metadata Management

Metadata management is the process of collecting, organizing, storing, and maintaining metadata associated with an organization’s data assets. Metadata means data about data – it provides context, structure, and information about various aspects of data, making it easier to understand, manage, and use. Effective metadata management is essential for ensuring data quality, data accuracy, and the right data accessibility across an organization’s enterprise data landscape.

Types of Metadata:

  • Descriptive Metadata: Provides information about the content, structure, and context of data. This includes attributes such as data source, creation date, author, format, and keywords.
  • Technical Metadata: Contains technical details about data, such as data type, data length, field names, and relationships between data elements.
  • Operational Metadata: Tracks the usage and behavior of data within systems, including information about data transformations, processes, and workflows.
  • Business Metadata: Relates data to the business context, such as data definitions, business rules, data ownership, and data lineage.

Benefits of a Metadata Management Strategy:

  • Data Discovery and Understanding: Metadata provides insights into the meaning and structure of data, making it easier for users to discover and understand available data assets.
  • Data Governance: Metadata management supports data governance initiatives by enabling organizations to define and enforce data quality standards, security policies, and compliance requirements.
  • Data Lineage: Understanding the lineage of data – its origin, transformations, and movement – helps ensure data accuracy and traceability, particularly in complex data environments.
  • Data Integration: Metadata helps integration processes by clarifying how different data sources relate to each other, reducing the complexity of integrating disparate data systems.
  • Data Analytics and Reporting: Accurate metadata supports effective data analysis and reporting by providing the necessary context for interpreting results.
  • Search and Discovery: Well-managed metadata enables efficient search and discovery of data, saving time and effort when finding relevant information.
  • Collaboration: Metadata fosters collaboration by providing a common understanding of data across teams and departments.
  • Data Migration and Data Archiving: During data migration or data archiving projects, metadata helps in identifying what data to move, how to transform it, and what to retain for compliance purposes.

Metadata Management Process:

The process differs across enterprises and industries, but the general components are:

  • Capture: Metadata is collected from various sources, including databases, applications, files, and user input.
  • Store: Metadata can be stored in a centralized metadata repository or catalog. This repository acts as a single source of truth for all metadata assets.
  • Organize: Metadata is organized into categories, taxonomies, or hierarchies to facilitate easy navigation and understanding.
  • Govern: Metadata is governed through established processes, ensuring data quality, accuracy, security, and compliance.
  • Search and Access: Users can search and access metadata using intuitive tools and interfaces, allowing them to find relevant data assets quickly.
  • Update and Maintain: Regularly update and maintain metadata as data assets evolve over time. This includes updating technical details, documenting changes, and managing data lineage.
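The capture, store, and search steps above can be sketched in a few lines. This is an illustrative toy catalog; the table and field names are invented for the example and do not describe any particular metadata product:

```python
import sqlite3

# A minimal in-memory metadata repository acting as the "single source
# of truth" for metadata records. All names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE metadata (
        path TEXT PRIMARY KEY,   -- where the data asset lives
        owner TEXT,              -- business metadata: data ownership
        created TEXT,            -- descriptive metadata: creation date
        data_type TEXT,          -- technical metadata: format
        tags TEXT                -- descriptive metadata: keywords
    )
""")

def capture(path, owner, created, data_type, tags):
    """Capture and store one metadata record in the central repository."""
    conn.execute("INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?, ?)",
                 (path, owner, created, data_type, ",".join(tags)))

def search(tag):
    """Search and access: find assets whose tags include a keyword."""
    rows = conn.execute("SELECT path FROM metadata WHERE tags LIKE ?",
                        (f"%{tag}%",))
    return [r[0] for r in rows]

capture("/share/genomics/run42.bam", "research", "2023-05-01", "bam",
        ["genomics", "cold"])
capture("/share/finance/q2.xlsx", "finance", "2023-07-01", "xlsx",
        ["reports"])
print(search("genomics"))  # -> ['/share/genomics/run42.bam']
```

A real catalog would add governance (quality checks, access controls) and ongoing maintenance of records as assets evolve.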

Metadata Standards and Tools:

Metadata management often involves standards such as Dublin Core, the Metadata Object Description Schema (MODS), and industry-specific standards, along with metadata management tools and platforms that facilitate the capture, storage, organization, and retrieval of metadata.

Metadata management is a crucial practice for any organization that values data quality, accessibility, and effective data governance. It has now broadened to include unstructured data, providing the context necessary to understand and utilize all data assets while supporting critical business initiatives, compliance efforts, and analytics and AI activities.


NAS Software

NAS stands for Network Attached Storage, which is a type of data storage architecture that allows multiple devices to access shared storage over a network. NAS software is the software that powers these NAS systems. There are several NAS software options available, from FreeNAS, open-source NAS software that supports various protocols and features, including CIFS/SMB, NFS, iSCSI, FTP, etc., to enterprise NAS vendors who deliver a combination of NAS software and NAS hardware (Pure Storage, NetApp, HPE, and Dell are examples).

The choice of NAS software depends on factors such as the size of your storage needs, budget, features required and personal preferences.

NAS Hardware

NAS hardware is the physical components that make up a Network-Attached Storage (NAS) system. Some of the key components of NAS hardware include:

  • Storage drives: The most important component of any NAS system is the storage drives. These are the hard drives or solid-state drives (SSDs) that store the data. NAS systems typically use multiple drives in a RAID configuration to provide redundancy and improved performance.
  • NAS enclosure: The enclosure is the physical housing that holds the storage drives and other components of the NAS system. Enclosures can vary in size, from small desktop models to large rack-mounted models for enterprise environments.
  • Network interface: The network interface is the component that allows the NAS system to connect to a network. Most NAS systems have a built-in network interface card (NIC) that supports Ethernet connections.
  • Processor and memory: The processor and memory are important components that affect the performance of the NAS system. A powerful processor and sufficient memory can improve the speed and responsiveness of the NAS system.
  • Power supply: The power supply is responsible for providing power to the NAS system. It is important to choose a reliable power supply to ensure that the NAS system operates smoothly.
  • Cooling system: NAS systems generate a lot of heat due to the high-speed operation of the storage drives and other components. A good cooling system is important to prevent overheating and damage to the components.
  • Expansion slots: Some NAS systems have expansion slots that allow you to add additional components, such as network interface cards, to improve the functionality of the system.

Read: Sustainable data management and the future of green business

Enterprise NAS solutions

Pure Storage, NetApp, Dell and Qumulo are all companies that offer enterprise NAS solutions.

  • Pure Storage: Pure Storage offers FlashBlade, a high-performance, scalable NAS solution designed for modern workloads such as analytics, AI, and machine learning. FlashBlade is built on a software-defined architecture and provides features such as data reduction, encryption, and file replication.
  • NetApp: NetApp offers several NAS solutions, including the FAS series and the AFF series. The FAS series provides midrange NAS capabilities and is suitable for small and medium-sized businesses. The AFF series provides high-performance NAS capabilities and is suitable for large enterprises.
  • Dell: Dell offers several NAS solutions, including the PowerVault NX series and the PowerScale series. The PowerVault NX series provides midrange NAS capabilities and is suitable for small and medium-sized businesses. The PowerScale series provides high-performance NAS capabilities and is suitable for large enterprises.
  • Qumulo: Qumulo offers a software-defined NAS solution that can be deployed on-premises, in the cloud, or in a hybrid environment. The solution is designed to provide high-performance file storage for a range of workloads, including video and audio content, medical imaging, and scientific research data.

These are just a few examples of enterprise NAS solutions.

Cloud NAS

Cloud NAS is a type of network-attached storage architecture that allows users to access their data remotely over the internet. There are several cloud NAS vendors in the market that offer cloud-based storage solutions. Some of the well-known cloud NAS vendors include:

  • Amazon Web Services (AWS): Amazon’s cloud computing platform provides several storage services, including Amazon Elastic File System (EFS), which is a cloud-based NAS solution that provides scalable and secure file storage for EC2 instances.
  • Microsoft Azure: Microsoft’s cloud computing platform provides Azure File Storage, which is a fully managed cloud-based NAS solution that supports SMB and NFS protocols.
  • Google Cloud Platform: Google’s cloud computing platform provides Cloud Filestore, which is a cloud-based NAS solution that provides high-performance file storage for compute instances running on Google Cloud Platform.

NAS Migration

NAS migration is the process of transferring data from one NAS system to another. This may be necessary if you are upgrading your existing NAS system, or if you are moving your data to a new location. Also refer to Cloud NAS Migration. Here are the steps involved in a typical NAS migration:

  • Plan the migration: The first step is to plan the migration. This involves identifying the data that needs to be migrated, estimating the size of the data, and choosing the new NAS system.
  • Set up the new NAS system: Once you have chosen the new NAS system, you need to set it up. This involves configuring the network settings, creating shares and volumes, and setting up user accounts and permissions.
  • Copy the data: The next step is to copy the data from the old NAS system to the new one. This can be done using various methods such as using a backup and restore process, using a file transfer protocol such as FTP, or using a third-party tool.
  • Verify the data: After the data has been copied, it is important to verify that all the data has been transferred successfully. This involves checking that all the files and folders have been copied correctly and that there are no missing or corrupted files.
  • Update the clients: Finally, you need to update the clients to point to the new NAS system. This involves updating the client configurations and testing to ensure that the clients can access the data on the new NAS system.

It is important to ensure that you have a backup of all your data before you start the migration process. This will help you to recover your data in case anything goes wrong during the migration process.
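The copy and verify steps above can be sketched as follows. This is a minimal illustration using local directories and SHA-256 checksums, with hypothetical paths; it is not a substitute for a production migration tool:

```python
import hashlib
import os
import shutil
import tempfile

def copy_tree(src, dst):
    """Step 3: copy the data from the old NAS mount to the new one."""
    shutil.copytree(src, dst, dirs_exist_ok=True)

def checksum(path):
    """SHA-256 of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_tree(src, dst):
    """Step 4: confirm every source file arrived intact; return mismatches."""
    mismatches = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            s = os.path.join(root, name)
            d = os.path.join(dst, os.path.relpath(s, src))
            if not os.path.exists(d) or checksum(s) != checksum(d):
                mismatches.append(s)
    return mismatches

# Illustrative local paths standing in for the old and new NAS mounts.
base = tempfile.mkdtemp()
old_nas = os.path.join(base, "old_nas")
new_nas = os.path.join(base, "new_nas")
os.makedirs(old_nas)
with open(os.path.join(old_nas, "report.txt"), "w") as f:
    f.write("quarterly numbers")

copy_tree(old_nas, new_nas)
print(verify_tree(old_nas, new_nas))  # -> [] when every file copied intact
```

A real migration would also compare permissions, timestamps, and other metadata, not just file contents.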


NAS Migration Challenges

NAS migration can be a complex process and may present a number of challenges. Here are some of the common challenges that organizations may face during NAS migration:

  • Data transfer speed: Moving large amounts of data can be time-consuming, especially if you are using a slow network or if the data is being transferred over a long distance. This can result in prolonged downtime and potential data loss if the migration is not completed within the scheduled downtime window.
  • Compatibility issues: Different NAS systems may have different file systems, protocols, and configurations, which can create compatibility issues during the migration process. This can lead to data corruption or loss, or it may require additional configuration changes to ensure that the data is compatible with the new NAS system.
  • Data loss: Data loss is a common risk during any data migration process, and it is important to have a backup of all your data before you start the migration process. This will help you to recover your data in case anything goes wrong during the migration process.
  • User access: During the migration process, users may lose access to their data, which can result in productivity loss and potential data loss. It is important to plan for user access and ensure that users are informed about any scheduled downtime or access restrictions.
  • Data security: During the migration process, data may be exposed to security risks, such as unauthorized access or data breaches. It is important to ensure that your data is protected throughout the migration process.

To overcome these challenges, it is important to plan the NAS migration process carefully, use appropriate migration tools and services, and involve all stakeholders in the process. It is also important to test the migration process thoroughly before the actual migration to identify and resolve any issues beforehand.


Komprise for NAS Migration and Data Management

Komprise specializes in analyzing and tiering, archiving and moving unstructured data from primary NAS to more cost-effective long-term storage without any disruption. Typically, 60% to 80% of enterprise file and object data has not been accessed in over a year. By tiering cold data and older log files and snapshots, the capacity of the storage array, the mirrored storage array (if mirroring and/or replication is used) and backup storage is reduced dramatically. The right approach to transparently tiering cold data can reduce overall storage costs by as much as 70%.

With Komprise you can migrate NAS and object data on-premises and in the cloud quickly, reliably, and at scale. Optimize cloud data storage costs with analytics-driven cloud tiering and archival. Build a Global File Index to easily find, tag and take action on the right data at the right time and feed the right data to analytics and AI/ML engines. Komprise uses open standards such as NFS, SMB/CIFS and REST/S3, making it "data storage agnostic."


Native Data Access

Native Data Access: Having direct access to tiered or archived data without needing rehydration because files are accessed as objects from the target storage.

The Benefits of Cloud Native Data Access

Gartner estimates that by 2025 more than 95% of new digital workloads will be deployed on cloud-native platforms, up from 30% in 2021. According to the 2022 State of Unstructured Data Management report, enterprise IT organizations are looking to optimize data storage efficiency by moving more data to the cloud. As a result, cloud NAS file data storage options are attracting attention. In fact, cloud NAS topped the list for storage investments in the 2023 survey (47%), followed closely by cloud object storage (44%). Enterprise data storage vendors such as NetApp have popular cloud NAS offerings alongside cloud-native offerings such as Amazon FSx and Azure Files. These services are ideal for active or "hot" data requiring high performance and response times; rarely accessed or "cold" data can live on object storage, which delivers significant cost savings for long-term storage.

Read the Blog Post: Why Cloud Native Data Access Matters

As you migrate file workloads to the cloud, it’s important to not limit the potential of your data by locking data into a proprietary format. Cloud native data access is essential to unleash the potential of the cloud. Cloud native is a way to move data to the cloud without lock in, which means that your data is no longer tied to the file system from which it was originally served.

Watch the TechKrunch session: How to Access Tiered Data in the Cloud

This short webinar demonstrates how Komprise allows you to access your data wherever it's stored, whenever you want, without rehydration. Because moved data is always intact, you can extract data value with both file and native access, without penalty. Read the Komprise Architecture Overview for more information on Native Access.


NetApp Cloud Tiering

The NetApp Cloud Tiering solution is called FabricPool.

FabricPool is a NetApp tiering technology that enables automated tiering of data from an all-flash appliance to low-cost object storage tiers either on or off premises. This technology is a form of storage pools which are collections of storage volumes exported to a shared storage environment.

Cloud tiering and data tiering (or data archiving) can deliver significant data storage cost savings as part of a cloud storage strategy by offloading unused cold data to more cost-efficient cloud storage solutions. The approach you take to NetApp tiering can either create an easy path to the cloud with native access and full use of data in the cloud or it can create costly cloud egress and lock-in.

What you need to know before jumping into the cloud pool.

Learn more about your cloud tiering choices.

Learn more about Komprise for NetApp.


NetApp FabricPool

What is NetApp FabricPool? Is it the Right Choice for NetApp Data Tiering?

FabricPool (now called NetApp Cloud Tiering) is a NetApp storage technology that enables automated data tiering at the block level from flash storage to low-cost object storage tiers, in the cloud or on premises. FabricPool is a form of storage pools which are collections of storage volumes that often blend different tiers of storage into a logical pool or shared storage environment.

Originally developed to tier "snapshot" or backup data, the functionality has been extended to infrequently accessed blocks of the active file system. Tiered data is stored in a proprietary format in object storage and as a result can only be read via the original NetApp array. File data access from the object storage is not possible, which rules out using cloud-based tools for AI/ML. Additionally, functions such as backup by an external application or migration to a new storage array require full rehydration of the data, leading to egress fees from cloud storage and the need to retain sufficient storage capacity on-premises.

Read the white paper, Cloud Tiering: Storage-Based vs Gateways vs File-Based, for more discussion on storage pools.

Array block-level tiering is a mismatch for the cloud. Tiering blocks rather than entire files, as NetApp cloud tiering does, has the following ramifications:

  • Limited policies result in more data access from the cloud.
  • Defragmentation of blocks leads to higher cloud costs.
  • Sequential reads lead to higher cloud costs and lower performance.
  • Tiering blocks impacts performance of the storage array.


Read the blog post: What you need to know before jumping into the cloud tiering pool

Learn more about FabricPool technology.

When considering NetApp data tiering and NetApp cloud tiering, it's important to understand your cloud tiering choices. Cloud tiering and archiving can save you millions by offloading infrequently accessed cold data to cost-efficient cloud data storage. But the approach you take can either create an easy path to the cloud for file data with full use of data in the cloud, or it can create costly cloud egress and lock-in. Also, what about cloud data migration and cloud tiering for other storage systems (e.g., Isilon cloud tiering) if you are a multi-storage enterprise IT organization? And what about tiering data from older versions of NetApp? This is why the market is increasingly moving to storage-agnostic unstructured data management.


Learn about Komprise’s native integration with NetApp and why Komprise is the right choice for NetApp cloud data tiering.


Network Attached Storage (NAS)


What is Network Attached Storage?

Network Attached Storage (NAS) definition: A NAS system is a storage device connected to a network that allows storage and retrieval of data from a centralized location for authorized network users and heterogeneous clients. These devices generally consist of an engine that implements the file services (NAS device), and one or more devices on which data is stored (NAS drives).

The purpose of a NAS system is to provide a local area network (LAN) with file-based, shared storage in the form of an appliance optimized for quick data storage and retrieval. NAS is a relatively expensive storage option, so it should only be used for hot data that is accessed most frequently. Many enterprise IT organizations today are looking to migrate NAS and object data to the cloud to reduce costs and improve agility and efficiency.

NAS Storage Benefits

Network attached storage devices remove the responsibility of file serving from other servers on a network and provide a convenient way to share files among multiple computers. Benefits of dedicated network attached storage include:

  • Faster data access
  • Easy to scale up and expand upon
  • Remote data accessibility
  • Easier administration
  • OS-agnostic compatibility (works with Windows and Apple-based devices)
  • Built-in data security with compatibility for redundant storage arrays
  • Simple configuration and management (typically does not require an IT pro to operate)

NAS File Access Protocols

Network attached storage devices are often capable of communicating in a number of different file access protocols, such as:

  • NFS (Network File System)
  • SMB / CIFS (Server Message Block / Common Internet File System)
  • REST / S3 (object protocols)

Most NAS devices are compatible with a flexible range of data storage systems, but you should always ensure that your intended device will work with your specific data storage system.

Enterprise NAS Storage Applications

In an enterprise, a NAS array can be used as primary storage for storing unstructured data and as backup for data archiving or disaster recovery (DR). It can also function as an email, media database or print server for a small business. Higher-end NAS devices can hold enough disks to support RAID, a storage technology that combines multiple hard disks into one unit to provide better performance, redundancy, and high availability.

Data on NAS systems (aka NAS devices) is often mirrored (replicated) to another NAS system, and backups or snapshots of the footprint are kept on the NAS for weeks or months. This leads to at least three or more copies of the data being kept on expensive NAS storage. A NAS storage solution does not need to be used for disaster recovery and backup copies, as this can be very costly. By finding and tiering (or archiving) cold data from NAS, you can eliminate the extra copies of cold data and cut cold data storage costs by over 70%.
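A back-of-the-envelope calculation shows how the savings above add up. The per-terabyte prices are assumptions chosen for illustration only, not vendor quotes:

```python
# Back-of-the-envelope sketch of the NAS tiering savings math.
# All prices and percentages below are illustrative assumptions.
nas_tb_month = 30.0      # $/TB/month for NAS-class storage (assumed)
object_tb_month = 4.0    # $/TB/month for object storage (assumed)

total_tb = 1000          # 1 PB of file data
cold_fraction = 0.7      # 60-80% is typically cold; assume 70%
copies = 3               # primary + mirror + backup, all on NAS-class storage

before = total_tb * copies * nas_tb_month
# After tiering: cold data lives once on object storage, and only the
# hot data keeps its three NAS-class copies.
hot_tb = total_tb * (1 - cold_fraction)
after = (hot_tb * copies * nas_tb_month
         + total_tb * cold_fraction * object_tb_month)

print(f"before: ${before:,.0f}/month  after: ${after:,.0f}/month")
print(f"savings: {1 - after / before:.0%}")  # roughly 67% in this scenario
```

Under these assumed prices, eliminating the mirror and backup copies of cold data yields savings in the neighborhood of the 70% figure cited above.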

Check out our video on NAS storage savings to get a more detailed explanation of how this concept works in practice.

Network Attached Storage (NAS) Data Tiering and Data Archiving

Since NAS storage is typically designed for higher performance and can be expensive, data on NAS is often tiered, archived and moved to less expensive storage classes. NAS vendors offer some basic data tiering at the block-level to provide limited savings on storage costs, but not on backup and DR costs. Unlike the proprietary block-level tiering, file-level tiering or archiving provides a standards-based, non-proprietary solution to maximize savings by moving cold data to cheaper storage solutions. This can be done transparently so users and applications do not see any difference when cold files are archived. Read this white paper to learn more about the differences between file tiering and block tiering.

NAS Migration to the Cloud

Cloud NAS is growing in popularity. But the right approach to migrating unstructured data to the cloud is essential. Unstructured data is everywhere. From genomics and medical imaging to streaming video, electric cars, and IoT products, all sectors generate unstructured file data. Data-heavy enterprises typically have petabytes of file data, which can consist of billions of files scattered across different storage vendors, architectures and locations. And while file data growth is exploding, IT budgets are not. That's why enterprise IT organizations are looking to migrate file workloads to the cloud. However, they face many barriers, which can cause migrations to take weeks to months and require significant manual effort.

Cloud NAS Migration Challenges

Common unstructured data migration challenges include:

  • Billions of files, mostly small: Unstructured data migrations often require moving billions of files, the vast majority of which are small files that have tremendous overhead, causing data transfers to be slow.
  • Chatty protocols: Server message block (SMB) protocol workloads—which can be user data, electronic design automation (EDA) and other multimedia files or corporate shares—are often a challenge since the protocol requires many back-and-forth handshakes which increase traffic over the network.
  • Large WAN latency: Network file protocols are extremely sensitive to high-latency network connections, which are essentially unavoidable in wide area network (WAN) migrations.
  • Limited network bandwidth: Bandwidth is often limited or not always available, causing data transfers to become slow, unreliable and difficult to manage.

Learn more about Komprise Smart Data Migration.
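The small-file challenge above can be made concrete with a rough model: per-file protocol overhead (opens, attribute lookups, closes) dominates transfer time when files are tiny. The bandwidth and per-file overhead figures are assumed values for illustration:

```python
# Rough model of why billions of small files slow a migration.
# Both figures below are illustrative assumptions.
bandwidth_MBps = 100.0        # effective network throughput (assumed)
per_file_overhead_s = 0.01    # assumed 10 ms of protocol round-trips per file

def transfer_hours(n_files, avg_file_MB):
    """Hours to move n_files averaging avg_file_MB each."""
    data_time = n_files * avg_file_MB / bandwidth_MBps
    overhead_time = n_files * per_file_overhead_s
    return (data_time + overhead_time) / 3600

# The same 10 TB moved as large files vs. small files:
print(f"{transfer_hours(10_000, 1000):.0f} h as 10K x 1 GB files")
print(f"{transfer_hours(10_000_000, 1):.0f} h as 10M x 1 MB files")
```

In this toy model the small-file migration takes twice as long for the same total data, and the gap widens further as files shrink or WAN latency grows.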

Network Attached Storage FAQ

These are some of the most commonly asked questions we get about network attached storage systems.

How are NAS drives different from typical data storage hardware?

NAS drives are specifically designed for constant 24×7 use with high reliability, built-in vibration mitigation, and optimized for use in RAID setups. Network attached storage systems also benefit from an abundance of health management systems designed to keep them running smoothly for longer than a standard hard drive would.

Which features are the most important ones to have in a NAS device?

The ideal NAS devices have multiple (2+) drive bays, hardware-level encryption acceleration, support for widely used platforms such as AWS Glacier and S3, and a moderately powerful multicore CPU paired with at least 2GB of RAM. If you're looking for these types of features, Seagate and Western Digital are some of the most reputable brands in the NAS industry.

Are there any downsides to using NAS storage?

NAS storage systems can be quite expensive when they’re not optimized to contain the right data, but this can be remedied with an analytics-driven NAS data management software, like Komprise Intelligent Data Management.

Using NAS Data Management Tools to Substantially Reduce Storage Costs

One of the biggest issues organizations are facing with NAS systems is trouble understanding which data they should be storing on their NAS drives and which should be offloaded to more affordable types of storage. To keep data storage costs lower, an analytics-based NAS data management system can be implemented to give your organization more insight into your NAS data and where it should be optimally stored.

Of the thousands of data-centric companies we've worked with, most needed less than 20% of their total data stored on high-performance NAS drives. With a more thorough understanding of their NAS data, organizations often realize that their NAS storage needs are much lower than they originally thought, leading to substantial storage savings, often greater than 50%, in the long run.

Komprise makes it possible for customers to know their NAS and S3 data usage and growth before buying more storage. Explore your storage scenarios to get a forecast of how much could be saved with the right data management tools.

This is what Komprise Dynamic Data Analytics provides.

NAS Fast Facts:

  • Network-attached storage (NAS) is a type of file computer storage device that provides a local-area network with file-based shared storage. This typically comes in the form of a manufactured computer appliance specialized for this purpose, containing one or more storage devices.
  • Network attached storage devices remove the responsibility of file serving from other servers on a network and provide a convenient way to share files among multiple computers. Benefits of dedicated network attached storage include faster data access, easier administration, and simple configuration.
  • In an enterprise, a network attached storage array can be used as primary storage for storing unstructured data, and as backup for archiving or disaster recovery. It can also function as an email, media database or print server for a small business. Higher-end network attached storage devices can hold enough disks to support RAID, a storage technology that combines multiple hard disks into one unit to provide better performance, redundancy, and high availability.
  • Data on NAS systems is often mirrored (replicated) to another NAS system, and backups or snapshots of the footprint are kept on the NAS for weeks or months. This leads to at least three or more copies of the data being kept on expensive NAS devices.

Read the white paper: How to Accelerate NAS Migrations and Cloud Data Migrations 

Know the difference between NAS and Cloud Data Migration vs. Tiering and Archiving

Elastic_DM_NASmigration_2020-FINAL1024_1


NFS Data Migration

NFS protocol data migration refers to the process of transferring data stored in NFS protocol-based systems, such as Unix and Linux file servers, to another system, such as a new file server or a cloud-based storage service. The NFS (Network File System) protocol is a file sharing protocol used by Unix and Linux-based systems to access files and other resources on a server over a network.

Like any data migration, an NFS data migration involves several steps, such as data extraction, data transformation, data loading, data verification, and data archiving. The goal of NFS protocol data migration is to ensure the accurate and secure transfer of data to the new system while minimizing disruption to business operations and preserving the integrity of the data.

Although a less chatty protocol than SMB, NFS data migrations can be challenging due to the complex nature of the NFS protocol and the large volumes of unstructured data that are often involved. To ensure a successful file data migration, organizations typically use specialized tools and services, such as data migration software, cloud data migration services, and managed data migration services.

Komprise delivers 27x faster NFS migrations


To address the critical NFS migration issues (along with SMB and S3/object protocols) that IT faces today, Komprise has developed Elastic Data Migration. This super-fast data migration solution is a highly parallelized, multi-processing, multi-threaded approach that works at two levels:

  • Multi-level Parallelism: Maximizes the use of available resources by exploiting parallelism at multiple levels: shares and volumes, directories, files, and threads to maximize performance. Komprise Elastic Data Migration breaks up each migration task into smaller ones that execute across the Komprise Observers. Komprise Observers are a grid of one or more virtual appliances that run the Komprise Intelligent Data Management solution. All of this parallelism occurs automatically across the grid of Observers. The user simply creates a migration task and can configure the level of parallelism. Komprise does the rest.
  • Protocol-level Optimizations: Reduces the number of round-trips over the protocol during a migration to eliminate unnecessary chatter. Rather than relying on generic NFS clients provided by the underlying operating system, Komprise has fine-tuned the NFS client to minimize overhead and unnecessary back-and-forth messaging. This is especially beneficial when moving data over high-latency networks such as WANs.
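As a simple illustration of file-level parallelism (only one of the levels described above, and not Komprise's actual implementation), a thread pool can copy many files concurrently instead of one at a time:

```python
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def parallel_copy(src_root, dst_root, workers=8):
    """Copy a directory tree with a pool of worker threads.
    Returns the number of files copied without error. A production
    migration engine also parallelizes across shares, directories,
    and nodes, and optimizes the protocol layer itself."""
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for root, _dirs, files in os.walk(src_root):
            rel = os.path.relpath(root, src_root)
            os.makedirs(os.path.join(dst_root, rel), exist_ok=True)
            for name in files:
                s = os.path.join(root, name)
                d = os.path.join(dst_root, rel, name)
                # Each file copy becomes an independent task on the pool.
                futures.append(pool.submit(shutil.copy2, s, d))
    return sum(1 for f in futures if f.exception() is None)

# Toy source tree standing in for an NFS export.
base = tempfile.mkdtemp()
src = os.path.join(base, "export")
dst = os.path.join(base, "target")
os.makedirs(os.path.join(src, "subdir"))
for i in range(5):
    with open(os.path.join(src, "subdir", f"file{i}.txt"), "w") as f:
        f.write("payload")

print(parallel_copy(src, dst))  # -> 5
```

Threads help here because file copies are I/O-bound: while one worker waits on network round-trips, others keep the link busy.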

Read the Komprise Elastic Data Migration white paper.


Object Data Migration

Object data migration is a type of data migration that supports the movement of object-based data; object data storage uses a flat address space and assigns a unique identifier to each piece of data. There are several factors to consider when planning and executing object data migrations, including:

  • Data compatibility: Organizations need to ensure that the new data storage system is compatible with the existing data and can support the same data formats, protocols, and applications.
  • Data protection: Object data migrations can be complex and lengthy, and organizations need to ensure that their data is protected during the migration process. This may involve using backup and recovery tools, implementing data encryption and other security measures.
  • Performance and scalability: Ensure that the new storage system can meet IT’s performance and scalability requirements.

Planning a Successful Object Storage Data Migration

Komprise has created a number of webinars and blog posts focused on unstructured data migration best practices. While object storage migrations may be on-premises, this post reviews Tips for a Clean File Data Migration, and many of its points are just as relevant for an object data migration:

  • Define Data Storage Sources and Targets
  • Establish Clear Data Migration Rules & Regulations
  • Know Your Unstructured Data: Data Discovery
  • Smart Data Migration: Know Your Topology
  • Before You Migrate Data: Test, Test, Test
  • Understand the Differences Between Free Tools for Cloud Migration vs. Enterprise
  • Have a Good Data Migration Communication Plan
  • Celebrate Wins

General Steps and Considerations for a Successful Object Storage Migration

Assessment and Planning:

  • Understand the existing object storage environment. Identify the data to be migrated, including its size, type, and access patterns.
  • Assess the compatibility between the source and target object storage systems.

Choose the Right Migration Tools:

  • Depending on the size and complexity of the migration, you may use different tools.
  • Some object storage systems provide built-in migration tools, while third-party tools like Komprise Elastic Data Migration are also available.

Data Preprocessing:

  • Clean up unnecessary or obsolete data before migration.
  • Consider data compression or deduplication to reduce the amount of data to be migrated. (This is where analyzing and tiering cold data fits into a Smart Data Migration strategy.)

Metadata Mapping:

  • Ensure that metadata associated with objects is correctly mapped between the source and target systems.
  • Metadata might include information such as access permissions, creation date, and custom tags.
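A metadata mapping step often boils down to renaming known keys while passing custom tags through untouched; a minimal sketch (the key names are illustrative, not a fixed standard):

```python
# Hypothetical mapping from a source system's metadata key names to the
# target system's conventions; these particular keys are illustrative.
KEY_MAP = {
    "x-amz-meta-owner": "owner",
    "x-amz-meta-created": "creation-date",
}

def map_metadata(source_meta: dict) -> dict:
    """Rename known keys per KEY_MAP and pass unknown keys through
    unchanged, so no custom tags are silently dropped."""
    return {KEY_MAP.get(k, k): v for k, v in source_meta.items()}
```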

Network Considerations:

  • Assess available bandwidth and latency between the source and target systems; high-latency WAN links can significantly extend migration timelines.
  • Consider throttling or scheduling transfers to avoid disrupting production workloads.

Testing:

  • Conduct a pilot migration with a subset of data to identify and address any issues.
  • Test the performance of the target object storage system with the migrated data.

Incremental Migration:

  • For large datasets, consider performing the migration incrementally to minimize downtime and impact on operations.

Monitoring and Validation:

  • Monitor the migration process to ensure it progresses smoothly.
  • Validate the integrity and completeness of the migrated data.
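One common way to validate integrity after a migration is to compare checksums of the source and target copies, for example:

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large objects never need to
    fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_pair(source_path: str, target_path: str) -> bool:
    """A migrated object is valid when source and target digests match."""
    return sha256_of(source_path) == sha256_of(target_path)
```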

Update References:

  • Update any references or links to the objects in your applications or systems to point to the new object storage location.

Post-Migration Verification:

  • Verify that all data has been successfully migrated.
  • Confirm that applications and services dependent on the object storage are working as expected.

Documentation:

  • Update documentation to reflect the changes in the object storage environment.

Rollback Plan:

  • Have a rollback plan in case any issues arise during or after migration.

Smarter, Faster, Proven Object Data Migration

Learn more about NAS and object data migration with Komprise Elastic Data Migration. Whether migrating to the cloud, to cloud NAS or to a NAS in your data center, with Komprise Elastic Data Migration you get fast, predictable and cost-efficient data migration for file and object data.



Object Lock

What is Object Lock?

Object Lock is the Amazon S3 object storage API implementation of immutable storage. Object Lock prevents objects from alteration or deletion for a set retention period. Object Lock is available in two modes:

  • Governance mode, which allows privileged administrators to override the Object Lock protection.
  • Compliance mode, the stricter option, which cannot be overridden even by administrators for the length of the retention period.

Many of our customers use Komprise to archive cold data to Amazon S3 and want these files to be immutable for compliance and regulatory purposes. They may want protection against ransomware or malware incidents that can infect NAS shares. For both of these use cases, Komprise supports Amazon S3 buckets configured with S3 Object Lock, which allows customers to store objects using a Write-Once-Read-Many (WORM) model. Once Komprise archives data into such a bucket, the data cannot be overwritten or deleted, providing file retention that meets compliance regulations and protects data from being encrypted by malware or ransomware.
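As an illustration, the S3 PutObject API accepts per-object lock parameters; the helper below builds them for a compliance-mode upload. This is a sketch only: the bucket itself must have been created with Object Lock enabled, and the helper-function packaging is our own, not part of any SDK.

```python
from datetime import datetime, timedelta, timezone

def object_lock_params(bucket: str, key: str, retention_days: int) -> dict:
    """Build the extra parameters an S3 PutObject call needs to store an
    object in COMPLIANCE mode for `retention_days` days. These follow the
    standard S3 API parameter names; pass them to an S3 client's
    put_object call alongside the object body."""
    retain_until = datetime.now(timezone.utc) + timedelta(days=retention_days)
    return {
        "Bucket": bucket,
        "Key": key,
        "ObjectLockMode": "COMPLIANCE",           # cannot be overridden
        "ObjectLockRetainUntilDate": retain_until,
    }
```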


Learn more about Komprise for cyber resiliency, including optimizing your file data defenses against cyber incidents and system failure.

Read the blog post: How to Protect File Data from Ransomware at 80% Lower Cost


Object Storage

What is Object Storage?

Object storage, also known as object-based storage, object data storage or cloud storage, is a way of addressing and manipulating data storage as objects. Objects are kept inside a single flat repository rather than nested in folders within other folders.

path-to-the-cloud-files-graphic

Each object has a distinct global identifier or key that is unique within its namespace. Objects are accessed via URL, which allows object storage to abstract multiple regions, data centers and nodes for essentially unlimited capacity behind a simple namespace.

Objects, unlike files, have no hierarchy or directories but are stored in a flat namespace. Another key difference from file storage is that user or application metadata takes the form of key-value pairs. For example, when you take a picture with your phone and store it to the cloud, the object includes metadata such as “device=iphone.”

Object storage can achieve extreme levels of durability by creating multiple copies or implementing erasure coding for data protection. Object storage is also cost-efficient and is a good option for cheap, deep, scale-on-demand storage. While many object storage APIs exist, Amazon’s Simple Storage Service or S3 has become the de-facto standard supported by other public and private cloud storage vendors.

Object Storage Solutions

Popular object storage solutions include Amazon Simple Storage Service (S3), Google Cloud Storage, Microsoft Azure Blob Storage and IBM Cloud Object Storage.



Orphaned Data

Orphaned data refers to data that is no longer associated with a corresponding record or entity in a database, data storage or other information system. This situation typically arises when a record, file, or object is deleted, but the associated data remains in the system without a proper link to a parent entity. Orphaned data can lead to various issues, including data inconsistency, inefficiency in storage usage, and potential challenges in data maintenance and retrieval.

One of the most popular Komprise prebuilt reports, the Orphaned Data report shows metrics on data from ex-employees – sometimes referred to as “zombie data” or “unowned data.” Most organizations have no idea how much orphaned data they have or what it is costing them; it is both a cost liability and a potential compliance issue if the organization has policies on deleting ex-employee data. The Komprise Orphaned Data report shows the amount and cost of orphaned data, lists the top 10 shares with orphaned data by size, and recommends actionable steps to reduce these costs.

Watch the Reporting Best Practices Komprise customer success webinar.

What are some of the characteristics and considerations related to orphaned data?

  • Deletion of Parent Records: Orphaned data often occurs when a parent record or entity is deleted from a database, but the associated child data is not properly removed or updated.
  • Incomplete Data Relationships: Orphaned data indicates incomplete or broken relationships between data elements within a database or system.
  • Database Integrity Issues: Orphaned data can compromise database integrity, as it may violate referential integrity constraints that define relationships between tables.
  • Storage Inefficiency: Orphaned data occupies data storage space without contributing to the meaningful content or structure of the database or data storage device, leading to inefficient use of storage resources and high data storage costs.
  • Data Cleanup Challenges: Identifying and cleaning up orphaned data can be challenging, especially in large and complex databases. Automated tools and careful database maintenance practices are often necessary.
  • Impact on Data Quality: Orphaned data can contribute to data quality issues, as it may lead to inconsistencies and inaccuracies when querying or analyzing information.
  • Data Retrieval Difficulties: Retrieving relevant information from a database with orphaned data can be problematic, as the disconnected data (often trapped in data silos) may not be readily accessible or associated with the desired context.
  • Prevention and Cleanup Strategies: Database administrators often implement strategies to prevent orphaned data, such as using cascading delete operations or triggers to ensure that child records are appropriately handled when parent records are deleted. Data storage administrators are increasingly relying upon unstructured data management solutions to provide visibility and actionable insights. Regular data audits and cleanup processes are essential to identify and address orphaned data.
  • Application Development Considerations: When designing database schemas and developing applications, it’s crucial to implement robust data management practices to avoid orphaned data scenarios.
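The referential-integrity idea behind orphaned data can be shown with a tiny check; the record fields (`path`, `owner`) are illustrative, not a Komprise schema:

```python
def find_orphaned_files(active_users: set, files: list) -> list:
    """Flag file records whose owner is no longer an active user --
    i.e. ex-employee ("zombie") data with no parent entity."""
    return [f for f in files if f["owner"] not in active_users]
```

The same pattern applies to database rows: compare each child record's foreign key against the surviving set of parent keys, and anything left over is orphaned.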

Addressing orphaned data challenges

Addressing orphaned data requires a combination of proactive prevention measures during system development and ongoing maintenance practices to identify and resolve existing orphaned data. Database administrators play a key role in implementing and enforcing data integrity constraints and regularly auditing the database for potential orphaned data situations. Data storage administrators with hybrid, multi-cloud and multi-vendor storage environments are increasingly looking to storage-agnostic solutions like Komprise Intelligent Data Management for analytics-driven unstructured data management and ongoing data lifecycle management, both to save costs and to harness greater value from growing volumes of unstructured data.


Policy-Based Data Management

Policy-based data management is data management driven by metrics such as data growth rates, data locations and file types; which data users regularly access and which they do not; which data is protected and which is not; and more.

The trend to place strict policies on the preservation and dissemination of data has been escalating in recent years. This allows rules to be defined for each property required for preservation and dissemination that ensure compliance over time. For instance, to ensure accurate, reliable, and authentic data, a policy-based data management system should generate a list of rules to be enforced, define the data storage locations, storage procedures that generate data tiering and archival information packages, and manage replication.

Policy-based data management is becoming critical as the amount of unstructured data continues to grow while IT budgets remain flat. By automating movement of data to cheaper storage such as cloud data storage or private object storage, IT organizations can rein in data sprawl and cut costs.

Other things to consider are how to secure data from loss and degradation by assigning an owner to each file, defining access controls, verifying the number of replicas to ensure integrity of the data, as well as tracking the chain of custody. In addition, rules help to ensure compliance with legal obligations, ethical responsibilities, generating reports, tracking staff expertise, and tracking management approval and enforcement of the rules.

As the data footprint grows, managing billions of files manually becomes untenable. Using analytics-driven data management to define governing policies for when and where data should move, and having data management solutions that automate based on those policies, becomes critical. Policy-based data management systems rely on consensus among stakeholders. Validation of these policies is typically done through automatic execution; policies should be periodically evaluated to ensure the continued integrity of your data.
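A movement policy of this kind can be sketched as a pure function over file metadata; the record fields (`path`, `last_access`) and threshold are illustrative, not a Komprise schema:

```python
from datetime import datetime, timedelta

def select_for_tiering(files: list, days_cold: int, now: datetime) -> list:
    """Apply a simple policy: any file whose last access is older than
    `days_cold` days is a candidate to move to cheaper storage."""
    cutoff = now - timedelta(days=days_cold)
    return [f["path"] for f in files if f["last_access"] < cutoff]
```

In a real system this rule would be evaluated continuously against current access metadata, with the selected files handed to an automated data mover.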



Pure Storage FlashBlade

Pure Storage FlashBlade® is a consolidated storage platform for unstructured data, be it file or object, that is built for unlimited scale. FlashBlade//S™ is designed to deliver the efficiency, density, and top performance that modern unstructured data needs at scale. FlashBlade//E™ is designed to deliver the environmental, ease-of-use, and reliability benefits of all-flash storage for unstructured data workloads at a cost competitive with disk-based storage solutions.

FlashBlade was built with long-term engineering in mind: the chassis is designed to last across multiple generations of hardware while delivering maximum rack density and power efficiency. It incorporates unified fast file and object storage with always-on encryption and data compression.

Komprise Intelligent Tiering for Pure Storage FlashBlade

While performance-optimized FlashBlade//S and capacity-optimized FlashBlade//E can scale out to petabytes of data, organizations can be most cost-effective by keeping data on the most relevant FlashBlade based on where that data is in its lifecycle.
Komprise, a SaaS for unstructured data management and mobility, was designed with the understanding that data is always in motion and should not all be treated the same. Komprise gives enterprises the ability to intelligently manage their data by identifying rarely accessed data on FlashBlade//S and transparently tiering it to FlashBlade//E without any changes to user or application access. With Komprise Transparent Move Technology™ (TMT), users and applications access data in the same location as before through Komprise's use of Dynamic Links. The combination of Komprise Intelligent Data Management with the FlashBlade line of high-performance, resilient storage delivers an optimal cost/performance ROI.

Komprise Intelligent Data Management software complements the FlashBlade portfolio of products by providing transparent data tiering. Komprise software can intelligently identify data across FlashBlade//S and transparently move infrequently accessed data to more cost-efficient FlashBlade//E without disruptions to user or application access. Komprise also provides Pure Storage customers with analytics across data silos, powerful data migration and ongoing data lifecycle management.

Read this white paper to further understand the need for transparent data tiering, suggested architecture, solution validations and the benefits.

Learn more about Komprise for Pure Storage.



Ransomware

What is ransomware?

Ransomware is a form of malware deployed in cyberattacks by criminal organizations to hold a victim’s data for ransom. The attack is typically launched via a trojan that, once clicked, traverses the user’s network, encrypting file data to deny access and disrupt business operations. With users and applications locked out, the criminals demand payment in exchange for decrypting the victim’s data.

Ransomware Strategy for File Data Workloads

In an eWeek article, Komprise co-founder and CEO Kumar Goswami reviewed the ransomware challenge for unstructured file data: The File Data Factor in Ransomware Defense: 3 Best Practices. To create a cost-effective layered ransomware strategy, he recommended the following:

  1. Prioritize visibility and audits
  2. Create a multi-layered data management defense
    • Create Snapshots and Backups for Hot Data
    • Establish Cloud Tiering and Immutable Storage for Cold Data
  3. Have a plan – and validate it

Cost-Effective Ransomware Data Protection

Komprise provides cost-effective protection and recovery of file data. Komprise transparently tiers cold data and archives it from expensive storage and backups into a resilient object-locked destination such as Amazon S3 IA with Object Lock.

By putting the cold data in an object-locked storage and eliminating it from active storage and backups, you can create a logically isolated recovery copy while drastically cutting data storage costs and data backup costs. Komprise creates a logically-isolated copy of your file data with the following properties:

  • Physical Separation on Immutable Storage
  • File-level Isolation
  • Prevent Deletion
  • Instant access and recoverability in the cloud without expensive upfront investments

Read the blog post: How to Protect File Data from Ransomware at 80% Lower Cost

Learn more about Komprise for cyber resiliency, including optimizing your file data defenses against cyber incidents and system failure.

What is Ransomware?

Ransomware is a type of malware that threatens to publish the victim’s personal data or perpetually block access to it unless a ransom is paid. According to Gartner, “Ransomware is one of the most common threats facing security and risk management leaders.” Most ransomware attacks target unstructured data on network shares, making centralized file data storage solutions a primary target.

How to protect File data from Ransomware

Data backup and disaster recovery (DR) solutions are where most enterprise IT organizations are investing in order to deliver better detection of and data protection against ransomware attacks. To protect file data from ransomware, the solution must:

  • Be cost-effective
  • Protect data even if backups and snapshots are infected
  • Provide simple recovery without significant upfront investment
  • Be verifiable

Read the blog post How to Protect File Data from Ransomware at 80% Lower Cost

Ransomware best practices

In the eWeek article The File Data Factor in Ransomware Defense: 3 Best Practices, Komprise CEO and co-founder Kumar Goswami summarizes the following ransomware best practices:

  1. Prioritize visibility and audits
  2. Create a multi-layered data management defense
  3. Have a plan – and validate it


REST (Representational State Transfer)

REST (Representational State Transfer) is a software architectural style for distributed hypermedia systems, used in the development of Web services. Distributed file systems send and receive data via REST. Web services using REST are called RESTful APIs or REST APIs.

There are several benefits to using REST APIs: the interface is uniform, so you don’t have to know the inner workings of an application to use it; its operations are well defined, so data in different storage formats can be acted upon by the same REST APIs; and it is stateless, so each interaction does not interfere with the next. Because of these benefits, REST APIs are fast, easy to build against, and easy to use. As a result, REST has gained wide adoption.

6 guiding principles for REST:

  1. Client–server – Separating the user interface from data storage improves portability and scalability.
  2. Stateless – Each request is wholly self-contained, so session state is kept entirely on the client.
  3. Cacheable – A client cache is given the right to reuse response data for later, equivalent requests.
  4. Uniform interface – The overall REST system architecture is simplified and made uniform by the following constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and hypermedia as the engine of application state.
  5. Layered system – The system is composed of hierarchical layers, and each component cannot “see” beyond the immediate layer with which it interacts.
  6. Code on demand (optional) – REST allows client functionality to be extended by downloading and executing code in the form of applets or scripts.
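To make the stateless, uniform-interface ideas concrete, here is a toy in-memory resource handler (purely illustrative; a real service would sit behind an HTTP server, and the resource paths are made up):

```python
import json

# An in-memory resource store standing in for a backing database.
STORE = {}

def handle(method, path, body=None):
    """A toy stateless REST handler: every request carries everything the
    server needs (method, resource path, representation). Returns an
    (HTTP status, body) pair."""
    key = path.strip("/")
    if method == "GET":
        if key not in STORE:
            return 404, ""
        return 200, json.dumps(STORE[key])
    if method == "PUT":
        STORE[key] = json.loads(body)   # idempotent create/replace
        return 200, json.dumps(STORE[key])
    if method == "DELETE":
        STORE.pop(key, None)            # idempotent delete
        return 204, ""
    return 405, ""
```

Note that no session state survives between calls: each request identifies its resource by path and supplies its representation in full, which is exactly the uniform-interface constraint above.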

The REST architecture and lighter weight communications between producer and consumer make REST popular for use in cloud-based APIs such as those authored by Amazon, Microsoft, and Google. REST is often used in social media sites, mobile applications and automated business processes.

Advantages of REST over SOAP

REST is often preferred over SOAP (Simple Object Access Protocol) because REST uses less bandwidth, making it preferable for use over the Internet. SOAP also requires writing or using a server program and a client program.

RESTful Web services are easily leveraged using most tools, including those that are free or inexpensive. REST is also much easier to scale than SOAP services. Thus, REST is often chosen as the architecture for services available via the Internet, such as Facebook and most public cloud providers, and development time is usually reduced using REST over SOAP. The downside to REST is that it has no direct support for generating a client from server-side-generated metadata, whereas SOAP supports this with Web Service Description Language (WSDL).

Unstructured data management software using REST APIs

Open APIs and a REST-based architecture are the keys to Komprise integrations. Using REST APIs gives customers the greatest flexibility. Here are some things customers can do with the Komprise Intelligent Data Management software via its REST API:

  • Get analysis results and reports on all their data
  • Run data migrations, data archiving and data replication operations
  • Search for data across all their storage by any metadata and tags
  • Build virtual data lakes to export to AI and Big Data applications

A REST API is a very powerful, lightweight and fast way to interact with data management software. Here is an example of the Komprise API in action: Automated Data Tagging with Komprise.



S3

The S3 protocol is used in a URL that specifies the location of an Amazon S3 (Simple Storage Service) bucket and a prefix to use for reading or writing files in the bucket. See S3 Intelligent Tiering.
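For example, an s3:// URL can be split into its bucket and prefix with standard URL parsing (an illustrative helper, not part of any AWS SDK):

```python
from urllib.parse import urlparse

def split_s3_url(url: str):
    """Split an s3:// URL into its (bucket, key prefix) parts."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    # netloc is the bucket name; the path (minus the leading slash)
    # is the key or key prefix within that bucket.
    return parsed.netloc, parsed.path.lstrip("/")
```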

Learn more about Komprise for AWS.



S3 Data Migration

S3 (Amazon Simple Storage Service) data migration entails transferring data stored in Amazon S3, a cloud-based object storage service offered by Amazon Web Services (AWS), to another system or S3 bucket within AWS.

S3 data migration involves several steps, such as data extraction, data transformation, data loading, data verification, and data archiving. S3 data migration can be complex and time-consuming, especially for organizations with large volumes of data and strict security and compliance requirements.

Smart Amazon S3 Data Migration and Data Management for File and Object Data

Komprise Elastic Data Migration is designed to make cloud data migrations simple, fast and reliable. It eliminates sunk costs with continual data visibility and optimization even after the migration. Komprise has received the AWS Migration and Modernization Competency Certification, verifying the solution’s technical strengths in file data migration.

A Smart Data Migration strategy for file workloads to Amazon S3 uses an analytics-driven approach to speed up data migrations and ensures the right data is delivered to the right tier in AWS, saving 70% or more on data storage and ultimately ensuring you can leverage advanced technologies in the cloud.

S3 Migration Done Right

Migrating to Amazon Simple Storage Service (S3), the scalable object storage service offered by Amazon Web Services (AWS), can involve moving data from on-premises storage, from another cloud provider, or even between S3 buckets. There are also many possible S3 migration scenarios for unstructured data: S3 to S3 migration, file to object migration, object to object migration, and so on. Whatever your object storage data migration strategy, there are a number of basic steps and considerations to keep in mind, including:

Assessment:

  • Identify the data you want to migrate.
  • Assess the size and type of data.
  • Consider access patterns and performance requirements.

Create an S3 Bucket:

  • Log in to the AWS Management Console.
  • Navigate to the S3 service.
  • Create a new bucket to store your data.

Set Up Permissions:

  • Configure access control lists (ACLs) and bucket policies to manage permissions.
  • Ensure that your IAM (Identity and Access Management) roles have the necessary permissions.

Data Transfer:

  • Many AWS customers start with AWS DataSync, AWS Snowball, AWS CLI, or SDKs to transfer data and then discover the analysis-first Komprise Elastic Data Migration solution.
  • For large-scale migrations, consider using AWS Snowball for physical transfer of data. Read the blog post here.

Update Applications:

  • If your data is being accessed by applications, update their configurations to point to the new S3 location.

Testing:

  • Perform tests to ensure data integrity and that applications can access data from the new S3 location.

Switch Over:

  • Once testing is successful, switch over to using the new S3 location.
  • Update DNS entries or configurations as needed.

Monitoring:

  • Set up monitoring and logging to track S3 usage and performance.
  • Implement alerts for any unexpected issues.

Clean-Up:

  • Once you are confident in the migration, clean up the old data storage and associated resources.

Documentation:

  • Update documentation to reflect the changes made during the migration.

Of course, the specifics of your S3 migration will vary depending on your use case, data volume, and existing infrastructure. It’s also important to consider security best practices and compliance requirements during the unstructured data migration process.

Learn more about Komprise for AWS.


S3 Intelligent Tiering

S3 Intelligent Tiering is an Amazon cloud storage class. Amazon S3 offers a range of storage classes for different uses. S3 Intelligent Tiering is a storage class aimed at data with unknown or unpredictable data access patterns. It was introduced in 2018 by AWS as a solution for customers who want to optimize storage costs automatically when their data access patterns change.

Instead of utilizing the other Amazon S3 storage classes and moving data across them based on the needs of the data, Amazon S3 Intelligent Tiering is a distinct storage class that has embedded tiers within it and data can automatically move across the four access tiers when access patterns change.

To fully understand what S3 Intelligent Tiering offers it is important to have an overview of all the classes available through S3:

Classes of AWS S3 Storage

  1. Standard (S3) – Used for frequently accessed data (hot data)
  2. Standard-Infrequent Access (S3-IA) – Used for infrequently accessed, long-lived data that needs to be retained but is not being actively used
  3. One Zone Infrequent Access – Used for infrequently accessed data that’s long-lived but not critical enough to be covered by storage redundancies across multiple locations
  4. Intelligent Tiering – Used for data with changing access patterns or uncertain need of access
  5. Glacier – Used to archive infrequently accessed, long-lived data (cold data); retrieval from Glacier has a latency of a few hours
  6. Glacier Deep Archive – Used for data that is hardly ever or never accessed and for digital preservation purposes for regulatory compliance

Also be sure to read the blog post about Komprise data migration with AWS Snowball:

Accelerating Petabyte-Scale Cloud Migrations with Komprise and AWS Snowball


What is S3 Intelligent Tiering?

S3 Intelligent Tiering is a storage class with multiple tiers embedded within it, each with its own access latencies and costs. It is an automated service that monitors your data access behavior and then moves your data on a per-object basis to the appropriate tier within the S3 Intelligent Tiering storage class. If an object has not been accessed for 30 consecutive days, it automatically moves to the Infrequent Access tier; after 90 consecutive days without access it moves to the Archive Access tier, and after 180 consecutive days to the Deep Archive Access tier. Retrieval from the Archive Access tier can take 3 to 5 hours, and from the Deep Archive Access tier up to 12 hours. If an archived object is subsequently accessed, it moves back to the Frequent Access tier.
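The per-object transition behavior described above can be summarized as a small mapping from idle days to tier; the tier names follow S3's access-tier naming, and the thresholds are the ones stated here (the archive tiers are opt-in):

```python
def intelligent_tier(days_since_access: int, archive_enabled: bool = False) -> str:
    """Map days since last access to the Intelligent-Tiering access tier
    described above. Archive tiers apply only when opted in."""
    if days_since_access < 30:
        return "FREQUENT"
    if days_since_access < 90 or not archive_enabled:
        return "INFREQUENT"
    if days_since_access < 180:
        return "ARCHIVE_ACCESS"
    return "DEEP_ARCHIVE_ACCESS"
```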

What are the costs of AWS S3 Intelligent Tiering?

You pay for monthly storage, requests and data transfer. When using Intelligent-Tiering you also pay a monthly per-object fee for monitoring and automation. While there is no retrieval fee in S3 Intelligent-Tiering and no fee for moving data between tiers, you do not manipulate each tier directly: S3 Intelligent-Tiering is a single storage class, and objects move through the tiers within it. Objects in the Frequent Access tier are billed at the same rate as S3 Standard, objects in the Infrequent Access tier at the same rate as S3 Standard-Infrequent Access, objects in the Archive Access tier at the same rate as S3 Glacier, and objects in the Deep Archive Access tier at the same rate as S3 Glacier Deep Archive.
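As a rough illustration of that billing model, here is a toy estimator; the tier names and rates are hypothetical placeholders, not AWS list prices:

```python
def monthly_cost(gb_by_tier: dict, objects: int,
                 rates: dict, monitor_per_1k: float) -> float:
    """Estimate a monthly Intelligent-Tiering-style bill: storage billed
    at each embedded tier's rate, plus a per-object monitoring fee
    (charged per 1,000 objects). All rates are illustrative."""
    storage = sum(gb * rates[tier] for tier, gb in gb_by_tier.items())
    monitoring = (objects / 1000) * monitor_per_1k
    return storage + monitoring
```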

What are the advantages of S3 Intelligent tiering?

The advantages of S3 Intelligent Tiering are cost savings with no operational overhead and no retrieval costs. Objects can be assigned a tier upon upload and then move between tiers based on access patterns. There is no impact on performance, and it is designed for 99.999999999% durability and 99.9% availability over an annual average.

What are the disadvantages of S3 Intelligent tiering?

The main disadvantage of S3 Intelligent Tiering is that it acts as a black box: you move objects into it but cannot transparently access different tiers or set different versioning policies for the different tiers. You have to manipulate the whole of S3 Intelligent-Tiering as a single storage class. For example, if you want to transition an object that has versioning enabled, you have to transition all of its versions. Also, when objects move to the archive tiers, the latency of access is much higher than in the access tiers; not all applications may be able to deal with the high latency.

S3 Intelligent Tiering is not suitable for companies with predictable data access behavior or companies that want to control data access, versioning and related policies with transparency. Other disadvantages are that it is limited to objects and cannot tier from files to objects, the minimum storage duration is 30 days, objects smaller than 128KB are never moved from the Frequent Access tier, and, because it is an automated system, you cannot configure different policies for different groups.

S3 Data Management with Komprise

Komprise is an AWS Advanced Tier partner offering intelligent data management with visibility, transparency and cost savings on AWS file and object data. How is this done? Komprise enables analytics-driven intelligent cloud tiering across EFS, FSx, S3 and Glacier storage classes in AWS so you can maximize price performance across all your data on AWS. The Komprise mission is to radically simplify data management through intelligent automation.

Komprise helps organizations get more value from their AWS storage investments while protecting data assets for future use through analysis and intelligent data migration and cloud data tiering.

Learn more at Komprise for AWS.

What is S3 Intelligent Tiering?

S3 Intelligent Tiering is an Amazon cloud storage class that moves data to more cost-effective access tiers based on access frequency.

How AWS S3 Intelligent Tiering Works

S3 Intelligent Tiering is a storage class that has multiple tiers embedded within it. For a monitoring and automation fee, data is moved between tiers to optimize costs. Each tier has its own access latency and cost:

  • Frequent Access – data accessed within the last 30 days
  • Infrequent Access – data not accessed for 30-90 days
  • Archive Instant Access – data not accessed for more than 90 days
  • Deep Archive Access – data not accessed for 180 days or more (optional*)

* Deep Archive Access, backed by the same storage as S3 Glacier Deep Archive, provides the lowest cost with the tradeoff that data is not available for instant access. Retrieval takes up to 12 hours, which can cause timeouts in many applications. For that reason, Deep Archive Access is not part of the default S3 Intelligent Tiering configuration and must be explicitly activated.
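As a sketch, the optional archive tiers can be enabled per bucket with boto3's `put_bucket_intelligent_tiering_configuration` call (a real API); the bucket name and configuration Id below are placeholders, and actually running the call requires AWS credentials.

```python
# Sketch: opting a bucket into the optional archive tiers of
# S3 Intelligent Tiering. Bucket name and config Id are placeholders.

def build_intelligent_tiering_config(config_id="archive-cold-objects"):
    """Enable the optional Archive Access (90 days) and Deep Archive
    Access (180 days) tiers, which are not on by default."""
    return {
        "Id": config_id,
        "Status": "Enabled",
        "Tierings": [
            {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
            {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
        ],
    }

def apply_config(bucket, config):
    import boto3  # requires AWS credentials when actually called
    boto3.client("s3").put_bucket_intelligent_tiering_configuration(
        Bucket=bucket,
        Id=config["Id"],
        IntelligentTieringConfiguration=config,
    )

cfg = build_intelligent_tiering_config()
print(cfg["Tierings"][1]["AccessTier"])  # DEEP_ARCHIVE_ACCESS
```

Objects left in the default configuration never go deeper than Archive Instant Access; the configuration above is what opts them into the slower, cheaper tiers.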

Getting Started with Komprise:

Want To Learn More?

Secondary Storage

What is Secondary Storage?

Secondary storage devices are storage devices that operate alongside the computer’s primary storage, RAM and cache memory. Secondary storage holds any amount of data, from a few megabytes to petabytes, and stores almost every type of program and data: the operating system, device drivers, applications and user data. Examples of internal secondary storage devices include the hard disk drive, the tape drive and the optical disc drive.

Secondary Storage Data Tiering

Secondary storage typically tiers or archives inactive cold data and backs up primary storage through data replication or other data backup methods. This replication or backup process ensures there is a second copy of the data. In an enterprise environment, secondary data can be stored on a network-attached storage (NAS) box, a storage-area network (SAN) or tape. Object storage devices may also be used as secondary storage to lessen the demand on primary storage. The growth of organizational unstructured data has prompted storage managers to move data to lower tiers of storage, increasingly cloud data storage, to reduce the impact on primary storage systems. By moving data from expensive primary storage to less expensive tiers, known as cloud tiering, storage managers save money while keeping the data easily accessible to satisfy both business and compliance requirements.

When tiering or archiving cold data to secondary storage, it is important that the archiving / tiering solution does not disrupt users by requiring them to rewrite applications to find the data on the secondary storage. Transparent archiving is key: data moved to secondary storage should still appear to reside on the primary storage and continue to be accessed from the primary storage without any changes for users or applications. Transparent move technology solutions use file-level tiering to accomplish this.
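As an illustration only (not any vendor's implementation), the principle behind transparent moves can be mimicked with an ordinary symbolic link: the file's bytes move to a secondary location while its original path keeps working. All paths here are hypothetical.

```python
# Toy illustration of transparent tiering: move a cold file to a
# "secondary" directory and leave a symlink at the original path,
# so applications keep reading the path they always used.
import os
import shutil
import tempfile

def tier_file(primary_path, secondary_dir):
    os.makedirs(secondary_dir, exist_ok=True)
    target = os.path.join(secondary_dir, os.path.basename(primary_path))
    shutil.move(primary_path, target)      # bytes now live on tier 2
    os.symlink(target, primary_path)       # original path still resolves
    return target

root = tempfile.mkdtemp()
cold = os.path.join(root, "report.dat")
with open(cold, "w") as f:
    f.write("cold data")

tier_file(cold, os.path.join(root, "tier2"))
with open(cold) as f:                      # unchanged access path
    print(f.read())                        # cold data
```

Real products do this at the file-system or protocol level with full metadata preservation; the symlink merely demonstrates the access-transparency idea.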

Learn More: Why Komprise is the Easy, Fast, No Lock-In Path to the Cloud for file and object data.

What is Secondary Storage?

Secondary storage, sometimes called auxiliary storage, external storage, secondary memory, tier 2 storage or backup storage, is non-volatile: it holds data and programs for later retrieval, retaining them until they are deleted or overwritten.

Secondary Storage Devices

Here are some examples of secondary storage devices:

  • Hard drive
  • Solid-state drive
  • USB thumb drive
  • SD card
  • CD
  • DVD
  • Floppy diskette
  • Tape drive

What is the difference between Primary and Secondary Storage?

Primary storage is the main memory where the operating system and running programs reside. It is typically volatile, more expensive, smaller and faster, and is used for data that needs to be accessed frequently.

Secondary storage can be hosted on premises, in an external device, or in the cloud. It is more likely to be permanent, cheaper, larger and slower, and is typically used for long-term storage of cold data.

Getting Started with Komprise:

Want To Learn More?

Sharding

Sharding, or storage sharding, is the technique of partitioning data in a data storage system into multiple subsets or “shards” to improve performance, scalability, and availability. In a storage system, sharding is used to distribute the workload of storing and retrieving data across multiple nodes or servers.

Benefits of Sharding

  • Improved performance: By distributing the workload across multiple nodes, storage sharding can improve the performance of the storage system. This is because each node is responsible for storing and retrieving a smaller subset of data, which can reduce the amount of data that needs to be processed in any given operation.
  • Improved scalability: Storage sharding can also improve the scalability of a storage system. As the amount of data being stored grows, more nodes can be added to the system to handle the increased workload. This allows the storage system to scale up to handle large amounts of data.
  • Improved availability: By storing data across multiple nodes, storage sharding can improve the availability of the storage system. If one node fails, the data can still be accessed from the other nodes in the system.

Sharding Challenges

  • Data consistency: As with any sharding technique, ensuring data consistency can be a challenge. When data is partitioned across multiple nodes, it can be difficult to ensure that all nodes have the same version of the data at all times.
  • Query complexity: Queries may need to be executed across multiple nodes, which can make querying more complex and impact query performance.
  • Shard rebalancing: When data is added or removed from the storage system, the shards may need to be rebalanced to maintain performance. This can be a complex and time-consuming process.

Overall, sharding can be a powerful technique for improving the performance, scalability, and availability of a storage system, but it requires careful planning and management to ensure its success.
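A minimal sketch of the technique, using an in-memory hash-sharded key-value store (all names are illustrative):

```python
# Minimal sketch of hash-based sharding: each key is deterministically
# mapped to one of N shards, spreading load across nodes. Here the
# "nodes" are just in-memory dicts.
import hashlib

class ShardedStore:
    def __init__(self, num_shards=4):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        digest = hashlib.sha256(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore()
for i in range(100):
    store.put(f"object-{i}", i)

print(store.get("object-42"))          # 42
print([len(s) for s in store.shards])  # keys spread across the 4 shards
```

Production systems typically use consistent hashing rather than a plain modulus, so that adding a shard does not remap most existing keys.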

Sharding Vendors

Some examples of vendors that use sharding include:

  • Amazon Web Services (AWS): AWS offers a service called Amazon S3 (Simple Storage Service), which is a highly scalable and durable object storage service that uses storage sharding to distribute data across multiple storage nodes.
  • Google Cloud Platform (GCP): GCP offers a similar service to Amazon S3 called Google Cloud Storage, which also uses storage sharding to distribute data across multiple nodes.
  • Microsoft Azure: Microsoft Azure offers a service called Azure Blob Storage, which is a highly scalable object storage service that uses storage sharding to distribute data across multiple nodes.
  • MongoDB: MongoDB is a popular NoSQL database that uses storage sharding to distribute data across multiple nodes in a cluster. This allows MongoDB to scale horizontally to handle large amounts of data.

Apache Cassandra is another NoSQL database that uses storage sharding to distribute data across multiple nodes in a cluster. Cassandra is designed to be highly scalable and can handle large amounts of data.

These are just a few examples of vendors that use storage sharding in their products. There are many other vendors that offer distributed storage systems that use storage sharding or similar techniques to improve performance, scalability, and availability.

Alternatives to Sharding

There are other techniques and approaches that can be used in distributed systems, depending on the specific needs of the system. For example, replication can be used to improve data availability and reduce the risk of data loss in the event of a node failure. Load balancing can be used to distribute workloads across multiple nodes, improving performance and reducing the risk of bottlenecks.

Other techniques that can be used in distributed systems include caching, data partitioning, and distributed locking. The choice of technique will depend on factors such as the specific use case, the size and complexity of the system, and the performance and availability requirements.

Ultimately, the key to achieving the best performance and availability in a distributed system is to carefully evaluate the needs of the system and select the appropriate techniques and approaches to meet those needs. There is no one-size-fits-all solution, and the choice of technique will depend on the specific requirements of the system in question.

The Difference Between Sharding and Chunking

Sharding and chunking are two different techniques used in different contexts.

  • Sharding is a technique used in distributed systems to divide data into smaller subsets, or “shards,” which are then distributed across multiple nodes in a network. Sharding is commonly used to improve scalability and availability in large-scale databases and storage systems.
  • Chunking is a technique used to break down larger pieces of information or data into smaller, more manageable chunks. Chunking is used in many different contexts, such as memory and learning, data storage and transmission, content creation, and user interface design.

While both sharding and chunking involve breaking down larger units into smaller pieces, they are used in different contexts and serve different purposes. Sharding is used in distributed systems to improve scalability and availability, while chunking is used to make information or data easier to process, remember, and communicate.
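As a toy contrast, chunking splits one payload into fixed-size pieces that can later be reassembled byte-for-byte:

```python
# Chunking splits a single payload into fixed-size pieces for storage
# or transmission; concatenating the pieces restores the original.
def chunk(data: bytes, size: int):
    return [data[i:i + size] for i in range(0, len(data), size)]

payload = b"abcdefghij"
pieces = chunk(payload, 4)
print(pieces)                        # [b'abcd', b'efgh', b'ij']
print(b"".join(pieces) == payload)   # True
```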

Getting Started with Komprise:

Want To Learn More?

Smart Data Workflows

What are Komprise Smart Data Workflows?

Smart Data Workflows, part of the Komprise Intelligent Data Management platform, is a systematic process to discover relevant file and object data across cloud, edge and on-premises datacenters and feed data in native format to AI and machine learning (ML) tools, data lakes and cloud file storage or cloud object storage. Smart Data Workflows solve common problems in unstructured data management: finding and moving the right unstructured data into data lakes, analytics platforms and cloud storage. Most of the work in finding and categorizing unstructured data to feed machine learning pipelines has been manual, delaying time to value and impeding the results of machine learning and AI projects.

Users can create automated workflows for all the steps required to find the right data across your storage assets, tag and enrich the data, and send it to external tools for analysis. The Komprise Global File Index and Smart Data Workflows together reduce the time it takes to find, enrich and move the right unstructured data by up to 80%.

The components of Smart Data Workflows:

Search: Define and execute a custom query across on-prem, edge and cloud data silos to find the data you need.

Execute & Enrich: Execute an external function on a subset of data and tag it with additional metadata.

Cull & Mobilize: Move only tagged data to the cloud.

Manage Data Lifecycle: Move the data to a lower storage tier for cost savings once the analysis is complete.
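The four stages above can be sketched generically. This is NOT the Komprise API: the record fields and helper names below are hypothetical stand-ins for illustration.

```python
# Generic sketch of the four stages (search, enrich, mobilize, demote).
# Record fields and helper names are hypothetical, not a vendor API.
def search(catalog, predicate):
    """Find matching files across an (in-memory) catalog."""
    return [f for f in catalog if predicate(f)]

def enrich(files, tag):
    """Tag the subset with additional metadata."""
    for f in files:
        f.setdefault("tags", []).append(tag)
    return files

def mobilize(files, destination, required_tag):
    """Move only tagged data to the destination."""
    moved = [f for f in files if required_tag in f.get("tags", [])]
    for f in moved:
        f["location"] = destination
    return moved

def demote(files, cold_tier):
    """After analysis, shift data to a cheaper tier."""
    for f in files:
        f["location"] = cold_tier
    return files

catalog = [
    {"name": "scan1.dcm", "ext": ".dcm", "location": "nas1"},
    {"name": "notes.txt", "ext": ".txt", "location": "nas1"},
]
hits = search(catalog, lambda f: f["ext"] == ".dcm")
mobilize(enrich(hits, "mri"), "s3://datalake", "mri")
print(hits[0]["location"])      # s3://datalake
print(catalog[1]["location"])   # nas1 (untouched)
```

The point of the sketch: only the searched-and-tagged subset moves, which is what keeps irrelevant data out of the data lake.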

Watch the Smart Data Workflows chalk-talk.

Getting Started with Komprise:

Want To Learn More?

SMB Data Migration

SMB protocol data migration refers to the process of transferring data stored in SMB protocol-based systems, such as Windows file servers, to another system, such as a new file server or a cloud-based storage service. The SMB (Server Message Block) protocol is a network file sharing protocol used by Windows-based systems to access files and other resources on a server over a network.

Like any file data migration, an SMB data migration involves several steps, such as data extraction, data transformation, data loading, data verification, and data archiving. The goal of an SMB data migration is to ensure that all data accurately and securely transfers to the new system, while minimizing any disruptions to business operations and preserving the integrity of the data.

SMB protocol data migration can be challenging due to the complex nature of the SMB protocol and the large volumes of data that are often involved. To ensure a successful migration, organizations typically use specialized tools and services, such as data migration software, cloud data migration services, and managed data migration services.

The Barriers to Fast SMB Migrations

From the Hypertransfer white paper:

Unstructured data is everywhere. From genomics and medical imaging to streaming video, electric cars, and IoT products, all sectors generate unstructured file data. Data-heavy enterprises typically have petabytes of file data, which can consist of billions of files scattered across different storage vendors, architectures and locations. And while file data growth is exploding, IT budgets are not. That’s why enterprises’ IT organizations are looking to migrate file workloads to the cloud. However, they face many barriers, which can cause migrations to take weeks to months and require significant manual effort. These include:

  • Billions of files, mostly small: Unstructured data migrations often require moving billions of files, the vast majority of which are small files that have tremendous overhead, causing data transfers to be slow.
  • Chatty protocols: Server message block (SMB) protocol workloads—which can be user data, electronic design automation (EDA) and other multimedia files or corporate shares—are often a challenge since the protocol requires many back-and-forth handshakes which increase traffic over the network.
  • Large WAN latency: Network file protocols are extremely sensitive to high-latency network connections, which are essentially unavoidable in wide area network (WAN) migrations.
  • Limited network bandwidth: Bandwidth is often limited or not always available, causing data transfers to become slow, unreliable and difficult to manage.
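A back-of-the-envelope model shows how these barriers compound. The numbers (file size, RTT, round trips per file, link speed) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: why chatty protocols plus WAN latency crush
# small-file throughput. All numbers are illustrative assumptions.
def effective_mb_per_s(file_kb, rtt_ms, round_trips_per_file, link_mb_per_s):
    size_mb = file_kb / 1024
    transfer_s = size_mb / link_mb_per_s                # raw wire time
    handshake_s = round_trips_per_file * rtt_ms / 1000  # protocol chatter
    return size_mb / (transfer_s + handshake_s)

# 64 KB files, 30 ms WAN RTT, ~10 SMB round trips per file, 100 MB/s link
print(round(effective_mb_per_s(64, 30, 10, 100), 3))           # 0.208
# the same link moving one 1 GB file
print(round(effective_mb_per_s(1024 * 1024, 30, 10, 100), 1))  # 97.2
```

Under these assumptions the link sits roughly 99.8% idle when moving small files, which is why migration tools batch, pipeline and parallelize transfers.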

Speed up SMB Migration with Hypertransfer

Blocks and Files coverage: Komprise speeds SMB data migration to cloud 

Hypertransfer for Komprise Elastic Data Migration delivers 25x performance gains compared to other tools. Komprise Elastic Data Migration is a SaaS solution available with the Komprise Intelligent Data Management platform or standalone. Designed to be fast, easy and reliable with elastic scale-out parallelism and an analytics-driven approach, it is the market leader in file and object data migrations, routinely migrating petabytes of data (SMB, NFS, Dual) for customers in many complex scenarios. Komprise Elastic Data Migration ensures data integrity is fully preserved by propagating access control and maintaining file-level data integrity checks such as SHA-1 and MD5 checks with audit logging.

Getting Started with Komprise:

Want To Learn More?

Storage Area Network (SAN)

What is a Storage Area Network (SAN)?

A Storage Area Network (SAN) is a dedicated high-speed network (usually Fibre Channel) that provides block-level access to storage devices such as disk arrays and tape libraries. The goal of a SAN is to provide centralized data storage and data management that can be easily accessed by multiple servers. SANs can increase storage utilization, improve data security, and speed access times compared to using direct-attached storage (DAS).

Storage Area Networks (SANs) are still widely used in modern data centers. They provide centralized data storage and management of data, which allows for improved data availability, performance, and security compared to traditional direct-attached storage (DAS) solutions. SANs also typically provide advanced features such as storage virtualization, disaster recovery, and basic data tiering. In recent years, cloud computing adoption has led to greater use of network-attached storage (NAS) and object storage solutions, but SANs remain a popular choice for many organizations due to their performance, reliability, and compatibility with existing infrastructure.

Komprise Intelligent Data Management is a storage-agnostic solution that works across NAS technologies, from the data center to the cloud, to deliver visibility and mobility of unstructured data. Komprise helps customers with petabyte-scale data environments be more efficient in managing unstructured data and also proactively (and intelligently) moves file and object data to the right location at the right time for cost savings and value.

Getting Started with Komprise:

Want To Learn More?

Storage as a Service

Storage as a Service (STaaS) is a subscription service model for enterprise storage providers. Dell, HPE, NetApp, Pure Storage and others all offer STaaS subscriptions, which shift IT spending from capital expenses (CAPEX) to operating expenses (OPEX), so you pay for what you need.

Storage as a Service can also be used to describe cloud-based storage solutions that allow users to store and access their data over the internet through a third-party service provider. STaaS providers typically offer scalable, on-demand storage capacity that can be easily provisioned and accessed via a web-based interface or an application programming interface (API).

The benefits of cloud-based STaaS include:

  • Scalability: STaaS providers typically offer scalable storage capacity that can be easily adjusted based on the user’s needs.
  • Cost-effectiveness: Users only pay for the storage capacity they need, without the need for upfront capital expenditures on storage infrastructure.
  • Flexibility: STaaS providers offer a range of storage options, including object storage, file storage, and block storage, allowing users to choose the most appropriate storage solution for their needs.
  • Data security: STaaS providers typically offer robust security features, including encryption, backup, and disaster recovery capabilities, to ensure data is protected against loss or unauthorized access.
  • Accessibility: STaaS providers allow users to access their data from anywhere, at any time, via an internet connection, making it easier to collaborate with colleagues and access data while on the go.

Examples of cloud native STaaS providers include Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage. STaaS can be particularly beneficial for organizations with large or rapidly growing storage needs or for users who need to store and access data from multiple locations or devices.

Read the white paper: Getting Departments to Care About Data Storage Cost Savings.

Getting Started with Komprise:

Want To Learn More?

Storage Assessment

A storage assessment is a process of evaluating an organization’s data storage infrastructure to gain insights into its performance, capacity, efficiency, and overall effectiveness. The goal of a storage assessment is typically to identify any bottlenecks, inefficiencies, or areas for improvement in the storage environment.

Whether delivered by a service provider or the storage vendor, traditional storage assessments have focused on:

  • Storage Performance: The assessment examines the performance of the storage infrastructure, including storage arrays, network connectivity, and storage protocols. It measures factors such as IOPS (Input/Output Operations Per Second), latency, throughput, and response times to identify any performance limitations or areas for optimization.
  • Capacity Planning: The assessment analyzes the current storage capacity utilization and predicts future storage requirements based on data growth trends and business needs. It helps identify potential capacity constraints and ensures adequate storage resources are available to meet future demands.
  • Storage Efficiency: The assessment evaluates the efficiency of storage utilization and identifies opportunities for optimization. This may include analyzing data deduplication, compression, thin provisioning, and other techniques to reduce storage footprint and improve storage efficiency.
  • Data Protection and Disaster Recovery: The assessment reviews the data protection and disaster recovery strategies in place, including backup and recovery processes, replication, snapshots, and data redundancy. It ensures that appropriate data protection measures are in place to minimize the risk of data loss and to achieve desired recovery objectives.
  • Storage Management and Monitoring: The assessment examines the storage management practices, including storage provisioning, data lifecycle management, storage tiering, and data classification. It assesses the effectiveness of storage management tools and processes and identifies areas for improvement.
  • Storage Security: The assessment assesses the security measures implemented within the storage infrastructure, including access controls, encryption, data privacy, and compliance with industry standards and regulations. It helps ensure the security of sensitive data stored in the infrastructure.
  • Cost Optimization: The assessment examines the data storage costs and identifies opportunities for cost optimization. This may include evaluating storage utilization, identifying unused or underutilized storage resources, and recommending strategies to optimize storage spending.
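As one concrete example, the capacity planning item above often reduces to a compound-growth projection (the 30%/year growth rate is an assumption, not a benchmark):

```python
# Capacity-planning sketch: project future needs under compound annual
# growth. The growth rate is an assumption for illustration.
def projected_tb(current_tb, annual_growth, years):
    return current_tb * (1 + annual_growth) ** years

# 500 TB today growing 30% per year needs ~1.1 PB within 3 years
print(round(projected_tb(500, 0.30, 3), 1))  # 1098.5
```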

Based on the findings of the storage assessment, organizations can develop a roadmap for improving their storage infrastructure, addressing performance bottlenecks, enhancing data protection, optimizing storage efficiency, and aligning storage resources with business requirements. This helps ensure a robust and well-managed data storage environment that supports the organization’s data storage and unstructured management needs effectively.

Analyzing Data Silos Across Vendors: Hybrid Cloud Storage Assessments

Komprise Intelligent Data Management is an unstructured data management solution that helps organizations gain visibility, control, and cost optimization over their file and object data across on-premises and cloud storage environments. It offers a range of features and capabilities to simplify data management processes and improve storage efficiency. Komprise is used by customers and partners to deliver a data-centric, storage agnostic assessment of unstructured data growth and potential data storage cost savings. It helps organizations optimize storage resources, reduce costs, and improve data management efficiency based on real-time analysis of data usage patterns.

Common Komprise Use Cases

In addition to storage assessments, common use cases for Komprise include:

Data Visibility and Analytics: Komprise Analysis provides comprehensive visibility into data usage, access patterns, and storage costs across heterogeneous storage systems. It offers detailed analytics and reporting, allowing organizations to understand their data landscape and make informed decisions.

Transparent File Archiving: Komprise identifies infrequently accessed, cold or inactive data and archives it to lower-cost storage tiers without disrupting user access, thanks to patented Transparent Move Technology (TMT). It provides a transparent file system view, allowing users to access archived files seamlessly and retrieve them on demand, with no changes to existing applications or file systems.

Cloud Data Management: Komprise extends its data management capabilities to cloud storage environments, including major cloud providers such as Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage. It enables organizations to manage data across hybrid and multi-cloud environments with consistent policies and visibility.

Data Migration: Komprise Elastic Data Migration is a SaaS solution available with the Komprise Intelligent Data Management platform or standalone. Designed to be fast, easy and reliable with elastic scale-out parallelism and an analytics-driven approach, it is the market leader in file and object data migrations, routinely migrating petabytes of data (SMB, NFS, Dual) for customers in many complex scenarios. Komprise Elastic Data Migration ensures data integrity is fully preserved by propagating access control and maintaining file-level data integrity checks such as SHA-1 and MD5 checks with audit logging. As outlined in the white paper How To Accelerate NAS and Cloud Data Migrations, Komprise Elastic Data Migration is a highly parallelized, multi-processing, multi-threaded approach that improves performance at many levels. And with Hypertransfer, Komprise Elastic Data Migration is 27x faster than other migration tools.

Data Lifecycle Management: Komprise helps organizations automate the movement and placement of data based on data management policies. It enables the seamless transition of data between storage tiers, such as high-performance storage and lower-cost archival storage, to optimize performance and reduce storage costs.

Komprise Intelligent Data Management helps organizations optimize their storage infrastructure, reduce storage costs, improve data management efficiency, and gain better control and insights into their unstructured data. It simplifies complex data management processes and empowers organizations to make informed decisions about their data storage and utilization.

Getting Started with Komprise:

Want To Learn More?

Storage Costs

Storage costs are the price you pay for data storage. With exponential data growth and an expanding variety of cloud storage tiers to choose from, it is important to regularly evaluate your storage costs, which vary depending on the storage solution, type and provider you choose. See Data Storage Costs.

In 2023 Komprise published an eBook: 8 Ways to Save on File Storage and Backup Costs.

  1. Consolidate storage and data management solutions.
  2. Adopt a data services mindset.
  3. Adopt new data management metrics.
  4. Introduce an analytics approach for departments and users.
  5. Become a cloud cost optimization expert.
  6. Develop best practices for data lifecycle management.
  7. Develop a ransomware strategy that also cuts costs.
  8. Don’t get locked in.

Factors that can impact storage costs:

  • Storage Type: Different storage types have varying costs. For example, solid-state drives (SSD) generally cost more than traditional hard disk drives (HDD) due to their higher performance and faster access times. Additionally, specialized storage options like archival storage or object storage may have different pricing structures based on the intended use cases.
  • Capacity: The amount of storage space you require directly impacts the cost. Providers typically charge based on the amount of data you store, usually measured in gigabytes (GB), terabytes (TB), or petabytes (PB). As you scale up your storage capacity, the costs will increase accordingly. See Capacity Planning.
  • Redundancy and Data Replication: If you require data redundancy or replication for increased data durability and availability, additional costs may be involved. Providers may charge for creating and maintaining multiple copies of your data across different locations or availability zones.
  • Data Access and Retrieval: The frequency and speed of data access can influence storage costs. Some storage services offer different retrieval tiers with varying costs, such as faster access options for immediate retrieval (which can be more expensive) or lower-cost options for infrequent access.
  • Data Transfer: Uploading and downloading data from storage solutions often incurs data transfer costs. These charges may apply when moving data into or out of the storage service or transferring data between regions or availability zones.
  • Service Level Agreements (SLAs): Certain storage solutions may come with service-level agreements that guarantee a certain level of performance, availability, or support. These enhanced SLAs may have higher associated costs.
  • Cloud Provider and Pricing Models: Different cloud providers have their own pricing structures, and costs can vary between them. It’s important to carefully compare the pricing details, including storage rates, data transfer costs, and any additional charges specific to each provider. Read: Cloud Storage Pricing in 2023: Everything You Need to Know.
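These factors can be combined into a rough monthly estimate. All rates below are placeholder assumptions, not any provider's actual pricing:

```python
# Rough monthly storage bill combining the factors above. Rates are
# placeholder assumptions, not any provider's actual pricing.
def monthly_cost(tb_stored, rate_per_gb, egress_gb=0.0, egress_rate=0.0,
                 retrieval_gb=0.0, retrieval_rate=0.0):
    return (tb_stored * 1024 * rate_per_gb     # capacity charge
            + egress_gb * egress_rate          # data transfer out
            + retrieval_gb * retrieval_rate)   # retrieval charge

hot = monthly_cost(100, 0.023)  # 100 TB on a hypothetical hot tier
cold = monthly_cost(100, 0.001, retrieval_gb=500, retrieval_rate=0.02)
print(round(hot, 2), round(cold, 2))  # 2355.2 112.4
```

Even with modest retrieval activity, the archival tier in this toy model is roughly 20x cheaper per month, which is why retrieval frequency matters so much when choosing a tier.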

To get accurate and up-to-date pricing information, it is recommended to visit the websites of cloud storage providers like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. They typically provide detailed pricing calculators and documentation that can help estimate the costs based on your specific storage requirements.

Getting Started with Komprise:

Want To Learn More?

Storage Efficiency

Storage efficiency is the optimization of data storage resources to ensure that data is stored in a manner that maximizes capacity utilization, reduces data storage costs, and maintains or improves performance. Efficient storage practices are crucial for enterprise IT organizations dealing with growing data volumes, the majority of which is unstructured data.

What are some strategies to maximize data storage efficiency?

In addition to the right approach to unstructured data management, some of the common ways to ensure storage efficiency include:

Data Deduplication

Deduplication involves identifying and eliminating duplicate copies of data. By storing only one instance of duplicate data, organizations can save storage space and reduce redundancy. Watch the demo of the Komprise Potential Duplicates Report.
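The core mechanism can be sketched in a few lines: content that hashes to the same digest only needs to be stored once (the file names below are hypothetical):

```python
# Sketch of hash-based duplicate detection: files with identical
# content hash to the same digest, so only one copy needs storing.
import hashlib
from collections import defaultdict

def find_duplicates(files):
    """files: mapping of name -> bytes. Returns digest -> [names]."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    return {d: n for d, n in groups.items() if len(n) > 1}

dupes = find_duplicates({
    "a/report.pdf": b"same bytes",
    "b/report_copy.pdf": b"same bytes",
    "c/other.pdf": b"different",
})
print(list(dupes.values()))  # [['a/report.pdf', 'b/report_copy.pdf']]
```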

Compression

Compression techniques reduce the size of data by encoding it in a more compact form. Compressed data requires less storage space and can lead to more efficient storage utilization.
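A quick demonstration with Python's standard zlib module shows the effect on repetitive data:

```python
# Lossless compression in action: repetitive data shrinks dramatically,
# and decompression restores the original bytes exactly.
import zlib

data = b"log line: status ok\n" * 1000   # 20,000 bytes of repetitive text
packed = zlib.compress(data)

print(len(data), len(packed))            # packed is far smaller
assert zlib.decompress(packed) == data   # round trip is lossless
```

The ratio achieved depends heavily on the data: already-compressed formats (JPEG, video, encrypted files) gain little or nothing.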

Download the white paper on block-level vs. file-level tiering.

Tiered Storage

Implementing tiered storage involves categorizing data based on access frequency and importance. Frequently accessed or critical data can be stored on high-performance, more expensive storage, while less critical or infrequently accessed data can be moved to lower-cost, slower storage tiers.
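A tiering policy often boils down to simple last-access thresholds; the day counts below are assumptions for illustration, not fixed rules (real policies come from access analytics):

```python
# Sketch of an age-based tiering policy. The 30/180-day thresholds
# are assumptions; real deployments derive them from access analytics.
def pick_tier(days_since_access):
    if days_since_access <= 30:
        return "performance"   # hot, expensive, fast
    if days_since_access <= 180:
        return "capacity"      # warm, cheaper
    return "archive"           # cold, cheapest, slower

print([pick_tier(d) for d in (5, 90, 400)])
# ['performance', 'capacity', 'archive']
```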

Thin Provisioning

Thin provisioning allows organizations to allocate storage space on an as-needed basis rather than allocating the full amount upfront. This helps prevent over-provisioning and ensures that storage resources are used efficiently.

Automated Storage Management

Implementing automated storage management tools and processes allows for dynamic adjustment of storage resources based on changing demands. Automation helps optimize storage allocations and reduces the need for manual intervention.

Snapshots and Backup Efficiency

Efficient storage systems use snapshot technologies to create point-in-time copies of data. This allows for quick and efficient backups, reducing the impact on primary storage and simplifying data recovery processes.

Archiving and Data Lifecycle Management

Archiving infrequently accessed or older data to lower-cost storage solutions can free up space on primary storage. Implementing effective data lifecycle management ensures that data is stored on the most suitable storage tier throughout its lifecycle.

Storage Virtualization

Storage virtualization abstracts physical storage resources, allowing for centralized management and optimization of storage across heterogeneous environments. It simplifies storage administration and enables more efficient resource utilization.

Cloud Storage Integration

Integrating cloud storage into the storage infrastructure allows organizations to leverage scalable and cost-effective cloud resources for storing data. Cloud storage can be used for archival, backup, and offloading infrequently accessed data.


Efficient File Systems

Choosing file systems optimized for storage efficiency can make a significant difference. Some file systems are designed to handle large amounts of data efficiently, with features such as fast indexing and snapshot capabilities.

Monitoring and Analytics

Implementing monitoring tools and analytics helps organizations understand storage usage patterns, identify potential bottlenecks, and make informed decisions about optimizing storage configurations.

Read the blog post: File Data Metrics to Live By

Regular Maintenance and Cleanup

Periodic reviews and cleanup of obsolete or redundant data, as well as reclaiming unused storage, contribute to maintaining an efficient storage environment.
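A minimal sketch of such a cleanup scan in Python (the one-year threshold is an arbitrary example; a real workflow would review candidates before archiving or deleting anything):

```python
import os
import time

def stale_files(root, days=365):
    """List files not modified in `days` days - candidates for
    archiving or deletion during periodic cleanup."""
    cutoff = time.time() - days * 86400
    stale = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.stat(path).st_mtime < cutoff:
                stale.append(path)
    return stale
```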

Efficient storage management is an ongoing process that requires a combination of technologies, best practices, and strategic decision-making. By adopting these strategies, organizations can optimize storage resources, reduce data storage costs, and ensure that their storage infrastructure aligns with business objectives.

Getting Started with Komprise:

Want To Learn More?

Storage Insights

Komprise announced Storage Insights to unify storage management and unstructured data management with the release of Komprise Intelligent Data Management 5.0.

What is Storage Insights?

Storage Insights is a console included in all editions of Komprise, including Komprise Analysis and Komprise Elastic Data Management. Komprise has always provided visibility into data across heterogeneous storage, including how data is being used, how fast it’s growing, who is using it, what data is hot and cold, and where it lives. Storage Insights delivers a Data Stores console that adds storage metrics to these data metrics to further simplify file and object data management.

  • See all your storage organized by data center locations and clouds and view the available capacity, data size, data growth, growth percentage, amount of cold data, department the share belongs to, vendor-specific metrics and more.
  • Storage Insights is a management console for quickly understanding both data usage and storage consumption, and it is where you add new data stores for analysis and data management activities.
A single view for unified data and storage management.
Customize your view and track and manage what matters most.

Why Storage Insights?

Increasingly distributed and multi-storage enterprise IT organizations are frustrated that each storage vendor tends to report free space or storage consumption differently, forcing them to search in different places for this information. By adding storage metrics and a customizable interface to Komprise Intelligent Data Management, customers no longer have to look in multiple places: they get one consistent definition and view of their storage metrics and data metrics.

Komprise unifies both data and storage insights in a single console with Storage Insights. Customers can easily spot trends, such as which shares have the most anomalous activity, which shares are filling up the fastest, or which shares owned by a particular department have a lot of cold data.

Learn more about Komprise Intelligent Data Management 5.0.


Storage Metrics

Storage metrics (or data storage metrics) are measurements and indicators used to assess various aspects of a storage system’s performance, capacity, and efficiency. These metrics provide valuable insights into how data storage resources are utilized, helping organizations optimize their storage infrastructure, plan for future needs, and troubleshoot issues.

Read the blog post: File Data Metrics to Live By.

Common data storage metrics

Capacity Utilization:
  • Used Capacity: The amount of storage space currently in use.
  • Free Capacity: The remaining storage space available for use.
Throughput:
  • IOPS (Input/Output Operations Per Second): The number of read and write operations that a storage system can perform in one second.
  • Throughput: The amount of data (in bytes) transferred per unit of time.
Latency:
  • Read Latency: The time it takes for a storage system to respond to a read request.
  • Write Latency: The time it takes for a storage system to acknowledge the completion of a write operation.
Availability:
  • Uptime/Downtime: The percentage of time the storage system is available versus the time it is unavailable.
Reliability:
  • Error Rate: The frequency of errors or data corruption within the storage system.
Data Protection:
  • Backup Success/Failure: The success or failure rate of backup operations.
  • Snapshot Usage: The utilization of snapshot technology for data protection.
Storage Efficiency:
  • Deduplication Ratio: The ratio of data reduction achieved through deduplication.
  • Compression Ratio: The ratio of data reduction achieved through compression.
Data Lifecycle Management:
  • Data Age: The age of data in the storage system, helping in the management of data lifecycle.
Resource Utilization:
  • CPU and Memory Usage: The utilization of CPU and memory resources on storage devices.
Network Performance:
  • Bandwidth: The amount of data that can be transmitted over the network in a given time.
Queue Length:
  • Storage Queue Length: The number of I/O operations waiting to be processed by the storage system.

Monitoring these storage metrics provides data storage administrators and IT teams with the information needed to make informed decisions about storage provisioning, performance optimization, and overall system health. Many storage management tools and platforms offer dashboards and reports that display these metrics for easy analysis and troubleshooting.
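For example, basic capacity-utilization metrics can be pulled with Python's standard library (a sketch, not a substitute for a storage vendor's reporting, and it covers only the filesystem visible to the host):

```python
import shutil

def capacity_metrics(path="/"):
    """Report used/free capacity and utilization percentage for
    the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return {
        "total_bytes": usage.total,
        "used_bytes": usage.used,
        "free_bytes": usage.free,
        "utilization_pct": round(100 * usage.used / usage.total, 1),
    }
```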

Unified Data and Storage Insights

Komprise Storage Insights gives administrators the ability to drill down into file shares and object stores across locations and sites, including relevant metrics by department, division or business unit, such as:

  • Which shares have the greatest amount of cold data?
  • Which shares have the highest recent growth in new data?
  • Which shares have the highest recent growth overall?
  • Which file servers have the least free space available?
  • Which shares have tiered the most data?

One Komprise customer put it this way:

“It’s a single interface that will show us important metrics like capacity usage in every storage location, which will save us a lot of time and ensure we make the right decisions for our departments and users.”



Storage Tiering

What is Storage Tiering?

Storage Tiering refers to a technique of moving less frequently used data, also known as cold data, from higher-performance storage such as SSD to cheaper tiers of storage such as cloud or spinning disk. The term “storage tiering” arose from moving data across different tiers or classes of storage within a storage system, but it has expanded to mean tiering or archiving data from a storage system to other clouds and storage systems. Storage tiering is now considered a core feature of modern storage systems and has recently become part of the default configuration for next-generation storage like Amazon FSx for NetApp ONTAP.

Block-level data storage solutions include: NetApp FabricPool and Dell PowerScale CloudPools.

Storage-agnostic data management and data tiering have emerged as more and more enterprise organizations adopt hybrid, multi-cloud, and edge IT infrastructure strategies. See also cloud tiering and choices for cloud data tiering.


Storage Tiering Cuts Costs Because 70%+ of Data is Cold

As data grows, data storage costs grow. It is easy to think the answer is more efficient storage, or simply buying more storage, but data management is the real solution. Typically over 70% of data is cold and has not been accessed in months, yet it sits on expensive storage hardware or cloud infrastructure and consumes the same backup resources as hot data. As a result, data storage costs are rising, backup times are slowing, disaster recovery (DR) is unreliable, and the sheer bulk of this data makes it difficult to leverage newer options like Flash and Cloud.

Data Tiering Was Initially Used within a Storage Array

Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.

Typical storage tiers within a storage array or on-premises storage device include:

  • Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
  • SATA Disks: High-capacity disks with lower performance that offer better price per GB vs SSD.
  • Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.

Increasingly, enterprise IT organizations are looking at another option: tiering or archiving data to a public cloud.

  • Public Cloud Storage: Public clouds currently have a mix of object and file storage options. The object storage classes such as Amazon S3 and Azure Blob (Azure Storage) provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.
  • Cloud NAS has also become increasingly popular, but if unstructured data is not well managed, data storage costs will be prohibitive.

Cloud Storage Tiering is now Popular

Tiering and archiving less frequently used or cold data to public cloud storage classes is now more popular. Customers can keep cold data in the lower-cost storage classes within the cloud and promote it to higher-cost storage classes when needed. For example, data can be archived or tiered from on-premises NAS to Amazon S3 Infrequent Access or Amazon S3 Glacier for low ongoing costs, and then promoted to Amazon EFS or Amazon FSx when you want to operate on it and need performance.

Cloud isn’t just low-cost data storage 

The cloud offers more than low-cost data storage. Advanced security features such as immutable storage can defeat ransomware, and cloud-native services from analytics to machine learning can drive value from your unstructured data.

But in order to take advantage of these capabilities, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires the right approach to storage tiering, which is file-tiering, not block-tiering.


Block Tiering Creates Unnecessary Costs and Lock-In

Block-level storage tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SSD disks as well as cheaper SATA disks.

Block storage tiering breaks a file into various blocks – metadata blocks that contain information about the file, and data blocks that are chunks of the original file. Block-tiering or Block-level tiering moves less used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained in the higher, faster, and more expensive storage tiers.

Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.

But, since block storage tiering (often called CloudPools – examples are NetApp FabricPool and Dell EMC Isilon CloudPools) is done inside the storage operating system as a proprietary solution, it has several limitations when it comes to efficiency of reuse and efficiency of storage savings. Firstly, with block tiering, the proprietary storage filesystem must be involved in all data access since it retains the metadata and has the “map” to putting the file together from the various blocks. This also means that the cold blocks that are moved to a lower tier or the cloud cannot be directly accessed from the new location without involving the proprietary filesystem because the cloud does not have the metadata map and the other data blocks and the file context and attributes to put the file together. So, block tiering is a proprietary approach that often results in unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.

With block storage tiering, the only way to access data in the cloud is to run the proprietary storage file system in the cloud which adds to costs. Also, many third-party applications such as backup software that operate at a file level require the cold blocks to be brought back or rehydrated, which defeats the purpose of tiering to a lower cost storage and erodes the potential savings.

For more details, read the white paper: Block vs. File-Level Tiering and Archiving.



Stubs

What are Stubs?

Stubs are placeholders left in place of the original data after it has been migrated to secondary storage. Stubs replace the archived files in the location selected by the user during archiving. Because stubs are proprietary and static, if a stub file is corrupted or deleted, the moved data becomes orphaned. Komprise does not use stubs, which eliminates this risk of disruption to users, applications, or data protection workflows.

Challenges with Stubs

Stubs are brittle. When stubbed data is moved from its storage (file, object, cloud, or tape) to another location, the stubs can break. The storage management system no longer knows where the data has been moved to and it becomes orphaned, preventing data access. Most storage management solutions on the market use client-server architecture and do not scale to support data at massive scale.

Proprietary interfaces like stubs can be used to make tiered data appear to reside on primary storage, but the transparency ends there. To access the data, the storage management system intercepts access requests, retrieves the data from where it resides, and then rehydrates it back to primary storage. This process adds latency and increases the risk of data loss and corruption.

Standards-Based Transparent Data Tiering

A true transparent data tiering solution creates no disruption, and that’s only achievable with a standards-based approach. Komprise Intelligent Data Management is the only standards-based transparent data tiering solution that uses Transparent Move Technology™ (TMT), which uses Dynamic Links that are based on industry-standard symbolic links instead of proprietary stubs.


Learn more about the differences between stubs, symbolic links and Dynamic Links from Komprise.

Read the Komprise Architecture Overview white paper to learn more.


Symbolic Link


What is a Symbolic Link? What is a symlink?

Symbolic Links, also known as symlinks and symbolic linking, are file-system objects that point toward another file or folder. These links act as shortcuts with advanced properties that allow access to files from locations other than their original place in the folder hierarchy by providing operating systems with instructions on where the “target” file can be found.
For the operating system, the symlink is transparent for many operations and functions in the same manner as the target file or folder would, even though it’s only a link that points to the original. For example, if a program needs to be in folder A to run but you want to store it in folder B instead, you could move the contents of folder A into folder B and create a symbolic link at folder A that points to folder B. When the program is launched, the operating system would refer to folder A, follow the symbolic link to folder B, and run the program from folder B as if it were still in its original place in folder A.
This method is widely used in the storage industry in programs such as OneDrive, Google Drive, and Dropbox to sync files and folders across different platforms of storage or in the cloud.
These types of links began to appear in operating systems in the late 1970s, such as RDOS. In modern computing, symbolic links are present in most Unix-like operating systems supported by the POSIX standard, such as Linux, macOS, and Tru64. The feature was also added to Microsoft Windows starting with Windows Vista.

Symbolic Links vs Hard Links

Both soft and hard links allow seamless and mostly transparent targeting of a file, but they do so in different ways.

Soft links, also referred to as symbolic links by Microsoft, work similarly to a normal shortcut in the sense that they point directly to the file or folder itself. These types of links also use less memory overall.
On the other hand, hard links point to the storage space designated to hold the contents of the file or folder.
In this sense, if the location or the name of the file changes, a soft link no longer works, since it pointed to the original file’s path. With a hard link, any changes made to the original file or the hard link’s contents are mirrored by the other, because both point to the same location on the storage.
Hard links act as a secondary entrance to the same file or folder which they are linked to, but they can only be used to connect two entities within the same file system, whereas soft links can bridge the gap between different storage devices and file systems.

Hard symbolic links also have more restrictive requirements than soft links:
  • Hard links may not be able to link to directories.
  • The target file or folder for a hard link must exist.
  • Hard links cannot point to targets that are located on different partitions, volumes, or file systems.
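To make the contrast concrete, here is a small Python sketch (POSIX system assumed; the filenames are purely illustrative) showing that a hard link shares the target's inode and survives deletion of the target, while a soft link stores only the pathname and dangles:

```python
import os
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "file.txt")
with open(target, "w") as f:
    f.write("original")

soft = os.path.join(d, "soft.txt")
hard = os.path.join(d, "hard.txt")
os.symlink(target, soft)   # soft link: stores the target's pathname
os.link(target, hard)      # hard link: shares the target's inode

# Both resolve to the same content while the target exists.
assert open(soft).read() == open(hard).read() == "original"

# The hard link shares the inode; the symlink has its own.
assert os.stat(hard).st_ino == os.stat(target).st_ino
assert os.lstat(soft).st_ino != os.stat(target).st_ino

# Remove the target: the symlink dangles, the hard link still works.
os.remove(target)
assert not os.path.exists(soft)       # following the broken link fails
assert open(hard).read() == "original"
```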

Junctions

A Junction is a lesser-used, third type of symbolic link that combines aspects from both hard and soft links. The target file must exist for the junction to be created, but if the target file or folder is erased afterward, the link will still be there but will no longer be functional.

How are Soft and Hard Symbolic Links Commonly Used?

Hard links are used to create “backups” on filesystems without using any additional storage space. This is a benefit as it is often easier to manage a single directory with multiple references pointing to it rather than managing multiple instances of the same directory. If the file or folder is no longer accessible from its original location, then the hard link can be used as a backup to regain access to those files.
The Time Machine feature on macOS uses hard symbolic links to create images to be used for backup.
Soft links are used more heavily to enable access for files and folders on different devices or filesystems. These types of symbolic links are also used in situations where multiple names are being used to link to the same location.

Types of Businesses that Make Use of Symbolic Links

Symbolic links are leveraged in nearly every industry that uses computers, though some industries rely on them more heavily than others.

Creating Symbolic Links

The process used to create symbolic links is different on each type of operating system. Below are brief instructions on how a soft or hard link can be set up in Linux and Windows.

How to Create a Soft Link in Linux

To create a soft symbolic link in Linux, the ln command-line utility can be used as such:
ln -s [OPTIONS] FILE LINK
The FILE argument represents the origin of the link. The LINK argument represents the target destination for the soft link.
When the command succeeds, there is no output and it exits with status zero.

How to Create a Hard Link in Linux

For creating hard links in Linux, a similar version of the ln command is used but without the -s:
ln [OPTIONS] FILE LINK
The FILE argument is still the origin location and the LINK argument is still the destination file or directory.

Creating a Windows Soft Link

The mklink command can be used to create soft links in Windows Vista and later through a Command Prompt or PowerShell with elevated permissions. By default, this command with no options produces a soft link.
mklink command:
mklink Link Target

The Link argument is the origin file/directory location and the Target argument represents the intended destination file.
For creating a soft link pointing to a directory, this command is used instead:
mklink /D Link Target

Creating a Windows Hard Link

Similarly to creating a soft link in Windows, the mklink command can also be used to create hard links when /H is included as an option:
mklink /H Link Target
For creating a junction, the /J option is used instead of /H:
mklink /J Link Target

Komprise Transparent Move Technology (TMT) and Symlinks

The patented Komprise Transparent Move Technology™ (TMT) goes beyond storage-based data tiering to analyze, migrate, tier and replicate data across multi-vendor storage and clouds while enabling native use of the data at each layer. This storage-agnostic data management is possible without disrupting users and without locking data in a proprietary format or in one vendor’s storage silo.

Komprise TMT uses the standard, built-in feature of Windows, Linux, and Mac symbolic links, which replace a file with a tiny pointer to another location. By using Dynamic Links inside the standard symbolic link, Komprise extends the file system to call these files from the cloud or other storage systems. Dynamic Links dynamically bind a request to the actual data so it can move a file from NFS or SMB to a native cloud object and still provide transparent access from the source.

Read the white paper: Leveraging the Full Power of the Cloud with Komprise Transparent Move Technology.

What is a symbolic link?

A symbolic link, also known as a symlink or soft link, is a file that serves as a reference or pointer to another file or directory in a file system. Unlike a hard link, which points directly to the index node (aka inode – a data structure on a file system on Unix-like operating systems that stores information about a file or a directory), a symbolic link contains a reference to the file’s pathname.

What is a Dynamic Link?

Komprise uses a patented mechanism called Dynamic Links that uses standard protocol constructs, eliminates proprietary agents, and stays out of the hot data path. Komprise Transparent Move Technology (TMT) uses the standard, built-in feature of Windows, Linux, and Mac called symbolic links, which replace a file with a tiny pointer to another location. By using the Komprise Dynamic Link inside the standard symbolic link, the file system is extended to call these files from the cloud or other storage systems. Dynamic Links dynamically bind a request to the actual data, so Komprise can move a file from NFS or SMB to a native cloud object and still provide transparent access from the source. To see how this works as a user, click a file that Komprise has tiered to the cloud and you get it back instantly; right-click the file and you will see that the path points to the Komprise Dynamic Link instead of a file on your computer. Komprise TMT is a scalable, storage-agnostic data movement solution that maintains file and object duality to give you the best of both worlds: transparent data access from the source and native data access outside the data path. In this blog post, Komprise cofounder and CEO noted:

With our patented Dynamic Links, Komprise stays outside the hot data path to deliver a standards-based open data management solution that is resilient and avoids the pitfalls of static stubs or symlinks.

How is symbolic linking different than stubs?

Symbolic linking is a file system feature that allows the creation of references to files or directories, providing a form of aliasing. On the other hand, stubs are placeholders or temporary implementations used in software development, often related to the linking and compilation process, and are replaced with the actual code or functionality at a later stage. The key difference lies in their purposes and the context in which they are used. Symbolic linking and stubs serve different purposes and operate at different levels within a system.

Symbolic Linking:

Symbolic linking, also known as symlink or soft link, is a mechanism in file systems that allows the creation of a special type of file that serves as a symbolic reference or pointer to another file or directory.

  • Purpose: Symbolic links, or symlinks, provide a way to create references or pointers to files or directories. They are used to create aliases or shortcuts to other files or directories within the file system.
  • Mechanism: A symbolic link is a separate file that contains a path reference to the target file or directory. When the symlink is accessed, the system follows the path reference to the target location.
  • Independence: The symlink and the target file or directory have different inodes (index nodes), and they can be located on different file systems. Deleting the symlink does not affect the target, but deleting the target may leave a “dangling” symlink.
  • Example: If you have a file named file.txt and create a symlink named link.txt pointing to it, accessing link.txt will effectively access the contents of file.txt.

Note that Komprise uses Dynamic Links. Learn more here.

Stubs:

In software development, a stub is a piece of code or a placeholder that stands in for a more complete or complex implementation. Stubs are used in various stages of the software development life cycle and serve different purposes, primarily related to testing, development, and dependency management.

  • Purpose: Stubs are pieces of code or placeholders used in software development and deployment. They are typically temporary implementations or references that are later replaced with the actual code or functionality.
  • Mechanism: Stubs can be placeholders or simplified versions of functions or modules. During development, they may serve as stand-ins for more complex or complete implementations until those are available.
  • Dependency Resolution: Stubs are often used in the context of linking and compilation. They allow code to be compiled and linked even when some dependencies are not fully implemented.
  • Example: In software development, a stub may be used to simulate the behavior of a network interface or a hardware device before the actual device or interface is available. Once the real device is in place, the stub is replaced with the complete implementation.

Traditionally, unstructured data management solutions that move data have relied on one of two approaches:

  1. They either move the data entirely out of the primary storage, which is undesirable in most scenarios because it creates user friction as users think their files have disappeared.
  2. If they try to move data transparently, they leave behind a proprietary “stub” file that points to the moved file.

Stubs are problematic for two reasons:

  1. The stub is proprietary, and you need an agent installed on the file storage to detect when the stub is opened. This puts the data movement tool in the hot data path, which is undesirable, as it impacts performance and can become a bottleneck.
  2. Because a stub statically points to the new location of the file, if a stub is accidentally deleted, the data can get orphaned. Also, a stub can only point to another similar file system, so you cannot bridge file and object, for example.

The bottom line is that stubs are hard to manage. They need to be backed up, they’re in the data path, they’re limiting, and they are risky because they are a single point of failure. Komprise eliminates these issues by using a patented mechanism called Dynamic Links, which uses standard protocol constructs and eliminates proprietary agents without getting in the hot data path.

Watch the TechTalk with Komprise cofounder and CTO.


Tiering

What is Tiering?


In the context of data storage, tiering refers to the practice of organizing data into different tiers based on its value or frequency of access. Each tier is assigned a different level of performance, cost, and capacity, with the goal of optimizing the use of storage resources and reducing costs.

The most commonly used tiers are:

  • Tier 1: This is the highest-performing and most expensive tier, typically using solid-state drives (SSDs) for fast access to critical data that is frequently accessed or requires low latency.
  • Tier 2: This tier is less expensive than Tier 1 and is typically made up of hard disk drives (HDDs) or slower SSDs. It is used for data that is still frequently accessed but not as critical as Tier 1 data.
  • Tier 3: This is a low-cost and high-capacity tier, typically using slower HDDs or object storage. It is used for infrequently accessed data or data that is older and less valuable.


Unstructured data is typically moved automatically between tiers based on predefined data management policies that consider factors such as data age, access frequency, and cost. This ensures that frequently accessed data is stored in the higher-performing and more expensive tiers, while infrequently accessed data is stored in the lower-cost tiers. The goal of tiering is to optimize storage utilization and reduce costs while ensuring that data is accessible when needed.

Storage Tiering, Data Archiving, and Transparent Archiving – What’s the Difference?

File Migration Isn’t File Archiving

Cloud Tiering: Storage-based vs. Gateway vs. File Based


Transparent Move Technology

Transparent Move Technology refers to an approach for Data Tiering, Data Archiving and Data Management that moves cold files transparently such that:
  1. The archived files can still be viewed and opened from the original location so users and applications do not need to change their data access.
  2. The archived files can be accessed via the original file protocols even if they are archived on an object repository.
  3. There is no change to the data path for hot data that is not archived, so there are no server- or client-side agents or static stubs.
  4. Accessing archived files does not cause the data to be brought back or rehydrated, and the approach is transparent to backup software and other applications.

Read the white paper: Transparent Move Technology – Leverage the Full Power of the Cloud without Disrupting Users and Applications with Komprise TMT


Unstructured Data Classification

Unstructured data classification involves the process of categorizing and organizing unstructured data based on its content, context, or other characteristics. Unstructured data typically refers to information that does not have a predefined data model or is not organized in a structured manner, such as text documents, images, audio files and videos. Classifying unstructured data is increasingly recognized as essential for efficient unstructured data management, search, and analysis.

Komprise Deep Analytics allows you to find the right data that fits specific criteria across all your data storage silos to answer questions such as what file types the top data owners are storing. Once you connect Komprise to your file and object storage, Komprise indexes the data and creates a Global File Index of all your data. You do not have to move the data anywhere, but you now have a single way to query and search across all file and object stores. For instance, say you have NetApp, Isilon, Windows servers, and Pure Storage at different sites, plus cloud file storage on Amazon, Azure, and Google. Komprise gives you a single index of the data across all these environments, so you can search and find exactly the data you need from a single console and API. Once you find the data you want to operate on, you can systematically move it using Komprise Intelligent Data Management. For example, if you want to tier files generated by certain instruments to the cloud, you can create a policy so that as new files are generated, they are continuously and automatically moved. This makes it easy to systematically leverage analytics to move and operate on unstructured data.

Unstructured Data Classification: A Top Enterprise Data Storage Trend

According to Gartner’s Top Trends in Enterprise Data Storage 2023 (subscription required):

By 2027, at least 40% of organizations will deploy data storage management solutions for classification, insights and optimization, up from 15% in early 2023.

The report goes on to note that:

Data classification or categorization helps improve IT and business outcomes such as storage optimization, data life cycle enforcement, security risk reduction and faster data workflows. Data classification and insights solutions are typically vendor storage agnostic, and work on any data that can be accessed over file or object access protocols like NFS, SMB or S3.

What are some approaches and techniques for unstructured data classification?

Text-Based Classification

  • Natural Language Processing (NLP): NLP techniques, including text tokenization, sentiment analysis, and named entity recognition, can be used to analyze the content of textual data.
  • Keyword Matching: Classifying documents based on the presence of specific keywords or key phrases related to predefined categories.
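As a concrete illustration of keyword matching, here is a minimal sketch; the category names and keyword lists are invented for the example:

```python
# Minimal keyword-matching classifier. Categories and keyword sets are
# illustrative assumptions, not from any real taxonomy.
CATEGORIES = {
    "finance": {"invoice", "budget", "revenue", "payroll"},
    "legal": {"contract", "nda", "compliance", "litigation"},
    "engineering": {"schematic", "firmware", "benchmark", "commit"},
}

def classify(text):
    """Return every category whose keywords appear in the text."""
    words = set(text.lower().split())
    return sorted(cat for cat, keys in CATEGORIES.items() if words & keys)

print(classify("Q3 revenue and budget review"))          # ['finance']
print(classify("NDA and contract for firmware vendor"))  # ['engineering', 'legal']
```

Real keyword classifiers typically add stemming, phrase matching, and weighting, but the core lookup works this way.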

Image-Based Classification

  • Computer Vision: Utilizing computer vision techniques, such as image recognition and object detection, to classify and categorize images based on their visual content.
  • Feature Extraction: Extracting features from images, such as color histograms or texture patterns, and using machine learning models for classification.

Audio and Speech-Based Classification

  • Speech Recognition: Converting spoken language into text for further analysis and classification.
  • Audio Analysis: Extracting features from audio files, such as pitch or frequency, and using machine learning algorithms for classification.

Metadata-Based Classification

  • File Metadata: Utilizing metadata associated with files, such as creation date, author, or file type, for classification purposes.
  • Exif Data: For images, extracting metadata embedded in the file in Exchangeable Image File Format (EXIF), such as camera settings and location information.
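File-metadata classification is straightforward to sketch with the standard library. This example buckets a file by extension, age, and size using `os.stat`; the bucket names and thresholds are assumptions for illustration:

```python
import os
import tempfile
import time

# Illustrative metadata-based classification using os.stat.
# Age thresholds (30/365 days) and bucket names are assumptions.
def classify_by_metadata(path, now=None):
    now = now if now is not None else time.time()
    st = os.stat(path)
    ext = os.path.splitext(path)[1].lstrip(".").lower() or "none"
    age_days = (now - st.st_mtime) / 86400
    age = "cold" if age_days > 365 else "warm" if age_days > 30 else "hot"
    return {"extension": ext, "age": age, "size_bytes": st.st_size}

# Example: a freshly written .log file classifies as hot
tmp = tempfile.NamedTemporaryFile(suffix=".log", delete=False)
tmp.write(b"trace output")
tmp.close()
print(classify_by_metadata(tmp.name))
# {'extension': 'log', 'age': 'hot', 'size_bytes': 12}
os.unlink(tmp.name)
```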

Pattern Recognition

  • Machine Learning Algorithms: Training machine learning models, including supervised or unsupervised learning algorithms, to recognize patterns and classify unstructured data based on historical examples.
  • Clustering: Grouping similar data points together using clustering algorithms to discover natural groupings within unstructured data.

Rule-Based Classification

  • Predefined Rules: Establishing rules and criteria for classifying data based on certain characteristics or conditions.
  • Expert Systems: Using expert systems that encode human expertise and rules for classification.

Content Analysis

  • Topic Modeling: Identifying topics or themes within unstructured text data using techniques like Latent Dirichlet Allocation (LDA).
  • Sentiment Analysis: Determining the sentiment expressed in textual content, such as positive, negative, or neutral sentiments.

Combination of Techniques

  • Hybrid Approaches: Combining multiple techniques, such as text analysis, image recognition, and metadata examination, for a more comprehensive and accurate classification.

Deep Learning

  • Neural Networks: Leveraging deep learning models, such as convolutional neural networks (CNNs) for images or recurrent neural networks (RNNs) for sequential data, to automatically learn features and patterns for classification.

Feedback Loop and Continuous Improvement

  • Establishing a feedback loop where the classification system continuously learns and improves based on user feedback, corrections, and updates to the training data.

Unstructured data classification is a challenging task, but advancements in machine learning, deep learning, and natural language processing have significantly improved the accuracy and efficiency of these classification methods and modern unstructured data management software solutions have emerged to address elements of data classification and ongoing data lifecycle management.

Depending on the specific requirements and characteristics of the unstructured data, different techniques or a combination of approaches may be suitable for effective unstructured data classification.

Read the article: How to Control Unstructured Data

Komprise Use Case: Data Classification


Unstructured Data Management

What is Unstructured Data Management?

Unstructured data management is a category of software that has emerged to address the explosive growth of unstructured data in the enterprise and the modern reality of hybrid cloud storage. In the 2023 Komprise State of Unstructured Data Management survey, 32% of organizations report that they are managing 10PB of data or more. That equates to 110,000 ultra-high-definition (UHD) movies, or half of the data stored by the U.S. Library of Congress. Most organizations (73%) are spending more than 30% of their IT budget on data storage.

Data storage and data backup technology vendors are now recognizing the importance of unstructured data management as data outlives infrastructure and as data mobility is needed to leverage cloud data storage.

Unstructured data management must be independent and agnostic from data storage, backup, and cloud infrastructure technology platforms.

There are 5 requirements for unstructured data management solutions:

  1. Goes Beyond Storage Efficiency
  2. Must be Multi-Directional
  3. Doesn’t Disrupt Users and Workflows
  4. Should Create New Uses for Your Data
  5. Puts Your Data First and Avoids Vendor Lock-In

An analytics-based unstructured data management solution brings value by analyzing all data in storage across on-premises and cloud environments to deliver deep insights. This knowledge helps IT managers make great decisions with users in mind, optimize costs and reduce security and regulatory compliance risks. These insights go beyond traditional storage metrics such as latency, IOPS and network throughput.

Here are some of the new metrics made possible with data management software:

  • Top data owners/users: See trends in usage and possible compliance issues, such as individual users storing excessive video files or PII files being stored in an insecure location.
  • Common file types: The ability to see data by file extension eases the process of finding all files related to a project and can inform future research initiatives. This could be as simple as finding all the log files, trace files or extracts from a given application or instrument and moving them to a data lake for analysis.
  • Storage costs for chargeback or showback: Whether for chargeback requirements or not, stakeholders should understand costs in their department and be able to view metrics. This will help identify areas where low-cost storage or data tiering to archival storage is a viable cost-reduction opportunity.
  • Data growth rates: High level metrics on data growth keeps IT and business heads on the same page so they can collaborate on data management decisions. Understand which groups and projects are growing data the fastest and ensure that data creation/storage is appropriate according to its overall business priority.
  • Age of data and access patterns: In most enterprises, 60-80% of data is “cold” and hasn’t been accessed in a year or more. Metrics showing the percentage of cold versus warm versus hot data are critical to ensure that data lives in the right place at the right time according to its business value and to optimize costs.
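The age-of-data metric can be approximated with a short scan of last-access times. This is a hedged sketch with assumed thresholds (90 days for warm, a year for cold); note that `st_atime` can be unreliable on filesystems mounted with `noatime`:

```python
import os
import tempfile
import time

# Illustrative hot/warm/cold breakdown for a directory tree, based on
# last-access time. Thresholds are assumptions for the sketch.
def temperature_report(root, now=None):
    now = now if now is not None else time.time()
    buckets = {"hot": 0, "warm": 0, "cold": 0}
    for dirpath, _, files in os.walk(root):
        for name in files:
            age_days = (now - os.stat(os.path.join(dirpath, name)).st_atime) / 86400
            key = "cold" if age_days > 365 else "warm" if age_days > 90 else "hot"
            buckets[key] += 1
    total = sum(buckets.values()) or 1
    return {k: round(100 * v / total, 1) for k, v in buckets.items()}

# Example: freshly created files all report as hot
d = tempfile.mkdtemp()
for i in range(4):
    open(os.path.join(d, "f%d.txt" % i), "w").close()
print(temperature_report(d))  # {'hot': 100.0, 'warm': 0.0, 'cold': 0.0}
```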

Read: File Data Metrics to Live By

Beyond cost optimization, unstructured data management tools and practices can help deliver new value from data.

Unstructured data is the fuel needed for AI, yet it’s difficult to leverage because unstructured data is hard to find, search across, and move due to its size and distribution across hybrid cloud environments. Tagging and automation can help prepare unstructured data for AI and big data analytics programs. Tactics include:

  • Preprocess data at the edge so it can be analyzed and tagged with new metadata before moving it into a cloud data lake. This can drastically reduce the wasted cost and effort of moving and storing useless data and can minimize the occurrence of data swamps.
  • Apply automation to facilitate data segmentation, cleansing, search and enrichment. You can do this with data tagging, deletion or tiering of cold data by policy, and by moving data into the optimal storage where it can be ingested by big data and ML tools. A leading new approach is the ability to initiate and execute data workflows.
  • Use a solution that persists metadata tags as data moves from one location to another. For instance, files tagged as containing key project keywords by a third-party AI service should retain those tags indefinitely so that a new research team doesn’t have to run the same analysis over again — at high cost. Komprise Intelligent Data Management has these capabilities.
  • Plan appropriately for large-scale data migration efforts with thorough diligence and testing. This can prevent common networking and security issues that delay data migrations and introduce errors or data loss.

The State of Unstructured Data Management

In August 2021, Komprise published the first State of Unstructured Data Management Report:

Highlights of the 2021 Unstructured Data Management Report

Unstructured Data is Growing, as are its Costs

  • 65.5% of organizations spend more than 30% of their IT budgets on data storage and data management.
  • Most (62.5%) will spend more on storage in 2021 versus 2020.

Getting More Unstructured Data to the Cloud is a Key Priority

  • 50% of enterprises have data stored in a mix of on-premises and cloud-based storage.
  • Top priorities for cloud data management include: migrating data to the cloud (56%) cutting storage and data costs (46%) and governance and security of data in the cloud (41%).

IT Leaders Want Visibility First Before Investing in More Data Storage

  • Investing in analytics tools was the highest priority (45%) over buying more cloud or on-premises storage or modernizing backups.
  • One-third of enterprises acknowledge that over 50% of data is cold while 20% don’t know, suggesting a need to right-place data through its lifecycle.

Unstructured Data Management Goals & Challenges: Visibility, Cost Management and Data Lakes

  • 44.9% wish to avoid rising costs.
  • 44.5% want better visibility for planning.
  • 42% are interested in tagging data for future use and enabling data lakes.

2022 State of Unstructured Data Management Report

In August 2022, Komprise published the 2nd annual State of Unstructured Data Management Report: Komprise Survey Finds 65% of Enterprise IT Leaders are Investing in Unstructured Data Analytics. The top 5 trends from the report are:

  1. User Self-Service: In data management, self-service typically refers to the ability for authorized users outside of storage disciplines to search, tag, enrich and act on data through automation—such as a research scientist wanting to continuously export project files to a cloud analytics service.
  2. Moving Data to Analytics Platforms: A majority (65%) of organizations plan to or are already delivering unstructured data to their big data analytics platforms.
  3. Cloud File Storage Gains Favor: Cloud NAS topped the list for storage investments in the next year (47%).
  4. User Expectations Beg Attention: Organizations want to move data without disrupting users and applications (42%).
  5. IT and Storage Directors want Flexibility: A top goal for unstructured data management (42%) is to adopt new storage and cloud technologies without incurring extra licensing penalties and costs, such as cloud egress fees.

State of Unstructured Data Management 2023

In September 2023, Komprise published the 3rd annual State of Unstructured Data Management report.

The coverage focused on the fact that 66% of respondents said preparing data storage and data management for AI and generative AI is a top priority and a challenge.

Why do you need to manage your unstructured data?

In a 2022 interview, Komprise co-founder and COO Krishna Subramanian defined unstructured data this way:

Unstructured data is any data that doesn’t fit neatly into a database, and isn’t really structured in rows and columns. So every photo on your phone, every X-ray, every MRI scan, every genome sequence, all the data generated by self-driving cars – all of that is unstructured data. And perhaps more relevant to more businesses, artificial intelligence (AI) and machine learning (ML) – they depend on, and usually output, unstructured data too.

Unstructured data is growing every day at a truly astonishing rate. Today, 85% of the world’s data is unstructured data.

And it’s more than doubling every two years.

The importance of an unstructured data strategy for enterprise

In part two of the interview, Krishna Subramanian noted:

Unstructured data doesn’t have a common structure. But it does have something called metadata. So every time you take a picture on your phone, there’s certain information that the phone captures, like the time of day, the location where the picture was taken, and if you tag it as a favorite, it’ll have that metadata tag on it too. It might know who’s in the photo, there are certain metadata that are kept.

All filing systems store some metadata about the data. A product like Komprise Intelligent Data Management has a distributed way to search across all the different environments where you’ve stored data, and create a global index of all that metadata around the data. And that in itself is a difficult problem, because again, unstructured data is so huge. A petabyte of data might be a few billion files, and a lot of these customers are dealing with tens to hundreds of petabytes.

So you need a system that can create an efficient index of hundreds of billions of files that could be distributed in different places. You can’t use a database, you have to have a distributed index, and that’s the technology we use under the hood, but we optimize it for this use case. So you create a global index. Learn more about unstructured data tagging.
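The global-index idea described above can be illustrated with a toy sketch: crawl several storage roots into one queryable list of metadata records. This is purely illustrative and not Komprise's implementation; as the interview notes, at hundreds of billions of files a distributed index is required, not an in-memory list:

```python
import os
import tempfile

# Toy "global file index": crawl multiple roots into one metadata list.
# Record fields are assumptions for the sketch.
def build_index(roots):
    index = []
    for root in roots:
        for dirpath, _, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                st = os.stat(path)
                index.append({"path": path, "source": root,
                              "ext": os.path.splitext(name)[1],
                              "size": st.st_size, "mtime": st.st_mtime})
    return index

def query(index, ext=None, min_size=0):
    """Filter index records by extension and minimum size."""
    return [r for r in index
            if (ext is None or r["ext"] == ext) and r["size"] >= min_size]

# Example: two "silos" searched through one index
silo1, silo2 = tempfile.mkdtemp(), tempfile.mkdtemp()
with open(os.path.join(silo1, "scan.dcm"), "w") as f:
    f.write("imaging")
with open(os.path.join(silo2, "notes.txt"), "w") as f:
    f.write("text")
idx = build_index([silo1, silo2])
print(len(query(idx, ext=".dcm")))  # 1
```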

The Future of Unstructured Data Management

In an end-of-year blog post, Komprise executives review unstructured data management and data storage predictions for 2023 and the implications of adopting data services, processing data at the edge, multi-cloud challenges, the importance of smart data migration strategies, and more.


Unstructured Data Migration

What is Unstructured Data Migration?

Unstructured Data Migration is the process of selecting and moving data from one location to another – this may involve moving data across different storage vendors, and across different formats.

Data migrations are often done in the context of retiring a system and moving to a new system, or in the context of a cloud migration, or in the context of a modernization or upgrade strategy.

When it comes to unstructured data migrations and migrating enterprise file data workloads to the cloud, data migrations can be laborious, error prone, manual, and time consuming. Migrating data may involve finding and moving billions of files (large and small), which can succumb to storage and network slowdowns or outages. Also, different file systems do not often preserve metadata in exactly the same way, so migrating data without loss of fidelity and integrity can be a challenge.

NAS Data Migration

Network Attached Storage (NAS) migration is the process of migrating from one NAS storage environment to another. This may involve migrations within a vendor’s ecosystem such as NetApp data migration to NetApp or across vendors such as NetApp data migration to Isilon or EMC to NetApp or EMC to Pure FlashBlade. A high-fidelity NAS migration solution should preserve not only the file itself but all of its associated metadata and access controls.

Network Attached Storage (NAS) to Cloud data migration is the process of moving data from an on-premises data center to a cloud. It requires data to be moved from a file format (NFS or SMB) to an Object/Cloud format such as S3. A high-fidelity NAS-to-Cloud migration solution preserves all the file metadata including access control and privileges in the cloud. This enables data to be used either as objects or as files in the cloud.
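To make the metadata-preservation point concrete, here is a hedged sketch that maps POSIX file attributes to a flat key-value dictionary of the kind an object store can carry alongside each object, so the data can later be re-presented as a file. The key names are invented for this illustration and are not a standard or any vendor's format:

```python
import os
import stat
import tempfile

# Illustrative only: capture POSIX file attributes as a flat metadata dict
# that could travel with an object. Key names are assumptions.
def file_to_object_metadata(path):
    st = os.stat(path)
    return {
        "x-file-mode": oct(stat.S_IMODE(st.st_mode)),
        "x-file-uid": str(st.st_uid),
        "x-file-gid": str(st.st_gid),
        "x-file-mtime": str(int(st.st_mtime)),
        "x-file-size": str(st.st_size),
    }

# Example
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b"hello")
tmp.close()
print(file_to_object_metadata(tmp.name)["x-file-size"])  # 5
os.unlink(tmp.name)
```

A full-fidelity solution must also carry ACLs, extended attributes, and alternate data streams, which this sketch omits.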

Storage migration is a general-purpose term that applies to moving data across storage arrays.

Unstructured Data Migration Phases

Data migrations typically involve four phases:

  • Planning – Deciding what data should be migrated. Planning may often involve analyzing various sources to find the right data sets. For example, several customers today are interested in upgrading some data to Flash – finding hot, active data to migrate to Flash can be a useful planning exercise.
  • Initial Migration – Do a first migration of all the data. This should involve migrating the files, the directories and the shares.
  • Iterative Migrations – Look for any changes that may have occurred during the initial migration and copy those over.
  • Final Cutoff – A final cutoff involves deleting data at the original storage and managing the mounts, etc., so data can be accessed from the new location going forward.

Resilient data migration refers to an approach that automatically adjusts for failures and slowdowns and retries as needed. It also checks the integrity of the data at the destination to ensure full fidelity.
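The initial and iterative phases above can be illustrated with a simplified, single-threaded sync pass that copies only files that are new or whose size or modification time changed since the last pass. This is a sketch only; real migration tools add parallelism, retries, checksums, and ACL preservation:

```python
import os
import shutil
import tempfile

# Simplified sync pass: copy files that are new or changed since last pass.
def sync_pass(src, dst):
    copied = []
    for dirpath, _, files in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.join(dst, rel) if rel != "." else dst
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            s, d = os.path.join(dirpath, name), os.path.join(target_dir, name)
            st = os.stat(s)
            if (not os.path.exists(d)
                    or os.stat(d).st_size != st.st_size
                    or os.stat(d).st_mtime < st.st_mtime):
                shutil.copy2(s, d)  # copy2 preserves timestamps
                copied.append(name)
    return copied

# Example: the initial pass copies everything; the next iteration copies nothing
src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
for n in ("a.txt", "b.txt"):
    with open(os.path.join(src, n), "w") as f:
        f.write("data")
print(len(sync_pass(src, dst)))  # 2
print(len(sync_pass(src, dst)))  # 0
```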

Types of Unstructured Data Migrations

When it comes to file data, there are NAS Migrations and Cloud Migrations. There are also NAS migrations to the cloud. Data migrations are often seen as a dreaded and laborious part of the storage management lifecycle. Free tools are often considered first but they can introduce risk, time and cost overruns and they are typically labor intensive and error-prone. On the other hand, traditional migration tools have complex legacy architectures and are expensive point products that do not provide ongoing value – resulting in sunk costs.

Look for easy-to-use, fast, reliable data migration tools that are not one-and-done point tools. The right data migration solution should be able to handle other unstructured data management use cases, including cloud data tiering and data replication.

How to Plan a Smart NAS or Cloud Unstructured Data Migration?

The typical steps for any unstructured data migration project are:

Analytics: Before you start an unstructured data migration project, it’s important to have visibility into:

  •  How fast is your data growing?
  •  How much data is hot vs. cold data?
  •  Who is using your data?

Savings: Estimate how much you’ll save by moving to the new NAS or cloud infrastructure. This information will guide which NAS or cloud storage mix is best for your data.

Offload heavy lifting: Your data migration solution should be able to manage multiple iterations of the migration and handle problems by automatically retrying after a slowdown or a network or storage failure.

Preserve data integrity: Your data migration solution should provide MD5 checksum on every file and assure all metadata and access controls migrate to the new environment.
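The per-file checksum step can be sketched with the standard library: hash source and destination in chunks so large files never load fully into memory, then compare digests. This is an illustration of the technique, not any product's implementation:

```python
import hashlib
import os
import shutil
import tempfile

# Illustrative streaming MD5 integrity check for a migrated file.
def md5sum(path, chunk=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()

def verify_copy(src, dst):
    """True if source and destination have identical MD5 digests."""
    return md5sum(src) == md5sum(dst)

# Example: a faithful copy verifies; a corrupted one does not
src = tempfile.NamedTemporaryFile(delete=False)
src.write(b"payload")
src.close()
dst = src.name + ".copy"
shutil.copy2(src.name, dst)
print(verify_copy(src.name, dst))  # True
```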

Avoid sunk costs: File data migrations involve a lot of heavy lifting. Your data migration solution should include automatic parallelization at every level for elastic scaling and the ability to migrate petabytes of data seamlessly and reliably.

Reduce downtime: It is recommended that your data migration solution runs multiple iterations for more efficient cutovers.

Planning Your Cloud File Migration

Komprise and Unstructured Data Migration

Komprise Elastic Data Migration is included in the Komprise Intelligent Data Management platform or is available standalone. Designed for cloud migrations and NAS migrations, with Komprise Elastic Data Migration you can run, monitor, and manage hundreds of data migrations faster than ever at a fraction of the cost. Learn more about Komprise Smart Data Migrations.

Unstructured Data Migration and the Cloud

As unstructured data continues to grow exponentially, organizations struggle to control costs for file data storage. Many are turning to the cloud to scale and manage spend. However, choosing the right files to move can be challenging as there can easily be billions of files. Many enterprises have over 1 PB of data, which represents roughly 3 billion files. This unstructured data is growing exponentially and resides in multi-vendor storage silos for access by various applications and departments.

For these reasons, organizations often lack visibility into file data and are making decisions in the dark. To be agile and competitive, IT teams must evolve storage management to become a holistic data management strategy. The right approach to data migration and the cloud for file and object data is to use analytics in cloud data management:

  1. Understand your data patterns
  2. Plan using a cost model
  3. Use data to drive stakeholder buy-in
  4. Eliminate user disruption
  5. Create a systematic plan for ongoing data management

Read the eBook: 5 Ways to Use Analytics for Cloud Data Migrations

Top Unstructured Data Migration Challenges

Businesses today are looking at modernizing storage and moving to a multi-cloud strategy. As they evolve to faster, flash-based Network Attached Storage (NAS) and the cloud, migrating data into these environments can be challenging. The goal is to migrate large production data sets quickly, without errors, and without disruption to user productivity.

The top cloud data migration challenges are:

  1. How do you manage cloud data migrations without downtime?
  2. How can you automate cloud data migrations to eliminate manual effort?
  3. How can you ensure all the permissions, ACLs, metadata are copied correctly during a cloud data migration so you can access the data in the cloud as files?

You can overcome these challenges with some planning and automation that preserves file-based access both from on-premises and the cloud.

Unstructured Data Migration Tools

  • Free Tools: Require a lot of babysitting and are not reliable for migrating large volumes of data.
  • Point Data Migration Solutions: Have complex legacy architectures and create sunk costs.
  • Komprise Elastic Data Migration: Makes cloud data migrations simple, fast, and reliable, and eliminates sunk costs since you continue to use Komprise after the migration. Komprise is the only solution that gives you the option to cut 70%+ of cloud storage costs by placing cold data in object storage classes while maintaining file metadata so the data can be promoted in the cloud as files when needed. Learn more >

Learn more about Smart Data Migrations for unstructured file and object data:

Part 1

Part 2

What is Data Migration?

Data Migration is the process of selecting and moving data from one location to another and can involve moving data across different storage vendors, and across different formats.

How is data migration done?

Data migrations are often done in the context of retiring a system and moving to a new system, or in the context of a cloud migration, or in the context of a modernization or upgrade strategy.

What tools to use for data migration?

There are a variety of free tools but these require the most babysitting. Point Data Migration solutions have complex legacy architectures and can create sunk costs. Komprise Elastic Data Migration makes cloud data migrations simple, fast and reliable and eliminates sunk costs.


Unstructured Data Storage

Unstructured data storage is the storage of data that does not adhere to a predefined data model or schema. Unlike structured data, which fits neatly into tables with rows and columns, unstructured data lacks a specific organization and may include various file types, such as text documents, images, videos, audio files, emails, social media posts, and more.

Read the article: Here’s How to Take Control of Unstructured Data

Gartner on unstructured data storage

Gartner-Logo

Each year Gartner publishes the Magic Quadrant for Distributed File Systems and Object Storage.

Gartner defines distributed file systems and object storage as software and hardware appliance products that offer object and distributed file system technologies for unstructured data. Their purpose is to store, secure, protect and scale unstructured data with access over the network using file and object protocols, such as Amazon Simple Storage Service (S3), Network File System (NFS) and Server Message Block (SMB).

Gartner also has a Primary Data Storage Magic Quadrant, as summarized in this Blocks & Files article.

Common requirements for unstructured data storage

  • Flexibility: Unstructured data storage systems are flexible and can accommodate various types of data without requiring predefined schemas. This flexibility allows organizations to store and manage diverse data types efficiently.
  • Scalability: Unstructured data storage solutions are often designed to scale easily, allowing organizations to handle massive volumes of data as their storage requirements grow over time.
  • Indexing and Search: Effective management of unstructured data involves indexing and search capabilities to quickly locate and retrieve specific information within large datasets. This may involve metadata tagging, full-text search, and other techniques to facilitate data discovery. See unstructured data classification.
  • Object Storage: Object storage is a common approach to storing unstructured data, where each piece of data is stored as an object with a unique identifier and metadata. Object storage systems provide scalability, durability, and accessibility for large-scale unstructured data environments.
  • Cloud Storage: Many organizations leverage cloud storage services for unstructured data storage due to their scalability, reliability, and cost-effectiveness. Cloud providers offer a range of storage options, including object storage, file storage, and content delivery networks (CDNs), to accommodate different types of unstructured data.
  • Data Governance and Security: Managing unstructured data requires robust data governance practices to ensure compliance, data security, and privacy protection. This may involve implementing access controls, encryption, data classification, and audit trails to safeguard sensitive information.

Effective storage and unstructured data management are essential for organizations to derive insights, make data-driven decisions, and unlock the value of their data assets.

Unstructured Data Storage Vendors

Many vendors offer solutions for storing unstructured data, each with its own set of features, capabilities, and pricing models. Here are some notable vendors in the unstructured data storage space:

  • Amazon Web Services (AWS): Amazon Simple Storage Service (S3) (AWS S3) is a highly scalable object storage service designed for storing and retrieving any amount of data. It is commonly used for unstructured data storage and offers features such as versioning, lifecycle management, and security features. Learn more about Komprise for AWS.
  • Microsoft Azure: Azure Blob Storage provides scalable, cost-effective storage for unstructured data. It offers tiered storage options, access controls, and integration with other Azure services for data analytics and processing. Learn more about Komprise for Azure.
  • Google Cloud Platform (GCP): Google Cloud Storage is a scalable object storage solution suitable for storing unstructured data. It provides features such as versioning, lifecycle management, and integration with other GCP services. Learn more about Komprise for Google. 
  • IBM: IBM Cloud Object Storage: IBM offers Cloud Object Storage, a scalable, secure, and durable object storage service. It is designed to support large-scale unstructured data storage and offers features such as encryption, access controls, and global data distribution. Learn more about Komprise for IBM.
  • Dell: Dell EMC Isilon, now Dell PowerScale, is a scale-out network-attached storage (NAS) platform designed for storing and managing large volumes of unstructured data. It offers high performance, scalability, and multi-protocol support for various data types. Learn about Komprise Elastic Data Migration for Isilon.
  • NetApp: NetApp StorageGRID is an object storage solution from NetApp that enables organizations to store, manage, and protect unstructured data at scale. It offers features such as geo-distribution, data tiering, and policy-based management. Learn more about Komprise for NetApp.
  • Pure Storage: Pure Storage FlashBlade is a scalable, all-flash storage platform designed for unstructured data workloads. It offers high performance, simplicity, and native support for file, object, and analytics workloads.
  • HPE (Hewlett Packard Enterprise): For years HPE’s offering has been HPE Nimble Storage, which includes Nimble Storage dHCI and Nimble Storage All Flash Arrays, suitable for storing unstructured data. HPE now resells VAST Data solutions as HPE File Services.
  • Qumulo: Qumulo’s Scale Anywhere™ platform is a 100% software solution for hybrid enterprises to efficiently store and manage file and object data at the edge, in the core, and in the cloud.

These are some examples of vendors providing solutions for unstructured data storage.
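Several of the services above advertise lifecycle management, which moves objects between storage tiers based on access age. As a rough illustration of the decision logic behind such a rule (the tier names and cutoffs here are invented for the sketch, not any vendor's actual defaults):

```python
from datetime import datetime, timedelta

# Hypothetical tier cutoffs, loosely modeled on typical object-storage
# lifecycle rules; real policies are configured per bucket/container.
TIER_RULES = [
    (timedelta(days=30), "hot"),    # accessed within the last 30 days
    (timedelta(days=180), "cool"),  # accessed within the last 180 days
]
ARCHIVE_TIER = "archive"            # everything older

def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Map an object's last-access time to a storage tier."""
    age = now - last_accessed
    for cutoff, tier in TIER_RULES:
        if age <= cutoff:
            return tier
    return ARCHIVE_TIER

now = datetime(2024, 1, 1)
print(choose_tier(datetime(2023, 12, 20), now))  # hot
print(choose_tier(datetime(2023, 9, 1), now))    # cool
print(choose_tier(datetime(2022, 1, 1), now))    # archive
```

In practice these transitions are declared as lifecycle policies on the storage service itself rather than computed client-side; the sketch only shows the age-based decision the policy encodes.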

Optimize unstructured data storage with Komprise

Komprise Intelligent Data Management frees you to analyze, mobilize, and access the right file and object data across clouds without shackling your data to any unstructured data storage vendor. Komprise helps enterprise customers optimize data storage costs by right-sizing and right-placing data, while making it easy for users to unlock data value with smart data workflows.


Getting Started with Komprise:

Want To Learn More?

Unstructured Data Workflows

Unstructured data workflows can include a variety of processes and technologies, such as data management tools, document management systems, content management systems, and collaboration platforms. Data is no longer static and needs to move between systems and clouds to satisfy changing requirements and to support big data and AI/ML initiatives. Technologies and processes that automate and streamline these workflows can shave significant time and costs from finding, preparing and moving data into data lakes and analytics platforms or to meet compliance requirements.
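As a toy illustration of the find-and-filter step such tooling automates, the sketch below scans a directory tree and builds a move plan for cold files. It uses only the Python standard library; the threshold, target path, and record fields are hypothetical, not any product's API:

```python
import os
import time

def plan_workflow(root: str, min_age_days: float, target: str) -> list:
    """Scan a directory tree and build a move plan for files not
    modified within min_age_days -- a stand-in for the find, filter,
    and move steps a workflow engine automates."""
    cutoff = time.time() - min_age_days * 86400
    plan = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:  # file is "cold"
                plan.append({
                    "src": path,
                    "dst": os.path.join(target, name),
                    "reason": f"not modified in {min_age_days} days",
                })
    return plan
```

A real workflow engine would then execute the plan (copy, verify, tier or delete) and record the outcome for auditing; separating planning from execution keeps the move step reviewable.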

Overall, unstructured data workflows play an important role in modern data management and are critical for organizations that generate and use large volumes of unstructured data. By implementing effective unstructured data workflows, organizations can ensure that data lives in the right place at the right time to satisfy a variety of enterprise and departmental needs.

Komprise Smart Data Workflows for File and Object Data


Komprise Smart Data Workflows allow you to define and execute automated processes, which can be industry or domain specific, to search and find, migrate and tier, and ultimately get greater value from unstructured data. With Smart Data Workflows you can create custom queries across hybrid, multi-cloud, on-premises and edge data silos to find the file and object data you need (often locked away in data storage silos), execute Komprise or external functions on a subset of that data, and tag it with additional metadata. Move only the data you need and manage the lifecycle of unstructured data intelligently.

Watch the unstructured data management workflow presentation by the Komprise CTO and co-founder at Cloud Field Day.


Virtual Data Lakes

A virtual data lake, which Komprise calls the Global File Index, is a granular, flexible, and searchable index across file, object, and cloud data storage spanning petabytes of unstructured data. A virtual data lake is sometimes called a metadata lake; it allows organizations to find data and execute Smart Data Workflows that enable Big Data, AI, and ML projects.

Research has shown that in Big Data projects, 80% or more of the time can be spent finding the right data and getting it out of data centers and cloud infrastructure. With Komprise, powerful metadata-based search and indexing technology automates the process of finding unstructured data based on your specific criteria. This capability allows organizations to dynamically build virtual data lakes across storage silos on the fly so they can better manage and reuse their data for AI and ML.
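A minimal sketch of the metadata-index idea, using only the Python standard library: each "silo" is indexed into plain metadata records, and queries run against the records without touching or moving the underlying files. The record fields and query parameters are illustrative, not the Komprise Global File Index schema:

```python
import os

def index_silo(silo_name: str, root: str) -> list:
    """Collect per-file metadata (silo, path, size, extension) into
    plain records -- a toy stand-in for a global file index."""
    records = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            records.append({
                "silo": silo_name,
                "path": path,
                "size": os.path.getsize(path),
                "ext": os.path.splitext(name)[1].lower(),
            })
    return records

def query(index: list, ext=None, min_size: int = 0) -> list:
    """Answer a metadata query across all silos without moving data."""
    return [r for r in index
            if (ext is None or r["ext"] == ext) and r["size"] >= min_size]
```

Because the query runs over metadata only, the result set can be handed to a downstream AI/ML pipeline as a virtual data lake, and the actual file movement (if any) happens later and only for the matching subset.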

Komprise Deep Analytics lets you build specific queries to find the files you need and tag them to build real-time virtual data lakes that the entire organization can use, without having to first move the data.

Learn more about Komprise Deep Analytics.

