• A
    • Adaptive Data Management

      As data footprint continues to grow, businesses are struggling to manage petabytes of data, often consisting of billions and billions of files. To manage at this scale, intelligent automation that learns and adapts to your environment is needed.

      Data management needs to happen continuously in the background and not interfere with active usage of storage or the network by users and applications. This is because data management is an ongoing function, much like a housekeeper of data. Just as you would not want your housekeeper to be clearing dishes as your family is eating at the dinner table, data management needs to run non-intrusively in the background.

      To do this, an adaptive solution is needed – one that knows when your file system and network are in active use and throttles itself back, and then speeds back up when resources are available. An adaptive data management system learns from your usage patterns and adapts to the environment.

    • AI Compute

      The computing ability required for machines to learn from big data, adjust to new inputs, and perform human-like tasks. Komprise cuts the data preparation time for AI projects by creating virtual data lakes with its Deep Analytics feature.

    • Archival Storage

      Archival storage is storage for data that is not needed for an organization’s everyday operations but may have to be accessed occasionally.

      By utilizing archival storage, organizations can offload data to less expensive secondary storage while still keeping it protected.

      Utilizing archival storage reduces the primary storage capacity required and allows an organization to retain data that may be needed for regulatory or other compliance purposes.

      Data archiving, also known as data tiering, is intended to protect older information that is not needed for everyday operations, but may have to be accessed occasionally. Data Archival and Tiering storage is a tool for reducing your primary storage need and the related costs, rather than acting as a data recovery tool.

      Why Archival Storage?

      • Some data archives keep data read-only to protect it from modification, while other data archiving products allow users to modify it.
      • The benefit of data archiving is that it reduces the cost of primary storage. Archive storage itself costs less because it is typically based on a low-performance, high-capacity storage medium.
      • Data archiving takes a number of different forms. Options can be online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.
      • Another archival system uses offline data storage where archive data is written to tape or other removable media using data archiving software rather than being kept online. Data archiving on tape consumes less power than disk systems, translating to lower costs.
      • A third option is cloud storage, such as that offered by Amazon – this is inexpensive but requires ongoing investment.
      • The data archiving process typically uses automated software that moves “cold” data according to policies set by an administrator. Today, a popular approach is to make the archive “transparent” – the archived data remains online and is accessed by users and applications exactly as before, so they experience no change in behavior. The patented Komprise Transparent Move Technology is designed to allow you to transparently archive and tier data.
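
      As a toy illustration of the general “transparent” pattern (this is not how Komprise Transparent Move Technology is implemented; paths and the helper name are hypothetical), the Python sketch below relocates a file to an archive location and leaves a link behind so the original path keeps working:

      import os
      import shutil

      def tier_file(path: str, archive_root: str) -> None:
          """Move a cold file to an archive location and leave a symbolic link
          behind so users and applications keep using the original path."""
          target = os.path.join(archive_root, path.lstrip(os.sep))
          os.makedirs(os.path.dirname(target), exist_ok=True)
          shutil.move(path, target)   # relocate the file contents
          os.symlink(target, path)    # the original path still resolves to the data

      Production solutions layer integrity checks, metadata preservation and policy automation on top of this basic idea.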

    • Analytics-driven Data Management

      The proprietary platform behind Komprise Intelligent Data Management, based on data insight and automation, to strategically and efficiently manage unstructured data at massive scale.

    • Amazon Glacier (AWS Glacier)

      What is Amazon Glacier (AWS Glacier)?

      Amazon Glacier, also known as AWS Glacier, is a class of cloud storage available through Amazon Web Services (AWS). Glacier is a lower-cost storage tier designed for use with data archiving and long-term backup services on the public cloud infrastructure.

      Amazon S3 Glacier was created to house data that doesn’t need to be accessed frequently or quickly. This makes it ideal for use as a cold storage service, hence the inspiration for its name.

      AWS Glacier retrieval times range from a few minutes to a few hours with three different speed options available: Expedited (1-5 minutes), Standard (3-5 hours), and Bulk (5-12 hours).

      AWS Glacier Deep Archive offers 12-48-hour retrieval times. The faster retrieval options are significantly more expensive, so having your data organized into the correct tier within AWS cloud storage is an important aspect of keeping storage costs down.
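
      As a rough sketch of how a retrieval tier is selected in practice (bucket and key names are hypothetical; assumes boto3 is installed and AWS credentials are configured), a restore request against an object stored in a Glacier class might look like this:

      import boto3

      s3 = boto3.client("s3")

      # Ask S3 to restore an archived object for 7 days using the low-cost Bulk tier.
      # Valid tiers are "Expedited", "Standard" and "Bulk".
      s3.restore_object(
          Bucket="example-archive-bucket",
          Key="projects/2021/results.tar.gz",
          RestoreRequest={
              "Days": 7,
              "GlacierJobParameters": {"Tier": "Bulk"},
          },
      )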

      Other Glacier features:

      • The ability to store an unlimited number of objects and data
      • Data stored in Glacier is dispersed across multiple geographically separated Availability Zones within the AWS region
      • An average annual durability of 99.999999999%
      • Checksum uploads to validate data authenticity
      • REST-based web service
      • Vault, Archive, and Job data models
      • Limit of 1,000 vaults per AWS account

      Main Applications for AWS Glacier Storage

      There are several scenarios where Glacier is an ideal solution for companies needing a large volume of cloud storage.

      1. Huge data sets. Many companies that perform trend or scientific analysis need a huge amount of storage to be able to house their training, input, and output data for future use.
      2. Replacing legacy storage infrastructure. With the many advantages that cloud-based storage environments have over traditional storage infrastructure, many corporations are opting to use AWS storage to get more out of their data storage systems. AWS Glacier is often used as a replacement for long term tape archives.
      3. Healthcare facilities’ patient data. Patient data needs to be kept for regulatory or compliance requirements. Glacier and Glacier Deep Archive are ideal archiving platforms to keep data that will hardly need to be accessed.
      4. Cold data with long retention times. Finance, research, genomics, electronic design automation, and media and entertainment are some examples of industries where cold data and inactive projects may need to be retained for long periods even though they are not actively used. AWS Glacier storage classes are a good fit for these types of data. The project data should be recalled before it becomes active again to minimize retrieval delays and costs.

      Amazon Glacier vs S3

      Amazon’s S3 storage and Glacier are different classes of storage designed to handle workloads on the AWS cloud storage platform.

      • Glacier is best for cold data that’s rarely or never accessed
      • AWS S3 storage is intended for hot and warm data that needs to be accessed daily and quickly

      The speed and accessibility of S3 storage comes at a much higher cost compared to Glacier and the even more economical Glacier Deep Archive storage tiers. Having the right data management solution is critical to help you identify and organize your hot and cold data into the correct storage tiers, saving a substantial amount on storage costs.
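
      One common way to keep colder objects out of the more expensive S3 classes is an age-based lifecycle rule. The sketch below (hypothetical bucket name and prefix; assumes boto3) transitions objects to Glacier and then Glacier Deep Archive as they age. Note that this is based on object age rather than actual access, which is why access-based data management is discussed in the next section.

      import boto3

      s3 = boto3.client("s3")

      # Transition objects under "projects/" to Glacier after 90 days
      # and to Glacier Deep Archive after one year.
      s3.put_bucket_lifecycle_configuration(
          Bucket="example-archive-bucket",
          LifecycleConfiguration={
              "Rules": [
                  {
                      "ID": "tier-cold-project-data",
                      "Status": "Enabled",
                      "Filter": {"Prefix": "projects/"},
                      "Transitions": [
                          {"Days": 90, "StorageClass": "GLACIER"},
                          {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
                      ],
                  }
              ]
          },
      )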

      Benefits of a Data Management System to Optimize AWS Glacier

      A comprehensive suite of data management capabilities allows organizations to reduce their storage footprint and substantially cut their storage costs. These are a few of the benefits of integrating an analytics-driven data management solution with your AWS storage:

      • Get full visibility of your data across AWS and other cloud platforms to understand how much NAS data is being accrued and whether it’s hot or cold.
      • Intelligent tiering and lifecycle management of files and objects across EFS, FSX, S3 and Glacier storage classes based on access patterns
      • Intelligent retrievals so you don’t get hit with unexpected data retrieval fees on Glacier – Komprise enables intelligent recalls based on access patterns so if an object on Glacier becomes active again, Komprise will move it up to an S3 storage class.
      • Bulk retrievals for improved user performance of entire projects from Glacier storage classes – if an archived project is going to become active, you can prefetch and retrieve the entire project from Glacier using Komprise so users don’t have to face long latencies to get access to the data they need.
      • Minimize costs with an analytics-driven solution that monitors retrieval, egress and other costs, and reduces them by intelligently promoting data back up to more active storage classes when needed
      • Access data that has been moved across AWS as objects from S3 storage classes or as files from File and NAS storage classes without the need for additional stubs or agents.
      • Reduce the complexity of your cloud storage and NAS environment and manage your data more easily through an intuitive dashboard.
      • Optimize the savings of your AWS storage environment with analytics-driven management of all the complex storage, retrieval, egress and other costs
      • Easy, on-demand scalability with capacity to add and manage petabytes without limits or the need for dedicated infrastructure.
      • Integrate data lifecycle management easily with an AWS Advanced Tier partner such as Komprise.
      • Move data transparently to any tier within AWS so users experience no difference in data access
      • Create automated policies to continuously manage the lifecycle of the moved data for maximum savings.

      Streamline AWS Glacier Operations with Komprise Intelligent Data Management

      Komprise’s Intelligent Data Management allows you to seamlessly analyze and manage data across all of your AWS cloud storage classes so you can move data across file, S3 and Glacier storage classes at the right time for the best price/performance. Because it’s vendor agnostic, its standards-driven analytics and data management work with  the largest storage providers in the industry and have helped companies save up to 50% on their cloud storage costs.

      If you’re looking to get more out of your AWS storage, then contact a data management expert at Komprise today and see how much you could save.

    • Amazon (AWS) S3 Intelligent Tiering

      S3 Intelligent Tiering is an Amazon storage class aimed at data with unknown or unpredictable data access patterns. See our S3 Intelligent Tiering glossary entry for further information.

    • Amazon S3 (AWS S3)

      Amazon Simple Storage Service, known as Amazon S3 or AWS S3, is an object storage service that offers industry-leading scalability, data availability, security, and performance.

      See S3 in our glossary for further information.

  • C
    • Capacity Planning

      Capacity planning is the estimation of the space, hardware, software, and connection infrastructure resources that will be needed over a period of time. In the enterprise environment, there is a common concern over whether there will be enough resources in place to handle an increasing number of users or interactions. The purpose of capacity planning is to have enough resources available to meet the anticipated need, at the right time, without accumulating unused resources. The goal is to match resource availability to the forecasted need in the most cost-efficient manner.

      True data capacity planning means being able to look into the future and estimate future IT needs and efficiently plan where data is stored and how it is managed based on the SLA of the data. Not only must you meet the future business needs of fast-growing data, you must also stay within the organization’s tight IT budgets. And, as organizations are looking to reduce operational costs with the cloud, deciding what data can move to the cloud, and how to leverage the cloud without disrupting existing file-based users and applications becomes critical.
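
      As a simple illustration of the forecasting step (the figures and growth rate below are hypothetical), a first-order capacity projection can be as basic as compounding the current footprint by an expected annual growth rate:

      def projected_capacity_tb(current_tb: float, annual_growth: float, years: int) -> float:
          """Project raw capacity needs assuming a constant annual growth rate."""
          return current_tb * (1 + annual_growth) ** years

      # 500 TB today, growing about 30% per year, needs roughly 1,428 TB in 4 years.
      print(round(projected_capacity_tb(500, 0.30, 4)))

      Real capacity plans layer tiering policies, retention requirements and per-tier costs on top of a raw growth projection, which is where usage analytics come in.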

      Data storage never shrinks, it just relentlessly gets bigger. Regardless of industry, organization size, or “software-defined” ecosystem, it is a constant stress-inducing challenge to stay ahead of the storage consumption rate. That challenge is not made any easier considering that typically organizations waste a staggering amount of data storage capacity, much of which can be attributed to improper capacity management.

      Are you making capacity planning decisions without insight?

      Komprise enables you to intelligently plan storage capacity, offset additional purchase of expensive storage, and extend the life of your existing storage by providing visibility across your storage with key analytics on how data is growing and being used, and interactive what-if analysis on the ROI of using different data management objectives. Komprise moves data based on your objectives to secondary storage, object storage or cloud storage, of your choice while providing a file gateway for users and applications to transparently access the data exactly as before.

      With an analytics-first approach, Komprise provides visibility into how data is growing and being used across storage silos. Storage administrators and IT leaders no longer have to make storage capacity planning decisions without insight. With Komprise Intelligent Data Management, you’ll understand how much more storage will be needed, when and how to streamline purchases during planning.

    • Checksum

      A calculated value that’s used to determine the integrity of data. The most commonly used checksum is MD5, which Komprise uses.
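
      For illustration, a minimal MD5 checksum of a file can be computed with the Python standard library; comparing the digest before and after a data movement is a simple integrity check:

      import hashlib

      def md5_checksum(path: str, chunk_size: int = 1024 * 1024) -> str:
          """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
          digest = hashlib.md5()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(chunk_size), b""):
                  digest.update(chunk)
          return digest.hexdigest()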

    • Cloud Data Growth Analytics

      70% of data in most enterprise organizations is cold and has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data.

      50% of the 175 zettabytes of data worldwide in 2025 will be stored in public cloud environments. (IDC)

      80% of businesses will overspend their cloud infrastructure budgets due to a lack of cloud cost optimization. (Gartner)

      Komprise provides the visibility and analytics into cloud data that lets organizations understand data growth across their clouds and helps move cold data to optimize costs.

    • Cloud Data Management

      What is Cloud Data Management?

      Cloud data management is a way to manage data across cloud platforms, either with or instead of on-premises storage. The goal is to curb rising cloud storage costs, but it can be quite a complicated pursuit, which is why most businesses employ an external company offering cloud data management services.

      Cloud data management is emerging as an alternative to data management using traditional on-premises software. The benefit of employing a top cloud data management company means that instead of buying on-premises storage resources and managing them, resources are bought on-demand in the cloud. This service model allows organizations to receive dedicated cloud data management resources on an as-needed basis. Cloud data management also involves finding the right data from on-premises storage and moving this data through data archiving, data tiering, data replication and data protection, or data migration to the cloud.

      Advantages of Cloud Data Management

      Optimal cloud data management provides four key capabilities that help to reduce cloud storage costs:

      1. Gain Accurate Visibility Across Cloud Accounts into Actual Usage
      2. Forecast Savings and Plan Data Management Strategies
      3. Archive Based on Actual Data Usage to Avoid Surprises
        • Using last-accessed time vs. last modified provides a more predictable decision on the objects that will be accessed in the future, which avoids costly archiving errors.
      4. Radically Simplify Migrations
        • Easily pick your source and destination
        • Run dozens or hundreds of migrations in parallel
        • Reduce the babysitting

      The many benefits of cloud data management include speeding up technology deployment and reducing system maintenance costs; it can also provide increased flexibility to help meet changing business requirements.

      Challenges Faced with Enterprise Cloud Data Management

      But, like other cloud computing technologies, enterprise cloud data management services can introduce challenges – for example, data security concerns related to sending sensitive business data outside the corporate firewall for storage. Another challenge is the disruption to existing users and applications that rely on file-based access on-premises, since the cloud is predominantly object-based.

      Cloud data management solutions should provide you with options to eliminate this disruption by transparently managing data across common formats such as file and object.

      Find out about Komprise Intelligent Data Management for Multi-cloud

      Features of a Cloud Data Management Platform

      Some common features and capabilities cloud data management solutions should deliver:

      • Data Analytics: Can you get a view of all your cloud data, how it’s being used, and how much it’s costing you? Can you get visibility into on-premises data that you wish to migrate to the cloud? Can you understand where your costs are so you know what to do about them?
      • Planning and Forecasting: Can you set policies for how data should get moved, either from one cloud storage class to another or from on-premises storage to the cloud? Can you project your savings? Does this account for hidden fees like retrieval and egress costs?
      • Policy based data archiving, data replication, and data management: How much babysitting do you have to do to move and manage data? Do you have to tell the system every time something needs to be moved or does it have policy based intelligent automation?
      • Fast Reliable Cloud Data Migration: Does the system support migrating on-premises data to the cloud? Does it handle going over a Wide Area Network? Does it handle your permissions and access controls and preserve security of data both while it’s moving the data and in the cloud?
      • Intelligent Cloud Archiving, Intelligent Tiering and Data Lifecycle Management: Does the solution enable you to manage ongoing data lifecycle in the cloud? Does it support the different cloud storage classes (e.g., high-performance options like file and cloud NAS, and cost-efficient options like Amazon S3 and Glacier)?

      In practice, the design and architecture of a cloud varies among cloud providers. Service Level Agreements (SLA) represent the contract which captures the agreed upon guarantees between a service provider and its customers.

      It is important to consider that cloud administrators are responsible for factoring in:

      • Multiple billable dimensions and costs: storage, access, retrievals, API, transitions, initial transfer, and minimal storage-time costs
      • Unexpected costs of moving data across different storage classes. Unless access is continually monitored and data is moved back up when it gets hot, you’ll face expensive retrieval fees.

      This complexity is the reason why a mere 20% of organizations are leveraging the cost-saving options available to them in the cloud.

      How do Cloud Data Management Tools work?

      As more enterprise data runs on public cloud infrastructure, many different types of tools and approaches to cloud data management have emerged. The initial focus has been on migrating and managing structured data in the cloud. Cloud data integration, ETL (extraction, transformation and loading), and iPaaS (integration platform as a service) tools are designed to move and manage enterprise applications and databases in the cloud. These tools typically move and manage bulk or batch data or real time data.

      Cloud-based analytics and cloud data warehousing have emerged for analyzing and managing hybrid and multi-cloud structured data.

      In the world of unstructured data storage and backup technologies, cloud data management has been driven by the need for cost visibility, cost reduction, and optimizing cloud data. As file-level tiering has emerged as a critical component of an intelligent data management strategy and more file data is migrating to the cloud, cloud data management is evolving from cost management to automation and orchestration, governance and compliance, performance monitoring, and security.

      What are the challenges faced with Cloud Data Management security?

      Most of the cloud data management security concerns are related to general cloud computing security questions organizations face. It’s important to evaluate the strengths and security certifications of your cloud data management vendor as part of your overall cloud strategy.

      Is adoption of Cloud Data Management services growing?

      As enterprise IT organizations are increasingly running hybrid, multi-cloud, and edge computing infrastructure, cloud data management services have emerged as a critical requirement. Look for solutions that are open, cross-platform, and ensure you always have native access to your data. Visibility across silos has become a critical need in the enterprise, but it’s equally important to ensure data does not get locked into a proprietary solution that will disrupt users, applications, and customers. The need for cloud native data access and data mobility should not be underestimated. In addition to visibility and access, cloud data management services must enable organizations to take the right action in order to move data to the right place and the right time. The right cloud data management solution will reduce storage, backup and cloud costs as well as ensure a maximum return on the potential value from all enterprise data.

      How is Enterprise Cloud Data Management different from Consumer Systems?

      While consumers need to manage cloud storage, it is usually a matter of capacity across personal storage and devices. Enterprise cloud data management involves IT organizations working closely with departments to build strategies and plans that will ensure unstructured data growth is managed and data is accessible and available to the right people at the right time.

      Enterprise IT organizations are increasingly adopting cloud data management solutions to understand how cloud (typically multi-cloud) data is growing and manage its lifecycle efficiently across all of their cloud file and object storage options.

      Get More from your Cloud Storage Solution with Cloud Data Management Services from Komprise

      • Get accurate analytics across clouds with a single view across all your users’ cloud accounts and buckets and save on storage costs with an analytics-driven approach.
      • Forecast cloud cost optimization by setting different data lifecycle policies based on your own cloud costs.
      • Establish policy-based multi-cloud lifecycle management by continuously moving objects by policy across storage classes transparently (e.g., Amazon Standard, Standard-IA, Glacier, Glacier Deep Archive).
      • Accelerate cloud data migrations with fast, efficient data migrations across clouds (e.g., AWS, Azure, Google and Wasabi) and even on-premises (ECS, IBM COS, Pure FlashBlade).
      • Deliver powerful cloud-to-cloud data replication by running, monitoring, and managing hundreds of migrations faster than ever at a fraction of the cost with Elastic Data Migration.
      • Keep your users happy with no retrieval fee surprises and no disruption to users and applications from making poor data movement decisions based on when the data was created.

      An analytics-driven cloud data management platform like Komprise, named a Gartner Peer Insights Awards leader, can help you save 50% or more on your cloud storage costs.

    • Cloud Data Migration

      What is Cloud Data Migration?

      Cloud data migration is the process of relocating either all or a part of an enterprise’s data to a cloud infrastructure. Cloud data migration is often the most difficult and time-consuming part of an overall cloud migration project. Other elements of cloud migration involve application migration and workflow migration.


      Cost, Complexity and Time: Why Cloud Data Migrations are Difficult

      Cloud data migrations are usually the most laborious and time-consuming part of a cloud migration initiative. Why? Data is heavy – data footprints are often in hundreds of terabytes to petabytes and can involve billions of files and objects. Some key reasons why cloud data migrations fail include:

      • Lack of Proper Planning: Often cloud data migrations are done in an ad-hoc fashion without proper analytics on the data set and planning
      • Improper Choice of Cloud Storage Destination: Most public clouds offer many different classes and tiers of storage – each with their own costs and performance metrics. Also, many of the cloud storage classes have retrieval and egress costs, so picking the right cloud storage class for a data migration involves not just finding the right performance and price to store the data but also the right access costs. Intelligent tiering and Intelligent archiving techniques that span both cloud file and object storage classes are important to ensure the right data is in the right place at the right time.
      • Ensuring Data Integrity: Data migrations involve migrating the data along with migrating metadata. For a cloud data migration to succeed, not only should all the data be moved over with full fidelity, but all the access controls, permissions, and metadata should also move over. Often, this is not just about moving data but mapping these from one storage environment to another.
      • Downtime Impact: Cloud data migrations can often take weeks to months to complete. Clearly, you don’t want users to be unable to access the data they need for this entire time. Minimizing downtime, even during a cutover, is very important to reduce productivity impact.
      • Slow Networks, Failures: Often cloud data migrations are done over a Wide Area Network (WAN), which can have other data moving on it and hence deliver intermittent performance. Plus, there may be times when the network is down or the storage at either end is unavailable. Handling all these edge conditions is extremely important – you don’t want to be halfway through a month-long cloud data migration only to encounter a network failure and have to start all over again.
      • Time Consuming: Since cloud data migrations involve moving large amounts of data, they can require a lot of manual effort to manage. This is laborious, tedious and time consuming.
      • Sunk Costs: Cloud data migrations are often time-bound projects – once the data is migrated, the project is complete. So, if you invest in tools to address cloud data migrations, you may have sunk costs once the cloud data migration is complete.

      Cloud data migrations can involve Network Attached Storage (NAS) or file data, object data, or block data. Of these, cloud data migrations of file data and of object data are particularly difficult and time-consuming because file and object data are much larger in volume.

      To learn more about the seven reasons why cloud data migrations are dreaded, watch the webinar.

      Cloud Data Migration Strategies

      Different cloud data migration strategies are used depending on whether file data or object data need to be migrated. Common methods for moving these two types of data through cloud migration solutions are described in further detail below.

      Cloud Data Migration for File Data aka NAS Cloud Data Migrations

      File data is often stored on Network Attached Storage. File data is typically accessed over NFS and SMB protocols. File data can be particularly difficult to migrate because of its size, volume, and richness. File data often involves a mix of large and small files – data migration techniques often do better when migrating large files but fail when migrating small files. Data migration solutions need to address a mix of large and small files and handle both efficiently. File data is also voluminous – often involving billions of files. Reliable cloud data migration solutions for file data need to be able to handle such large volumes of data efficiently. File data is also very rich and has metadata, access control permissions and hierarchies. A good file data migration solution should preserve all the metadata, access controls and directory structures. Often, migrating file data involves mapping this information from one file storage format to another. Sometimes, file data may need to be migrated to an object store. In these situations, the file metadata needs to be preserved in the object store so the data can be restored as files at a later date. Techniques such as MD5 checksums are important to ensure the data integrity of file data migrations to the cloud.

      Cloud Data Migration for Object Data (S3 Data Migrations or Object-to-Cloud Data Migrations or Cloud-to-Cloud Data Migrations)

      Cloud data migration of object data is relatively new but quickly gaining momentum as the majority of enterprises move to a multi-cloud architecture. The Amazon Simple Storage Service (S3) protocol has become a de-facto standard for object stores and public cloud providers, so most cloud data migrations of object data involve S3-based data migrations.

      There are 3 common use cases for cloud object data migrations:

      • Data migrations from an on-premises object store to the public cloud: Many enterprises have adopted on-premises object storage, and most of these object storage solutions follow the S3 protocol. Customers are now looking to analyze data on their on-premises object storage and migrate some or all of that data to a public cloud storage option such as Amazon S3 or Microsoft Azure Blob.
      • Cloud-to-cloud data migrations and cloud-to-cloud data replications: Enterprises looking to switch public cloud providers need to migrate data from one cloud to another. Sometimes, it may also be cost-effective to replicate across clouds as opposed to replicating within a cloud. This also improves data resiliency and provides enterprises with a multi-cloud strategy. Cloud-to-cloud data replication differs from cloud data migration because it is ongoing – as data changes on one cloud, it is copied or replicated to the second cloud.
      • S3 data migrations: This is a generic term that refers to any object or cloud data migration done using the S3 protocol. The Amazon Simple Storage Service (S3) protocol has become a de-facto standard, so any object-to-cloud, cloud-to-cloud or cloud-to-object migration can typically be classified as an S3 data migration, as sketched below.
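
      Because both ends speak the S3 protocol, a bare-bones object migration can be scripted against the same API. The sketch below (hypothetical endpoint and bucket names; assumes boto3 and valid credentials for both endpoints) streams objects from an S3-compatible on-premises store to an AWS S3 bucket. Real migration tools add parallelism, retries, checksums and metadata handling on top of this.

      import boto3

      # The same S3 API can address an on-premises S3-compatible store (via endpoint_url) and AWS S3.
      source = boto3.client("s3", endpoint_url="https://objectstore.example.internal")
      destination = boto3.client("s3")  # AWS S3

      src_bucket, dst_bucket = "onprem-archive", "cloud-archive"

      paginator = source.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=src_bucket):
          for obj in page.get("Contents", []):
              key = obj["Key"]
              body = source.get_object(Bucket=src_bucket, Key=key)["Body"]
              destination.upload_fileobj(body, dst_bucket, key)  # stream each object across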

      Secure Cloud Data Migration Tools

      Cloud data migrations can be performed by using free tools that require extensive manual involvement or commercial data migration solutions. Sometimes Cloud Storage Gateways are used to move data to the cloud, but these require heavy hardware and infrastructure setup. Cloud data management solutions offer a streamlined, cost-effective, software-based approach to manage cloud data migrations without requiring expensive hardware infrastructure and without creating data lock-in. Look for elastic data migration solutions that can dynamically scale to handle data migration workloads and adjust to your demands.

    • Cloud Data Storage

      Cloud data storage is a service for individuals or organizations to store data through a cloud computing provider such as AWS, Azure, Google Cloud, IBM or Wasabi. Storing data in a cloud service eliminates the need to purchase and maintain data storage infrastructure, since infrastructure resides within the data centers of the cloud IaaS provider and is owned/managed by the provider. Many organizations are increasing data storage investments in the cloud for a variety of purposes including: backup, data replication and data protection, data tiering and archiving, data lakes for artificial intelligence (AI) and business intelligence (BI) projects, and to reduce their physical data center footprint. As with on-premises storage, you have different levels of data storage available in the cloud. You can segment data based on access tiers: for instance, hot and cold data storage.

      Types of Cloud Data Storage

      Cloud data storage can either be designed for personal data and collaboration or for enterprise data storage in the cloud. Examples of personal data cloud storage are Google Drive, Box and DropBox.

      Increasingly, corporate data storage in the cloud is gaining prominence – particularly around taking enterprise file data that was traditionally stored on Network Attached Storage (NAS) and moving that to the cloud.

      Cloud file storage and object storage are gaining adoption as they can store petabytes of unstructured data for enterprises cost-effectively.

      Enterprise Cloud Data Storage for Unstructured Data (Cloud File Data Storage and Cloud Object Data Storage)

      Enterprise unstructured data growth is exploding – whether it’s genomics data, video and media content, log files, or IoT data. Unstructured data can be stored as files on file data storage or as objects on cost-efficient object storage. Cloud storage providers now offer a variety of file and object storage classes at different price points to accommodate unstructured data. Amazon EFS, FSx, and Azure Files are examples of cloud data storage for enterprise file data, and Amazon S3, Azure Blob and Amazon Glacier are examples of object storage.

      Advantages of Cloud Data Storage

      There are many benefits of investing in cloud data storage, particularly for unstructured data in the enterprise. Organizations gain access to unlimited resources, so they can scale data volumes as needed and decommission instances at the end of a project or when data is deleted or moved to another storage resource. Enterprise IT teams can also reduce dependence on hardware and have a more predictable storage budget. However, without proper cloud data management, cloud egress costs and other cloud costs are often cited as challenges.

      In summary, cloud data storage allows organizations to:

      • Reduce capital expenses (CAPEX) on data center hardware, along with savings in energy, facility space and staff hours spent maintaining and installing hardware.
      • Deliver vastly improved agility and scalability to support rapidly changing business needs and initiatives.
      • Develop an enterprise-wide data lake strategy that would otherwise be unaffordable.
      • Lower risks from storing important data on aging physical hardware.
      • Leverage cheaper cloud storage for archiving and tiering purposes, which can also reduce backup costs.

      Cloud Data Storage Challenges and Considerations

      • Cloud data storage can be costly if you need to frequently access the data for use outside of the cloud, due to egress fees charged by cloud providers.
      • Using cloud tiering methodologies from on-premises storage vendors may result in unexpected costs, due to the need for restoring data back to the storage appliance prior to use. Read the white paper Cloud Tiering: Storage-Based vs. Gateways vs. File-Based
      • Moving data between clouds is often difficult, because of data translation and data mobility issues with file objects. Each cloud provider uses different standards and formats for data storage.
      • Security can be a concern, especially in some highly regulated sectors such as healthcare, financial services and e-commerce. IT organizations will need to fully understand the risks and methods of storing and protecting data in the cloud.
      • The cloud creates another data silo for enterprise IT. When adding cloud storage to an organization’s storage ecosystem, IT will need to determine how to attain a central, holistic view of all storage and data assets.

      For these reasons, cloud optimization and cloud data management are essential components of an enterprise cloud data storage strategy. Komprise has strategic alliance partnerships with hybrid and cloud data storage technology leaders.

    • Cloud Migration

      Cloud migration refers to the movement of data, processes, and applications from on-premises storage or legacy infrastructure to cloud-based infrastructure for storage, application processing, data archiving and ongoing data lifecycle management. Komprise offers analytics-driven cloud migration tools that integrate with most leading cloud service providers, such as Google Cloud, Amazon AWS, Microsoft Azure, Wasabi, IBM Cloud and more.

      Benefits of Cloud Migration

      Migrating to the cloud can offer many advantages – lower operational costs, greater elasticity, and flexibility.   Migrating data to the cloud in a native format also ensures you can leverage the computational capabilities of the cloud and not just use it as a cheap storage tier.  When migrating to the cloud, you need to consider both the application as well as its data. While application footprints are generally small and relatively easier to migrate, cloud data migrations need careful planning and execution as data footprints can be large.  Cloud migration with Komprise allows you to:

      • Plan a data migration strategy using analytics before migration. A pre-migration analysis helps you identify which files need to be migrated, plan how to organize the data to maximize the efficiency of the migration process. It’s important to know how data is used and to determine how large and how old files are throughout the storage system. Since data footprints often reach billions of files, planning a migration is critical.
      • Improve scalability with Elastic Data Migration. Data migrations can be time consuming as they involve moving hundreds of terabytes to  petabytes of data.  Since storage that data is migrating from is usually still in use during the migration, the data migration solution needs to move data as fast as possible without slowing down user access to the source storage.  This requires a scalable architecture that can leverage the inherent parallelism of the data sets to migrate multiple data streams in parallel without overburdening any single source storage.  Komprise uses a patented elastic data migration architecture that maximizes parallelism while throttling back as needed to preserve source data storage performance.
      • Shrink cloud migration time. When compared to generic tools used across heterogeneous cloud and physical storage, Komprise cloud data migration is nearly 30x faster. Performance is maximized at every level with the auto parallelize feature, minimizing network usage and making migration over WAN more efficient.
      • Reduce ongoing cloud storage costs with intelligent tiering and data lifecycle management in the cloud. Migrating to the cloud can reduce the amount spent on IT needs, storage maintenance, and hardware upgrades as these are typically handled by the cloud provider. Most clouds provide multiple storage classes at different price points – Komprise intelligently moves data to the right storage class in the cloud based on your policy and performs ongoing data lifecycle management in the cloud to reduce storage cost.  Unlike cloud intelligent tiering classes, Komprise tiers across both S3 and Glacier storage classes so you get the best cost savings.
      • Simplify storage management. With a Komprise cloud migration, you can use a single solution across your multivendor storage and multicloud architectures. All you have to do is connect via open standards – pick the SMB, NFS, and S3 sources along with the appropriate destinations and Komprise handles the rest. You also get a dashboard to monitor and manage all of your migrations from one place.  No more sunk costs of point migration tools because Komprise provides ongoing data lifecycle management beyond the data migration.
      • Greater resource availability. Moving your data to the cloud allows it to be accessed from wherever users may be, making it easier for international businesses to store and access their data from around the world. Komprise delivers native data access so you can directly access objects and files in the cloud without getting locked in to your NAS vendor—or even to Komprise.

      Cloud Data Migration Process

      The cloud data migration process can differ widely based on a company’s storage needs, business model, environment of current storage, and goals for the new cloud-based system. Below are the main steps involved in migrating to the cloud.

      Step 1 – Analyze Current Storage Environment and Create Migration Strategy

      A smooth migration to the cloud requires proper planning to ensure that all bases are covered before the migration begins. It’s important to understand why the move is beneficial and how to get the most out of the new cloud-based features before the process continues.

      Step 2 – Choose Your Cloud Deployment Environment

      After taking a thorough look at the current resource requirements across your storage system, you can choose who will be your cloud storage provider(s). At this stage, it’s decided which type of hardware the system will use, whether it’s used in a single or multi-cloud solution, and if the cloud solution will be public or private.

      Step 3 – Migrate Data and Applications to the Cloud

      Application workload migration to the cloud can be done through generic tools. However, since data migration involves moving petabytes of data and billions of files, you need a data management software solution that can migrate data efficiently in a number of ways, including over a public internet connection or a private connection (LAN or WAN).

      Step 4 – Validate Data After Migration

      Once the migration is complete, the data within the cloud can be validated and production access to the storage system can be swapped from on-premises to the cloud.  Data validation often requires MD5 checksum on every file to ensure the integrity of the data is intact after migration.

      Komprise Cloud Data Migration

      With Elastic Data Migration from Komprise, you can affordably run and manage hundreds of migrations across many different platforms simultaneously. Gain access to a full suite of high-speed cloud migration tools from a single dashboard that takes on the heavy lifting of migrations, and moves your data nearly 30x faster than traditional available services—all without any access disruption to users or apps.

      Our team of cloud migration professionals, with over two decades of experience developing efficient IT solutions, has helped businesses around the world achieve faster and smoother data migrations with total confidence and none of the headaches. Contact us to learn more about our cloud data migration solution or sign up for a free trial to see the benefits beyond data migration with our analytics-driven Intelligent Data Management solution.

    • Cloud NAS

      What is NAS?

      Network Attached Storage (NAS) refers to storage that can be accessed from different devices over a network. NAS environments have gained prominence for file-based workloads because they provide a hierarchical structure of directories and folders that makes it easier to organize and find files. Many enterprise applications today are file-based, and use files stored in a NAS as their data repositories.

      What is Cloud NAS?

      Cloud NAS is a relatively new term – it refers to a cloud-based storage solution to store and manage files. Cloud NAS or cloud file storage is gaining prominence in 2020 and several vendors have recently released cloud NAS offerings.

      Cloud NAS Access Protocols

      Cloud NAS storage is accessed via the Server Message Block (SMB) and Network File System (NFS) protocols.  On-premises NAS environments are also accessed via SMB and NFS.

      Why is Cloud NAS gaining in importance?

      While the cloud was initially used by DevOps teams for new cloud-native applications that were largely Object-based, the cloud is now seen as a major destination for core enterprise applications. These enterprise workloads are largely file-based, and so moving them to the cloud without rewriting the application means file-based workloads need to be able to run in the cloud.

      To address this need, both cloud vendors and third-party storage providers are now creating cloud-based NAS offerings.

      Cloud NAS Tiers

      Cloud NAS storage is often designed for high-performance file workloads, and its high-performance flash tier can be very expensive.

      Many Cloud NAS offerings such as AWS EFS and NetApp CloudVolumes ONTAP do offer some less expensive file tiers – but putting data in these lower tiers requires some data management solution. As an example, the standard tier of AWS EFS is 10 times more expensive than the standard tier of AWS S3. Furthermore, when you use a Cloud NAS, you may also have to replicate and backup the data, which can often make it three times more expensive.  As this data becomes inactive and cold, it is very important to manage data lifecycle on Cloud NAS to ensure you are only paying for what you use and not for dormant cold data on expensive tiers.

      Intelligent Data Archiving and Intelligent Tiering for Cloud NAS

      An analytics-driven unstructured data management solution can help you get the right data onto your cloud NAS and keep your cloud NAS costs low by managing the data lifecycle with intelligent archiving and intelligent tiering.

      As an example, Komprise Intelligent Data Management for Multi-cloud does the following:

      • Analyzes your on-premises NAS data so you can pick the data sets you want to migrate to the cloud
      • Migrates on-premises NAS data to your cloud NAS with speed, reliability and efficiency
      • Analyzes data on your cloud NAS to show you how data is getting cold and inactive
      • Enables policy-based automation so you can decide when data should be archived and tiered from expensive Cloud NAS tiers to lower cost file or object classes
      • Monitors ongoing costs to ensure you avoid expensive retrieval fees when cold data becomes hot again
      • Eliminates expensive backup and DR costs of cold data on cloud NAS

    • Cloud Storage Gateway

      A cloud storage gateway is a hardware or software appliance that serves as a bridge between local applications and remote cloud-based storage.

      A cloud storage gateway provides basic protocol translation and simple connectivity to allow incompatible technologies to communicate. The gateway may be hardware or a virtual machine (VM) image.

      The requirement for a gateway between cloud storage and enterprise applications became necessary because of the incompatibility between protocols used for public cloud technologies and legacy storage systems. Most public cloud providers rely on Internet protocols, usually a RESTful API over HTTP, rather than conventional storage area network (SAN) or network-attached storage (NAS) protocols.

      Gateways can also be used for archiving in the cloud. This pairs with automated storage tiering, in which data can be replicated between fast, local disk and cheaper cloud storage to balance space, cost, and data archiving requirements.

      The challenge with traditional cloud gateways, which front the cloud with on-premises hardware and use the cloud like just another storage silo, is that the cloud is very expensive for hot data that is frequently accessed, resulting in high retrieval costs. Read the blog post: Are Cloud Storage Gateways a Good Choice for Cloud Data Migrations?

    • Cloud Tiering

      Cloud tiering is increasingly becoming a critical capability in managing enterprise file workloads across the hybrid cloud. Cloud tiering and cloud archiving are techniques that offload less frequently used data, also known as cold data, from expensive on-premises file storage or Network Attached Storage (NAS) to cheaper levels of storage in the cloud, typically object storage classes such as Amazon S3.  Cloud tiering is a variant of data tiering. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded now to mean tiering or archiving data from a storage system to other clouds and storage systems.

      Cloud Tiering Transparently Extends Enterprise File Storage to the Cloud

      Enterprises today are increasingly trying to move core file workloads to the cloud. Since file data can be voluminous, involving billions of files, migrating file data to the cloud can take months and create disruption. 

      A simple solution to this is to gradually offload files to the cloud without changing the end user experience. Cloud tiering and cloud archiving enable this by moving infrequently used cold data to a cheaper cloud storage tier, while the data continues to remain accessible from the original location. This enables users to transparently extend on-premises capacity with the cloud.

      Cloud Tiering Can Yield Significant Savings If Done Correctly

      Cloud object storage is cost-efficient if used correctly. Most cloud providers charge not only for the storage, but also to retrieve data, and they charge egress fees if the data has to leave the cloud. Cloud retrieval fees are usually in the form of charges for “get” and “put” API calls, and cloud egress costs are charged by the amount of data that is read from anywhere outside the cloud. So, to keep enterprise storage costs low, infrequently accessed data such as snapshots, logs, backups and cold data are best suited for tiering to the cloud.

      By tiering cold data, the on-premises storage array needs to keep only hot data and the most recent logs and snapshots. Across Komprise customers, we have found that typically 60% to 80% of their actual data has not been accessed in over a year. By tiering the cold data as well as older log files and snapshots, the capacity of the storage array, mirrored storage array (if mirroring/replication is being used) and backup storage is reduced dramatically. This is why tiering cold data can reduce the overall storage cost by as much as 70% to 80%.
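
      As a rough, back-of-the-envelope illustration of that savings math (all prices, fractions and copy counts below are hypothetical), tiering the cold fraction off primary storage and its mirror and backup copies shrinks the bill roughly as follows:

      total_tb       = 1000   # total file data
      cold_fraction  = 0.75   # 60-80% of data is typically cold
      copies_primary = 3      # primary + mirror + backup copies kept for hot data
      primary_cost   = 30     # $/TB/month for primary NAS (hypothetical)
      archive_cost   = 4      # $/TB/month for a cloud archive tier (hypothetical)

      before = total_tb * copies_primary * primary_cost
      after  = (total_tb * (1 - cold_fraction) * copies_primary * primary_cost
                + total_tb * cold_fraction * archive_cost)

      print(f"before tiering: ${before:,.0f}/month")
      print(f"after tiering:  ${after:,.0f}/month")
      print(f"savings:        {100 * (before - after) / before:.0f}%")

      With these assumed numbers the savings come out around 70%, in line with the range described above; actual results depend on your data profile and pricing.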

      The many advantages of cloud tiering of cold data include:

      • Reduced storage acquisition costs. Flash storage, used for fast access to hot data, is expensive. By tiering off infrequently used data you can purchase a much smaller amount of flash storage, thereby reducing acquisition costs.
      • Cut backup footprint and costs. By continuously tiering off cold data that is not being accessed you can reduce your backup footprint, backup license costs, and backup storage costs if the cold data is placed in robust storage (such as that provided by the major CSPs).
      • Increase disaster recovery speeds and lower disaster recovery (DR) costs. As with backup, by tiering off the cold data, the amount of data mirrored/replicated is dramatically reduced as well.
      • Improved storage performance. By running storage at a lower capacity and by removing access to cold data to another storage device or service, you can increase the performance of your storage array.
      • Leverage the cloud to run AI, ML, compliance checks and other applications on cold data. With cold data in the cloud, you can access, search and process your cold data without putting any load on your storage array. The cold data that is tiered off has value. Being able to process and feed your cold data into your AI/ML/BI engines is critical to staying competitive. By tiering you can extract value from your cold data without burdening your storage array. This also helps to extend the life of your storage array.

      Clearly, if cloud tiering is implemented correctly at the file level it will provide all of the above benefits whereas block tiering to the cloud will not.

      To learn more about the differences between cloud tiering at the file level vs the block level, and why so-called cloud pools are not the right approach for cloud storage tiering, read “What you need to know before jumping into the cloud tiering pool”. Also download the white paper: Cloud Tiering: Storage-Based vs Gateways vs. File-Based.

    • Cold Data Storage

      Cold data refers to data that is infrequently accessed, as compared to hot data that is frequently accessed. As unstructured data grows at unprecedented rates, organizations are realizing the advantages of utilizing cold data storage instead of high-performance primary storage, as it is much more economical, simple to set up and use, and less prone to drive failure.

      For many organizations, the real difficulty with cold data is figuring out when data should be considered hot and kept on primary storage, and when it can be labeled as cold and moved off to a secondary storage device. For this reason, it’s important to understand the difference between data types in order to develop the most cost-effective solution for managing cold data in your organization.
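
      A common first cut at this decision is last-access time. The sketch below (the mount point is hypothetical, and note that access times can be unreliable on volumes mounted with noatime/relatime) walks a directory tree and flags files that have not been read in over a year:

      import os
      import time

      def find_cold_files(root: str, days: int = 365):
          """Yield paths under `root` whose last access time is older than `days` days."""
          cutoff = time.time() - days * 86400
          for dirpath, _dirnames, filenames in os.walk(root):
              for name in filenames:
                  path = os.path.join(dirpath, name)
                  try:
                      if os.stat(path).st_atime < cutoff:
                          yield path
                  except OSError:
                      continue  # skip files that vanished or are unreadable

      for path in find_cold_files("/mnt/projects"):
          print(path)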

      Types of Data That Cold Storage is Typically Used For

      Examples of data types for which cold storage may be suitable include information a business is required to keep for regulatory compliance, video, photographs, and data that is saved for backup, archival, big-data analytics or disaster recovery purposes. As this data ages and is less frequently accessed, it can generally be moved to cold storage. A policy-based approach allows organizations to optimize storage resources and reduce costs by moving inactive data to more economical cold data storage.

      Advantages of Developing a Cold Data Storage Solution

      1. Prevent primary storage solutions from becoming overburdened with unused data
      2. Reduce overall resource costs of data storage
      3. Simplify data storage solution and optimize the management of its data
      4. Efficiently meet governance and compliance requirements
      5. Make use of more affordable and reliable mechanical storage drives for less frequently used data

      Reduce Strain on Primary Storage by Moving Cold Data to Secondary Storage

      Affordable Costs of Cold Storage

      When comparing costs for enterprise-level storage drives, the mechanical drives used in many cold data storage systems are just over 20% of the price that high-end solid-state drives (SSDs) can cost on average. For SSDs at the top tier of performance, storage still costs close to 10 cents per gigabyte, whereas NAS-level mechanical drives cost only around 2 cents per gigabyte on average.

      Simplify Your Data Storage Solution

      A well-optimized cold data storage system can make your local storage infrastructure much less cluttered and easier to maintain. As the storage tools that help automatically determine which data is hot and which is cold continue to improve, managing the movement of data between solutions or tiers is becoming easier every year. Some cold data storage solutions are even starting to automate the entire management process based on rules that the business establishes.

      Meet Regulatory or Compliance Requirements

      Many organizations in the healthcare industry are required to hold onto their data for extended periods of time, if not forever. With the possibility of facing litigation somewhere down the line based on having this data intact, corporations are opting to use a cold data storage solution which can effectively store critically important, unused data under conditions in which it cannot be tampered with or altered.

      Increase Data Durability with Cold Data Storage

      Reliability is one of the most important factors when choosing a data storage solution to house data for extended periods of time or indefinitely. Mechanical drives can be somewhat slower than SSDs at serving files, but they are still quick enough to pull files when needed and leave much more budget room for building additional backup or parity into your storage system.

      When considering storage hardware for cold data solutions, look for low-cost, high-capacity options with a high degree of data durability so your data can remain intact for as long as it needs to be stored.

      Getting Started with Komprise:

  • D
    • Data Analytics

      Data analytics refers to the process used to enhance productivity and business improvement by extracting and categorizing data to identify and analyze behavioral patterns. Techniques vary according to organizational requirements.

      The primary goal of data analytics is to help organizations make more informed business decisions by enabling analytics professionals to evaluate large volumes of transactional and other forms of data. Data for analytics can be pulled from sources ranging from web server logs to social media comments.

      Potential issues with data analytics initiatives include a lack of analytics professionals and the cost of hiring qualified candidates. The sheer volume and variety of the data involved can also cause problems, including issues with the quality and consistency of the data. In addition, integrating technologies and data warehouses can be a challenge, although various vendors offer data integration tools with big data capabilities.

      Big data has drastically changed the requirements for extracting data analytics from business data. With relational databases, administrators can easily generate reports for business use, but they lack the broader intelligence data warehouses can provide. However, the challenge for data analytics from data warehouses is the associated cost.

      There is also the challenge of pulling the relevant data sets to enable data analytics from cold data. This requires intelligent data management solutions that track what unstructured data is kept and where, and enable you to easily search and find relevant data sets for big-data analytics.

      Getting Started with Komprise:

    • Data Archiving

      Data Archiving, often referred to as Data Tiering, protects older data that is not needed for the everyday operations of an organization but may still need to be accessed occasionally. Data archiving reduces the primary storage required and allows an organization to maintain data that may be needed for regulatory or other requirements.

      Data archiving is intended to protect older information that is not needed for everyday operations but may have to be accessed occasionally. Data archives serve as a way of reducing primary storage and the related costs, rather than acting as a data recovery tool.

      Some data archives allow data to be read-only to protect it from modification, while other data archiving products allow users to modify the data.

      The benefit of data archiving is that it reduces the cost of primary storage. In addition, archive storage itself costs less because it is typically based on a low-performance, high-capacity storage medium.

      Data archiving takes a number of different forms. Options can be online data storage, which places archive data onto disk systems where it is readily accessible. Archives are frequently file-based, but object storage is also growing in popularity. A key challenge when using object storage to archive file-based data is the impact it can have on users and applications. To avoid changing paradigms from file to object and breaking user and application access, use data management solutions that provide a file interface to data that is archived as objects.

      Another archival system uses offline data storage where archive data is written to tape or other removable media using data archiving software rather than being kept online. Data archiving on tape consumes less power than disk systems, translating to lower costs.

      A third option is cloud storage, such as the services offered by Amazon; this is inexpensive but requires ongoing investment.

      The data archiving process typically uses automated software that moves “cold” data according to policies set by an administrator. Today, a popular approach to data archiving is to make the archive “transparent” – the archived data is not only online but is also accessed exactly as before by users and applications, so they experience no change in behavior. (See Native Access)

      Getting Started with Komprise:

    • Data Backup

      Data loss can occur from a variety of causes, including computer viruses, hardware failure, file corruption, fire, flood, or theft, etc. Data loss may involve critical financial, customer, and company data, so a solid data backup plan is critical for every organization.

      As part of a data backup plan, consider the following:

      • What data (files and folders) to backup
      • How often to run your backups
      • Where to store the backup data
      • What compression method to use
      • What type of backups to run
      • What kind of media on which to store the backups

      In general, you should back up any data that can’t be replaced easily. Some examples are structured data like databases, and unstructured data such as word processing documents, spreadsheets, photos, videos, emails, etc. Typically, programs or system folders are not part of a data backup program. Installation discs, operating system discs, and registration information should be stored in a safe place.

      Data backup frequency depends on how often your organizational data changes.

      • Frequently changing data may need daily or hourly backups
      • Data that changes every few days might require a weekly or even monthly backup
      • For some data, a backup may need to be created each time it changes

      Cold data doesn’t need:

      • Expensive, high-octane storage
      • Repeated backup and replication

      The challenge with unstructured data is that backing it up is not only time consuming but also very complex: with millions to billions of files of various sizes and types, growing at an astronomical rate, enterprises struggle with long backup windows, overlapping backup cycles, backup footprint sprawl, and spiraling costs, and above all are left vulnerable in the case of a disaster.

      Read the white paper: Rein in Storage and Backup Costs.

      Read the post: 5 Ways to Get to the Cloud Smarter and Faster

      Don’t back up data first. Know your data first to make smarter, cost-saving decisions. Start with the Komprise TCO calculator.

      Getting Started with Komprise:

    • Data Classification

      Data classification is the process of organizing data into tiers or categories so that it can be managed, protected, and retrieved more effectively.

      Data classification is essential to make data easy to find and retrieve so that your organization can optimize risk management, compliance, and legal requirements. Written guidelines are essential in order to define the categories and criteria to classify your organization’s data. It is also important to define the roles and responsibilities of employees in the data organization structure.

      When data classification procedures are established, security standards should also be established to address data life-cycle requirements. Classification should be simple so employees can easily comply with the standard.

      Examples of types of data classifications are:

      • 1st Classification: Data that is free to share with the public
      • 2nd Classification: Internal data not intended for the public
      • 3rd Classification: Sensitive internal data that would negatively impact the organization if disclosed
      • 4th Classification: Highly sensitive data that could put an organization at risk

      Data classification is a complex process, but automated systems can help streamline this process. The enterprise must create the criteria for classification, outline the roles and responsibilities of employees to maintain the protocols, and implement proper security standards. Properly executed, data classification will provide a framework for the storage, transmission and retrieval of data.

      Automation simplifies data classification by enabling you to dynamically set different filters and classification criteria when viewing data across your storage. For instance, if you wanted to classify all data belonging to users who are no longer at the company as “zombie data,” the Komprise solution will aggregate files that fit into the zombie data criterion to help you quickly classify your data.
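
      As a rough sketch of what such rule-based classification can look like, the Python example below labels file records using owner and last-access metadata. The inventory structure, user list, and “zombie data” rule are illustrative assumptions, not a description of the Komprise API.

      # Hypothetical set of users still active at the company.
      ACTIVE_USERS = {"alice", "bob", "carol"}

      def classify(file_record):
          """Return a classification label for one file metadata record."""
          if file_record["owner"] not in ACTIVE_USERS:
              return "zombie data"            # owner has left the company
          if file_record["days_since_access"] > 365:
              return "cold"
          return "hot"

      inventory = [
          {"path": "/share/reports/q1.xlsx", "owner": "alice", "days_since_access": 20},
          {"path": "/share/old/simulation.dat", "owner": "dave", "days_since_access": 900},
      ]
      for record in inventory:
          print(record["path"], "->", classify(record))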

      Getting Started with Komprise:

    • Data Governance

      Data governance refers to the management of the availability, security, usability, and integrity of data used in an enterprise. Data governance in an organization typically includes a governing council, a defined set of procedures, and a plan to execute those procedures.

      Data governance is not about allowing access to a few privileged users; instead, it should allow broad groups of users access with appropriate controls. Business and IT users have different needs; business users need secure access to shared data and IT needs to set policies around security and business practices. When done right, data governance allows any user access to data anytime, so the organization can run more efficiently, and users can manage their workload in a self-service manner.

      Here are four things to consider when developing a data governance strategy:

      Selecting a Team:

      • Balance IT and business leaders to get a broad view of the data and service needs
      • Start small – choose a small group to review existing data analytics

      Data Quality:

      • Audit existing data to discover data types and how they are used
      • Define a process for new data sources to ensure quality and availability standards are met

      Data Security:

      • Make sure data is classified so data requiring protection for legal or regulatory reasons meets those requirements
      • Implement policies that allow for different levels of access based on user privileges

      Getting Started with Komprise:

    • Data Lake

      A data lake is data stored in its natural state. The term typically refers to unstructured data that is sitting on different storage environments and clouds. The data lake supports data of all types – for example, you may have videos, blogs, log files, seismic files and genomics data in a single data lake. You can think of each of your Network Attached Storage (NAS) devices as a data lake.

      One big challenge with data lakes is to comb through them and find the relevant data you need. With unstructured data, you may have billions of files strewn across different data lakes, and finding data that fits specific criteria can be like finding a needle in a haystack.

      A virtual data lake is a collection of data that fits certain criteria – and as the name implies, it is virtual because the data is not moved. The data continues to reside in its original location, but the virtual data lake gives a discrete handle to manipulate that entire data set.

      Some key aspects of data lakes – both physical and virtual:

      • Data Lakes Support a Variety of Data Formats: Data lakes are not restricted to data of any particular type.
      • Data Lakes Retain All Data: Even if you do a search and find some data that does not fit your criteria, the data is not deleted from the data lake. A virtual data lake provides a discrete handle to the subset of data across different storage silos that fits specific criteria, but nothing is moved or deleted.
      • Virtual Data Lakes Do Not Physically Move Data: Virtual data lakes do not physically move the data, but provide a virtual aggregation of all data that fits certain criteria. Deep Analytics can be used to specify criteria.
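
      To make the idea concrete, the minimal Python sketch below treats a virtual data lake as a saved query over a file metadata index: matching files are returned as handles, and nothing is moved or copied. The index layout and criteria are illustrative assumptions.

      # Hypothetical metadata index entries gathered from different storage silos.
      index = [
          {"path": "/nas1/genomics/sample1.bam", "project": "genomics", "size": 5 * 10**9},
          {"path": "/nas2/video/intro.mp4", "project": "marketing", "size": 2 * 10**8},
      ]

      criteria = {"extension": ".bam", "project": "genomics", "min_size_bytes": 10**6}

      def matches(record, criteria):
          return (record["path"].endswith(criteria["extension"])
                  and record.get("project") == criteria["project"]
                  and record["size"] >= criteria["min_size_bytes"])

      def virtual_data_lake(index, criteria):
          """Yield handles to the files that fit the criteria; the data itself stays put."""
          for record in index:
              if matches(record, criteria):
                  yield record["path"]

      print(list(virtual_data_lake(index, criteria)))  # ['/nas1/genomics/sample1.bam']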

      Getting Started with Komprise:

    • Data Literacy

      The ability to derive meaningful information from data. Komprise Data Analytics provides data literacy by showing how much data, what kind, who’s using it, how often—across all storage silos. Read the IDC InfoBrief: How to Manage Your Data Growth Smarter with Data Literacy.

      Getting Started with Komprise:

    • Data Management

      Data management is officially defined by DAMA International, the professional organization for data management professionals, as:

      “Data Resource Management is the development and execution of architectures, policies, practices and procedures that properly manage the full data lifecycle needs of an enterprise.”

      Data management is the process of developing policies and procedures in order to effectively manage the information lifecycle needs of an enterprise. This includes identifying how data is acquired, validated, stored, protected, and processed. Data management policies should cover the entire lifecycle of the data, from creation to deletion.

      Due to the sheer volume of unstructured data, a data management plan is necessary for every organization. The numbers are staggering – for example, more data has been created in the past two years than in the entire previous history of the human race.

      Getting Started with Komprise:

    • Data Management Policy

      A data management policy addresses the operating policy that focuses on the management and governance of data assets, and is a cornerstone of governing enterprise data assets. This policy should be managed by a team within the organization that identifies how the policy is accessed and used, who enforces the data management policy, and how it is communicated to employees.

      It is recommended that an effective data management policy team include top executives who take the lead, so that governance and accountability can be enforced. In many organizations, the Chief Information Officer (CIO) and other senior management can demonstrate their understanding of the importance of data management by either authoring or supporting directives that will be used to govern and enforce data standards.

      The following are some of the considerations to address in a data management policy:

      • Enterprise data is not owned by any individual or business unit, but is owned by the enterprise
      • Enterprise data must be kept safe and secure
      • Enterprise data must be accessible to individuals within the organization
      • Metadata should be developed and utilized for all structured and unstructured data
      • Data owners should be accountable for enterprise data
      • Users should not have to worry about where data lives. Data should be accessible to users no matter where it resides.

      Ultimately, a data management policy should guide your organization’s philosophy toward managing data as a valued enterprise asset. Watch the video: Intelligent Data Management: Policy-Based Automation

      Getting Started with Komprise:

    • Data Migration

      Data Migration is the process of selecting and moving data from one location to another – this may involve moving data across different storage vendors, and across different formats.

      Data migrations are often done in the context of retiring a system and moving to a new system, or in the context of a cloud migration, or in the context of a modernization or upgrade strategy.

      Data migrations can be laborious, error prone, manual, and time consuming. Migrating data may involve finding and moving billions of files, which can succumb to storage and network slowdowns or outages. Also, different file systems do not often preserve metadata in exactly the same way, so migrating data without loss of fidelity and integrity can be a challenge.

      Network Attached Storage (NAS) migration is the process of migrating from one NAS storage environment to another. This may involve migrations within a vendor’s ecosystem such as NetApp to NetApp or across vendors such as NetApp to Isilon or EMC to NetApp or EMC to Pure FlashBlade. A high-fidelity NAS migration solution should preserve not only the file itself but all of its associated metadata and access controls.

      Network Attached Storage (NAS) to Cloud data migration is the process of moving data from an on-premises data center to a cloud.  It requires data to be moved from a file format (NFS or SMB) to an Object/Cloud format such as S3. A high-fidelity NAS-to-Cloud migration solution preserves all the file metadata including access control and privileges in the cloud.  This enables data to be used either as objects or as files in the cloud.

      Storage migration is a general-purpose term that applies to moving data across storage arrays.

      Data migrations typically involve four phases:

      • Planning – Deciding what data should be migrated. Planning may often involve analyzing various sources to find the right data sets. For example, several customers today are interested in upgrading some data to Flash – finding hot, active data to migrate to Flash can be a useful planning exercise.
      • Initial Migration – Do a first migration of all the data. This should involve migrating the files, the directories and the shares.
      • Iterative Migrations – Look for any changes that may have occurred during the initial migration and copy those over.
      • Final Cutoff – A final cutoff involves deleting data at the original storage and managing the mounts, etc., so data can be accessed from the new location going forward.

      Resilient data migration refers to an approach that automatically adjusts for failures and slowdowns and retries as needed. It also checks the integrity of the data at the destination to ensure full fidelity.
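
      As a simplified illustration of what “resilient” means in practice, the Python sketch below copies a single file with retries, backoff, and a checksum comparison at the destination. It is a conceptual example only, not a production migration tool and not Komprise’s implementation.

      import hashlib
      import shutil
      import time

      def md5sum(path, chunk_size=1 << 20):
          """Compute an MD5 checksum of a file, reading it in chunks."""
          digest = hashlib.md5()
          with open(path, "rb") as f:
              for chunk in iter(lambda: f.read(chunk_size), b""):
                  digest.update(chunk)
          return digest.hexdigest()

      def copy_with_retries(src, dst, attempts=3, backoff_seconds=5):
          """Copy one file, retrying on failure and verifying integrity at the destination."""
          for attempt in range(1, attempts + 1):
              try:
                  shutil.copy2(src, dst)           # copy data plus basic metadata
                  if md5sum(src) == md5sum(dst):   # confirm the transfer is intact
                      return True
              except OSError:
                  pass                             # network or storage hiccup
              time.sleep(backoff_seconds * attempt)  # back off before retrying
          return False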

      Getting Started with Komprise:

    • Data Protection

      Data protection is used to describe both data backup and disaster recovery. A quality data protection strategy should automate the movement of critical data to online and offline storage and include a comprehensive strategy for valuing, classifying, and protecting data, so as to protect these assets from user errors, malware and viruses, machine failure, or facility outages/disruptions.

      Data protection storage technologies include tape backup, which copies data to a physical tape cartridge; cloud backup, which copies data to the cloud; and mirroring, which replicates a website or files to a secondary location. These processes can be automated and policies assigned to the data, allowing for accurate, faster data recovery.

      Data protection should be applied to all forms of data within an organization in order to protect the integrity of the data, guard against corruption or errors, and ensure the privacy of the data. When classifying data, policies should be established to identify different levels of security, from least secure (data that anyone can see) to most secure (data that, if released, would put the organization at risk).

      Getting Started with Komprise:

    • Data Sprawl

      Data sprawl describes the staggering amount of data produced by enterprises worldwide every day; with new devices, including enterprise and mobile applications, added to networks, data sprawl is estimated to grow 40% year over year into the next decade.

      Given this growth in data sprawl, data security is imperative, as sprawl can lead to enormous problems for organizations, as well as their employees and customers. In today’s fast-paced world, organizations must carefully consider how to best manage the precious information they hold.

      Organizations experiencing data sprawl need to secure all of their endpoints. Security is critical. Addressing data security as well as remote physical devices ensures organizations are in compliance with internal and external regulations.

      As security threats mount, it is critical that data sprawl is addressed. Taking the right steps to control data sprawl, via policies and procedures within an organization, means safeguarding not only internal data but also critical customer data.

      Organizations should develop solid practices that may have been dismissed in the past. Left unchecked, control of an organization’s unstructured data will continue to manifest itself in hidden costs and limited options. With a little evaluation and planning, it is an aspect of your network that can be improved significantly and will pay off long term.

      Getting Started with Komprise:

    • Data Tiering

      Data Tiering refers to a technique of moving less frequently used data, also known as cold data, to cheaper levels of storage or tiers. The term “data tiering” arose from moving data around different tiers or classes of storage within a storage system, but has expanded now to mean tiering or archiving data from a storage system to other clouds and storage systems.


      Data Tiering Cuts Costs Because 70%+ of Data is Cold
      As data grows, storage costs are escalating. It is easy to think the solution is more efficient storage. But the real cause of storage costs is poor data management. Over 70% of data is cold and has not been accessed in months, yet it sits on expensive storage and consumes the same backup resources as hot data. As a result, storage costs are rising, backups are slow, recovery is unreliable, and the sheer bulk of this data makes it difficult to leverage new options like Flash and Cloud.

      Data Tiering Was Initially Used within a Storage Array
      Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.

      Typical storage tiers within a storage array include:

      • Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
      • SAS Disks: Usually the workhorse of a storage system, they are moderately good at performance but more expensive than SATA disks.
      • SATA Disks: Usually the lowest price-point for disks but not as performant as SAS disks.
      • Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.

      Increasingly, customers are looking at another option – tiering or archiving data to a public cloud.

      • Public Cloud Storage: Public clouds currently have a mix of object and file storage options. The object storage classes such as Amazon S3 and Azure Blob provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.

      Cloud Data Tiering is now Popular
      Tiering and archiving less frequently used data or cold data to public cloud storage classes is now more popular. This is because customers can leverage the lower cost storage classes within the cloud to keep the cold data and promote it to the higher cost storage classes when needed. For example, data can be archived or tiered from on-premises NAS to Amazon S3 Infrequent Access or Amazon Glacier for low ongoing costs, and then promoted to Amazon EFS or FSx when you want to operate on it and need performance.
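
      One way to picture such a policy is as a simple mapping from data age to a target storage class, as in the Python sketch below. The thresholds and class names are illustrative assumptions, not recommendations or a product configuration.

      # Illustrative tiering policy: minimum days since last access -> storage class.
      TIERING_POLICY = [
          (0,   "primary NAS (hot)"),
          (180, "Amazon S3 Infrequent Access"),
          (730, "Amazon S3 Glacier"),
      ]

      def target_tier(days_since_access):
          """Return the storage class a file belongs in under this policy."""
          tier = TIERING_POLICY[0][1]
          for min_age_days, storage_class in TIERING_POLICY:
              if days_since_access >= min_age_days:
                  tier = storage_class
          return tier

      print(target_tier(30))    # primary NAS (hot)
      print(target_tier(400))   # Amazon S3 Infrequent Access
      print(target_tier(1000))  # Amazon S3 Glacier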

      But in order to get this level of flexibility, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires file-tiering, not block-tiering.

      Block Tiering Creates Unnecessary Costs and Lock-In
      Block-level tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SAS disks as well as cheaper SATA disks.

      Block tiering breaks a file into various blocks – metadata blocks that contain information about the file, and data blocks that are chunks of the original file. Block-tiering or Block-level tiering moves less used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained in the higher, faster, and more expensive storage tiers.

      Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.

      But since block tiering is done inside the storage operating system as a proprietary solution, it has several limitations when it comes to efficiency of reuse and efficiency of storage savings. Firstly, with block tiering, the proprietary storage filesystem must be involved in all data access, since it retains the metadata and has the “map” for putting the file together from the various blocks. This also means that cold blocks moved to a lower tier or the cloud cannot be directly accessed from the new location without involving the proprietary filesystem, because the cloud has neither the metadata map nor the other data blocks, file context, and attributes needed to reassemble the file. So block tiering is a proprietary approach that often results in unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.

      The only way to access data in the cloud is to run the proprietary storage filesystem in the cloud which adds to costs. Also, many third-party applications such as backup software that operate at a file level require the cold blocks to be brought back or rehydrated, which defeats the purpose of tiering to a lower cost storage and erodes the potential savings. For more details, read the white paper: Block vs. File-Level Tiering and Archiving.

      File Tiering Maximizes Savings and Eliminates Lock-In
      File-tiering is an advanced, modern technology that uses standard protocols to move the entire file, along with its metadata, in a non-proprietary fashion to the secondary tier or cloud. File tiering is harder to build but better for customers because it eliminates vendor lock-in and maximizes savings. Whether files have POSIX-based Access Control Lists (ACLs) or NTFS extended attributes, all of this metadata, along with the file itself, is fully tiered or archived to the secondary tier and stored in a non-proprietary format. This ensures that the entire data set can be brought back as files when needed. File tiering moves not just the file but also its attributes, security permissions, and ACLs, and maintains full file fidelity even when moving a file to a different storage architecture such as object storage or cloud. This ensures that applications and users can use the moved file from the original location, and can also open the file natively in the secondary location or cloud without requiring any third-party software or storage operating system.
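
      As a conceptual sketch of the idea (not Komprise’s implementation), the Python example below copies one file to object storage and carries a few of its filesystem attributes along as object metadata, so the object remains usable natively in the cloud. It assumes the boto3 AWS SDK and a hypothetical bucket; real file-tiering products also preserve ACLs and leave transparent links behind.

      import os
      import stat
      import boto3  # assumes the AWS SDK for Python is installed and configured

      s3 = boto3.client("s3")

      def tier_file(path, bucket, key):
          """Copy one file to object storage along with basic filesystem metadata."""
          st = os.stat(path)
          metadata = {                             # stored as S3 object metadata
              "mtime": str(int(st.st_mtime)),
              "mode": oct(stat.S_IMODE(st.st_mode)),
              "uid": str(st.st_uid),
              "gid": str(st.st_gid),
          }
          s3.upload_file(path, bucket, key, ExtraArgs={"Metadata": metadata})
          # After verifying the upload, the source could be replaced by a link
          # or stub so users keep accessing the file from its original path.

      # Example usage with hypothetical names:
      # tier_file("/mnt/nas/projects/report.pdf", "my-archive-bucket", "projects/report.pdf")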

      Since file tiering maintains full file fidelity and native access based on standards at every tier, third-party applications can access the moved data without requiring any agents or proprietary software. This ensures that savings are maximized, since backup software and other third-party applications can access moved data without rehydrating it or bringing the file back to the original location. It also ensures that the cloud can be used to run valuable applications such as compliance search or big data analytics on the trove of tiered and archived data without requiring any third-party software or additional costs.

      File-tiering is an advanced technique for archiving and tiering that maximizes savings and breaks vendor lock-in.

      Data Tiering Can Cut 70%+ Storage and Backup Costs When Done Right
      In summary, data tiering is an efficient solution to cut storage and backup costs because it tiers or archives cold, unused files to a lower-cost storage class, either on-premises or in the cloud. However, to maximize the savings, data tiering needs to be done at the file level, not block level. Block-level tiering creates lock-in and erodes much of the cost savings because it requires unnecessary rehydration of the data. File tiering maximizes savings and preserves flexibility by enabling data to be used directly in the cloud without lock-in.

      Getting Started with Komprise:

    • Data Virtualization

      Data virtualization delivers a unified, simplified view of an organization’s data that can be accessed anytime. It integrates data from multiple sources to create a single data layer that supports multiple applications and users. The result is faster access to this data, any way you want it.

      Data virtualization involves abstracting, transforming, federating and delivering data from disparate sources. This allows users to access the data without having to know its exact physical location.

      There are some important advantages to data virtualization:

      • An organization can gain business insights by leveraging all data 
      • They can gain faster access to analytics and business intelligence
      • Data virtualization can streamline an organization’s data management approach, which reduces complexity and saves money

      Data virtualization involves three key steps. First, data virtualization software is installed on-premise or in the cloud, which collects data from production sources and stays synchronized as those sources change over time. Next, administrators are able to secure, archive, replicate, and transform data using the data virtualization platform as a single point of control. Last, it allows users to provision virtual copies of the data that consume significantly less storage than physical copies.

      Some use cases for data virtualization are:

      • Application development
      • Backup and disaster recovery
      • Datacenter migration
      • Test data management
      • Packaged application projects

      Getting Started with Komprise:

    • Deep Analytics

      Deep analytics is the process of applying data mining and data processing techniques to analyze and find large amounts of data in a form that is useful and beneficial for new applications. Deep analytics can apply to both structured and unstructured data.

      In the context of unstructured data, deep analytics is the process of examining file metadata (both standard and extended) across billions of files to find data that fits specific criteria. A petabyte of unstructured data can be a few billion files. Analyzing petabytes of data typically involves analyzing tens to hundreds of billions of files. Because analysis of such large workloads can require distribution over a farm of processing units, deep analytics is often associated with scale-out distributed computing, cloud computing, distributed search, and metadata analytics.
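
      As a toy illustration of scattering a metadata query across processing units, the Python sketch below scans several shares in parallel worker processes for files that match simple criteria. The mount points and criteria are hypothetical, and a real deep analytics system would use a persistent, distributed index rather than walking the filesystem on every query.

      import os
      from concurrent.futures import ProcessPoolExecutor

      def scan_share(root, suffix=".log", min_size=100 * 1024 * 1024):
          """Scan one share for files that match simple metadata criteria."""
          hits = []
          for dirpath, _dirs, files in os.walk(root):
              for name in files:
                  path = os.path.join(dirpath, name)
                  try:
                      if name.endswith(suffix) and os.path.getsize(path) >= min_size:
                          hits.append(path)
                  except OSError:
                      continue  # skip files that disappear or cannot be read
          return hits

      if __name__ == "__main__":
          shares = ["/mnt/nas1/projects", "/mnt/nas2/archive"]  # hypothetical mounts
          with ProcessPoolExecutor() as pool:                   # one worker per share
              for share, hits in zip(shares, pool.map(scan_share, shares)):
                  print(share, "->", len(hits), "matching files")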

      Deep analytics of unstructured file data requires efficient indexing and search of files and objects across a distributed farm. Financial services, genomics, research and exploration, biomedical, and pharmaceutical are some of the early adopters of deep analytics. In recent years, enterprises have started to show interest in deep analytics as the amount of corporate data has increased, and with it, the desire to extract value from the data.

      Deep analytics enables additional use cases such as Big Data Analytics, Artificial Intelligence and Machine Learning.

      When the result of a deep analytics query is a virtual data lake, data does not have to be moved or disrupted from its original location to enable reuse. This is an ideal scenario for rapidly leveraging deep analytics without disruption, since large data sets can be cumbersome to move.

      Getting Started with Komprise:

    • Digital Business

      A digital business is one that uses technology as an advantage in its internal and external operations.

      Information technology has changed the infrastructure and operation of businesses from the time the Internet became widely available to businesses and individuals. This transformation has profoundly changed the way businesses conduct their day-to-day operations. This has maximized the benefits of data assets and technology-focused initiatives.

      This digital transformation has had a profound impact on businesses, accelerating business activities and processes to fully leverage opportunities in a strategic way. A digital business takes full advantage of this so as not to be disrupted, and to thrive in this era. C-level staff need to help their organizations seize opportunities while mitigating risks.

      This technology mindset has become standard in even the most traditional of industries, making a digital business strategy imperative for storing and analyzing data to gain an advantage over the competition. The introduction of cloud computing and SaaS delivery models means that internal processes can be easily managed through a wide choice of applications, giving organizations the flexibility to choose and change software as the business grows and changes.

      A digital business also has seen a shift in purchasing power; individual departments now push for the applications that will best suit their needs, rather than relying on IT to drive change.

      Getting Started with Komprise:

    • Direct Data Access

      The ability to directly access your data, whether on-premises, in the cloud, or in a hybrid environment, without needing to rehydrate it.

      Getting Started with Komprise:

    • Director (Komprise Director)

      The administrative console of the Komprise distributed architecture that runs as a cloud service or on-premises. Read the white paper: Komprise Intelligent Data Management Architecture Overview or one of the Komprise Chalk-Talk videos to learn more.

      Getting Started with Komprise:

    • Disaster Recovery

      Disaster recovery refers to security planning to protect an organization from the effects of a disaster – such as a cyber attack or equipment failure. A properly constructed disaster recovery plan will allow an organization to maintain or quickly resume mission critical functions following a disaster.

      The disaster recovery plan includes policies and testing, and may involve a separate physical site for restoring operations. This preparation needs to be taken very seriously, and will involve a significant investment of time and money to ensure minimal losses in the event of a disaster.

      Control measures are steps that can reduce or eliminate various threats for organizations. Different types of measures can be included in a disaster recovery plan. There are three types of disaster recovery control measures that should be considered:

      1. Preventive measures – Intended to prevent a disaster from occurring
      2. Detective measures – Intended to detect unwanted events
      3. Corrective measures – The plan to restore systems after a disaster has occurred.

      A quality disaster recovery plan requires these policies to be documented and tested regularly. In some cases, organizations outsource disaster recovery to an outside provider instead of using their own remote facility, which can save time and money. This solution has become increasingly popular with the rise of cloud computing.

      Getting Started with Komprise:

    • Dynamic Data Analytics

      The Komprise feature that allows organizations to analyze data across all storage to know how much exists, what kind, who’s using it, and how fast it’s growing. “What if” data scenarios can be run based on various policies to instantly see capacity and cost savings, enabling informed, optimal data management planning decisions without risk.

      Getting Started with Komprise:

  • E
    • Egress Costs

      The large network fees most cloud providers charge to move your data out of the cloud. Most allow you to move your data into the cloud for free (ingress).

      Getting Started with Komprise:

    • Elastic Data Migration

      What is Elastic Data Migration?

      Data migration is the process of moving data (e.g., files, objects) from one storage environment to another, but Elastic Data Migration is a high-performance migration solution from Komprise that uses a parallelized, multi-processing, multi-threaded approach to complete NAS-to-NAS and NAS-to-cloud migrations in a fraction of the traditional time and cost.

      Standard Data Migration:

      • NAS Data Migration – move files from a Network Attached Storage (NAS) to another NAS. The NAS environments may be on-premises or in the cloud (Cloud NAS)
      • S3 Data Migration – move objects from an object storage or cloud to another object storage or cloud

      Data migrations can occur over a local network (LAN) or when going to the cloud over the internet (WAN). As a result, migrations can be impacted by network latencies and network outages.

      Data migration software needs to address these issues to make data migrations efficient, reliable, and simple, especially when dealing with NAS and S3 data since these data sizes can be in petabytes and involve billions of files.

      Elastic Data Migration:

      Elastic Data Migration makes migrations orders of magnitude faster than normal data migrations. It leverages parallelism at multiple levels to deliver 27 times faster performance than alternatives:

      • Parallelism of the Komprise scale-out architecture – Komprise distributes the data migration work across multiple Komprise Observer VMs so they run in parallel.
      • Parallelism of sources – When migrating multiple shares, Komprise breaks them up across multiple Observers to leverage the inherent parallelism of the sources
      • Parallelism of data set – Komprise optimizes for all the inherent parallelism available in the data set across multiple directories, folders, etc., to speed up data migrations
      • Big files vs small files – Komprise analyzes the data set before migrating it so it learns from the nature of the data – if the data set has a lot of small files, Komprise adjusts its migration approach to reduce the overhead of moving small files. This AI-driven approach delivers greater speeds without human intervention.
      • Protocol level optimizations – Komprise optimizes data at the protocol level (e.g., NFS, SMB) so the chattiness of the protocol can be minimized

      All of these improvements deliver substantially higher performance than standard data migration. When an enterprise is looking to migrate large production data sets quickly, without errors, and without disruption to user productivity, Komprise Elastic Data Migration delivers a fast, reliable, and cost-efficient migration solution.
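
      To give a feel for file-level parallelism in general (this is a generic sketch, not the Komprise Elastic Data Migration engine), the Python example below copies every file under one share using a pool of worker threads. The paths and worker count are illustrative, and a real migration also handles links, ACLs, retries, and incremental passes.

      import os
      import shutil
      from concurrent.futures import ThreadPoolExecutor

      def migrate_share(src_root, dst_root, workers=16):
          """Copy every file under one share using a pool of worker threads."""
          tasks = []
          for dirpath, _dirs, files in os.walk(src_root):
              rel = os.path.relpath(dirpath, src_root)
              os.makedirs(os.path.join(dst_root, rel), exist_ok=True)
              tasks += [(os.path.join(dirpath, name), os.path.join(dst_root, rel, name))
                        for name in files]
          with ThreadPoolExecutor(max_workers=workers) as pool:
              # Each worker copies files independently, exploiting the parallelism
              # inherent in the data set.
              list(pool.map(lambda pair: shutil.copy2(*pair), tasks))

      # Example usage with hypothetical mount points:
      # migrate_share("/mnt/old_nas/share1", "/mnt/new_nas/share1")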

      Komprise Elastic Data Migration Architecture

      What Elastic Data Migration for NAS and Cloud provides:

      Komprise Elastic Data Migration provides high-performance data migration at scale, solving critical issues that IT professionals face with these migrations. Komprise makes it possible to easily run, monitor, and manage hundreds of migrations simultaneously. Unlike most other migration utilities, Komprise also provides analytics along with migration to provide insight into the data being migrated, which allows for better migration planning.

      Fast, painless data migrations with parallelized, optimized data migration:

      • Parallelism at every level:
        • Leverages parallelism of storage, data hierarchy and files
        • High performance multi-threading and automatic division of a migration task across machines
      • Network efficient: Adjusts for high-latency networks by reducing round trips
      • Protocol efficient: optimized NFS handling to eliminate unnecessary protocol chatter
      • High Fidelity: Performs MD5 checksums of each file to ensure full integrity of data transfer
      • Intuitive Dashboards and API: Manage hundreds of migrations seamlessly with intuitive UI and API
      • Greater speed and reliability
      • Analytics with migration for data insights
      • Ongoing value

      Download the free Elastic Data Migration white paper for more details.

      Getting Started with Komprise:

  • F
    • File-level Tiering

      A standards-based tiering approach Komprise uses that moves each file with all its metadata to the new tier, maintaining full file fidelity and attributes at each tier for direct data access from the target storage and no rehydration. Read the white paper: Block-Level Tiering versus File-Level Tiering.


      Getting Started with Komprise:

    • File Server

      The central server in a computer network that provides a centralized storage location for files on its internal storage media, which connected clients can access.

      Getting Started with Komprise:

    • Flash Storage

      Flash storage is solid-state storage media that holds data electronically and can be electronically erased and reprogrammed. It also responds faster than a traditional spinning disk, increasing performance.

      With the increasing volume of stored data from the growth of mobility and the Internet of Things (IoT), organizations are challenged with both storing data and capitalizing on the opportunities it brings. Disk drives can be too slow for this, due to their speed limitations. For stored data to have real value, businesses must be able to quickly access and process that data to extract actionable information.

      Flash storage has a number of advantages over alternative storage technologies.

      • Greater performance. This leads to agility, innovation, and improved experience for the users accessing the data – delivering real insight to an organization
      • Reliability. With no moving parts, flash has higher uptime. A well-built all-flash array can last between 7 and 10 years.

      While Flash storage can offer a great improvement for organizations, it is still too expensive as a place to store all data. Flash storage has been about twenty times more expensive per gigabyte than spinning disk storage over the past seven years. Many enterprises are looking at a tiered model with high-performance flash for hot data and cheap, deep object or cloud storage for cold data.

      Getting Started with Komprise:

  • G
    • General Data Protection Regulation (GDPR)

      The General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) is a regulation by the European Union that aims to strengthen and unify data protection for all individuals within the European Union (EU). It also addresses the export of personal data outside the EU.

      GDPR became enforceable on 25 May 2018. Businesses transacting with countries in the EU have to comply with GDPR laws.

      The GDPR regulation applies to personal data collected by organizations including cloud providers and businesses.

      Article 17 of GDPR is often called the “Right to be Forgotten” or “Right to Erasure”. The full text of the article is found below.

      To comply with GDPR, you need to use an intelligent data management solution to identify data belonging to a particular user and confine it outside the visible namespace before deleting the data. This two-step deletion ensures there are no dangling references to the data from users and applications and enables an orderly deletion of data.
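
      The Python sketch below illustrates the two-step pattern described above: first confine a user’s files to a quarantine area outside the visible namespace, then erase them after a review period. The paths are hypothetical and the example ignores details such as permissions, auditing, and name collisions.

      import os
      import shutil

      QUARANTINE = "/srv/.gdpr_quarantine"  # hypothetical hidden staging area

      def confine(paths):
          """Step 1: move a user's files out of the visible namespace so no
          users or applications still reference them."""
          os.makedirs(QUARANTINE, exist_ok=True)
          for path in paths:
              shutil.move(path, os.path.join(QUARANTINE, os.path.basename(path)))

      def erase_quarantine():
          """Step 2: after review, permanently delete the confined data."""
          shutil.rmtree(QUARANTINE, ignore_errors=True)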

       

      Art. 17 GDPR Right to erasure (‘right to be forgotten’)

      1) The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:

      1. the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed;
      2. the data subject withdraws consent on which the processing is based according to point (a) of Article 6(1), or point (a) of Article 9(2), and where there is no other legal ground for the processing;
      3. the data subject objects to the processing pursuant to Article 21(1) and there are no overriding legitimate grounds for the processing, or the data subject objects to the processing pursuant to Article 21(2);
      4. the personal data have been unlawfully processed;
      5. the personal data have to be erased for compliance with a legal obligation in Union or Member State law to which the controller is subject;
      6. the personal data have been collected in relation to the offer of information society services referred to in Article 8(1).

      2) Where the controller has made the personal data public and is obliged pursuant to paragraph 1 to erase the personal data, the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data.

      3) Paragraphs 1 and 2 shall not apply to the extent that processing is necessary:

      1. for exercising the right of freedom of expression and information;
      2. for compliance with a legal obligation which requires processing by Union or Member State law to which the controller is subject or for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller;
      3. for reasons of public interest in the area of public health in accordance with points (h) and (i) of Article 9(2) as well as Article 9(3);
      4. for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing; or
      5. for the establishment, exercise or defense of legal claims.

      Getting Started with Komprise:

  • H
    • High Performance Storage

      High performance storage is a type of storage management system designed for moving large files and large amounts of data around a network. High performance storage is especially valuable for moving around large amounts of complex data or unstructured data like large video files across the network.

      Used with both direct-connected and network-attached storage, high performance storage supports data transfer rates greater than one gigabyte per second and is designed for enterprises handling large quantities of data – in the petabyte range.

      High performance storage supports a variety of methods for accessing and creating data, including FTP, parallel FTP, VFS (Linux), as well as a robust client API with support for parallel I/O.

      High performance storage is useful for managing hot or active data, but can be very expensive for cold/inactive data. Since typically 60 to 90% of data in an organization becomes inactive/cold within months of creation, this data should be moved off high performance storage to get the best TCO without sacrificing performance.

      Getting Started with Komprise:

    • Hosted Data Management

      With hosted data management, a service provider administers IT services, including infrastructure, hardware, operating systems, and system software, as well as the equipment used to support operations, including storage, hardware, servers, and networking components. 

      The service provider typically sets up and configures hardware, installs and configures software, provides support and software patches, maintenance, and monitoring.

      Services may also include disaster recovery, security, DDoS (distributed denial of service) mitigation, and more.

      Hosted data management may be provided on a dedicated or shared-service model. In dedicated hosting, the service provider sets aside servers and infrastructure for each client; in shared hosting, resources are pooled and charged for on a per-use basis.

      Hosted data management can also be referred to as cloud services. With cloud hosting, resources are dispersed between and across multiple servers, so load spikes, downtime, and hardware dependencies are spread across multiple servers working together.

      In this arrangement, the client usually has administrative access through a Web-based interface.

      Another popular model is hybrid cloud hosted data management, where the administrative console resides in the cloud but all the data management (analyzing data, moving data, accessing data) is done on-premises. Komprise uses this hybrid approach as it offers the best of both worlds: a fully managed service that reduces operating costs without compromising the security of data.

      Getting Started with Komprise:

    • Hot Data

      Business-critical data that needs to be accessed frequently and resides on primary storage (NAS).

      Getting Started with Komprise:

  • I
    • Intelligent Data Management

      Intelligent data management is the process of managing unstructured data throughout its lifecycle with analytics and intelligence.

      The criteria for a solution to be considered Intelligent Data Management include:

      • Analytics-Driven: Is the solution able to leverage analysis of the data to inform its behavior? Is it able to deliver analysis of the data to guide the data management planning and policies?
      • Storage-Agnostic: Is the data management solution able to work across different vendor and different storage platforms?
      • Adaptive: Based on the network, storage, usage, and other conditions, is the data management solution able to intelligently adapt its behavior? For instance, does it throttle back when the load gets higher, does it move bigger files first, does it recognize when metadata does not translate properly across environments, and does it retry when the network fails? (See the sketch after this list.)
      • Closed Loop: Analytics feeds the data management which in turn provides additional analytics. A closed loop system is a self-learning system that uses machine learning techniques to learn and adapt progressively in an environment.
      • Efficient: An intelligent data management solution should be able to scale out efficiently to handle the load, and to be resilient and fault tolerant to errors.
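
      As a toy example of the adaptive, throttling behavior mentioned above, the Python sketch below processes background work only while system load stays below a threshold. It uses the Unix load average purely as a stand-in for file system and network utilization; a real solution would watch much richer signals.

      import os
      import time

      def run_adaptively(work_items, process, max_load=2.0):
          """Process items in the background, backing off when the system is busy."""
          for item in work_items:
              # os.getloadavg() is available on Unix-like systems only.
              while os.getloadavg()[0] > max_load:
                  time.sleep(30)       # resources are busy: throttle back and wait
              process(item)            # resources are free: do the work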

      Intelligent data management solutions typically address the following use cases:

      • Analysis: Find the what, who, when of how data is growing and being used
      • Planning: Understand the impact of different policies on costs, and on data footprint
      • Data Archiving: Support various forms of managing cold data and offloading it from primary storage and backups without impacting user access. Includes:
        • Archive data by policy – move data with links for seamless access
        • Archive project data – archive data that belongs to a project as a collection
        • Archive without links – move data without leaving a link behind when data needs to be moved out of an environment
      • Data Replication: Create a copy of data on another location.
      • Data Migration: Move data from one storage environment to another
      • Deep Analytics: Search and query data at scale across storage

      Getting Started with Komprise:

  • M
    • Metadata

      Metadata means “data about data” or data that describes other data. The prefix “meta” typically means “an underlying definition or description” in technology circles.

      Metadata makes finding and working with data easier – allowing the user to sort or locate specific documents. Some examples of basic metadata are author, date created, date modified, and file size. Metadata is also used for unstructured data such as images, video, web pages, spreadsheets, etc.

      Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe content within a web page. Search engines can use this data to help understand the content within a page.

      Metadata can be created manually or through automation. Manual creation increases accuracy, since it allows the user to input relevant information. Automated metadata creation tends to be more elementary, usually capturing only basic information such as file size, file extension, and when the file was created.
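
      The kind of basic, automatically generated metadata described above can be gathered with a few lines of Python, as in the sketch below. Note that the “created” value is only an approximation: on most Unix filesystems the underlying timestamp reflects the last metadata change rather than true creation time.

      import datetime
      import os

      def basic_metadata(path):
          """Automatically extract basic metadata for one file."""
          st = os.stat(path)
          return {
              "name": os.path.basename(path),
              "extension": os.path.splitext(path)[1],
              "size_bytes": st.st_size,
              "created": datetime.datetime.fromtimestamp(st.st_ctime).isoformat(),
              "modified": datetime.datetime.fromtimestamp(st.st_mtime).isoformat(),
          }

      print(basic_metadata(__file__))  # inspect this script itself as an example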

      Metadata can be stored and managed in a database; however, without context, it may be impossible to identify metadata just by looking at it. Metadata is useful in managing unstructured data because it provides a common framework to identify and classify a variety of data, including video, audio, genomics data, seismic data, user data, documents, and logs.

      Getting Started with Komprise:

  • N
    • Native Access

      Having direct access to tiered or archived data without needing rehydration because files are accessed as objects from the target storage.

      Watch the TechKrunch session: How to Access Tiered Data in the Cloud for an example of how Komprise allows you to access your stored data wherever it’s stored, whenever you want, without rehydration. Because moved data are always intact, you can extract data value with both file and native access – and without penalty. Read the Komprise Architecture Overview for more information on Native Access.

      Getting Started with Komprise:

    • Native File Format

      or Native Data Format. The file structure in which a document is created and maintained by the original creating application.

      Getting Started with Komprise:

    • Network File System (NFS)

      A network file system (NFS) is a mechanism that enables storage and retrieval of data from multiple hard drives and directories across a shared network, enabling local users to access remote data as if it was on the user’s own computer.

      The NFS protocol is one of several distributed file system standards for network-attached storage (NAS). It was originally developed in the 1980s by Sun Microsystems, and is now managed by the Internet Engineering Task Force (IETF).

      NFS is generally implemented in computing environments where centralized management of data and resources is critical. Network file system works on all IP-based networks. Depending on the version in use, TCP and UDP are used for data access and delivery.

      The NFS protocol is independent of the computer, operating system, network architecture, and transport protocol, which means systems using the NFS service may be manufactured by different vendors, use different operating systems, and be connected to networks with different architectures. These differences are transparent to the NFS application, and the user.

      Getting Started with Komprise:

    • Network Attached Storage (NAS)


      What is Network Attached Storage?

      A Network Attached Storage (NAS) system is a storage device connected to a network that allows storage and retrieval of data from a centralized location by authorized network users and heterogeneous clients. These devices generally consist of an engine that implements the file services (the NAS device) and one or more devices on which data is stored (the NAS drives).

      The purpose of a NAS system is to provide a local area network (LAN) with file-based, shared storage in the form of an appliance optimized for quick data storage and retrieval. NAS is a relatively expensive storage option, so it should only be used for hot data that is accessed the most frequently.

      NAS Storage Benefits

      Network attached storage devices remove the responsibility of file serving from other servers on a network and provide a convenient way to share files among multiple computers. Benefits of dedicated network attached storage include:

      • Faster data access
      • Easy to scale up and expand upon
      • Remote data accessibility
      • Easier administration
      • OS-agnostic compatibility (works with Windows and Apple-based devices)
      • Built-in data security with compatibility for redundant storage arrays
      • Simple configuration and management (typically does not require an IT pro to operate)

      NAS File Access Protocols

      Network attached storage devices are often capable of communicating in a number of different file access protocols, such as:

      • Network File System (NFS)
      • Server Message Block (SMB)
      • Apple Filing Protocol (AFP)
      • Common Internet File System (CIFS)

      Most NAS devices have a flexible range of data storage systems that they’re compatible with, but you should always ensure that your intended device will work with your specific data storage system.

      Enterprise NAS Storage Applications

      In an enterprise, a NAS array can be used as primary storage for storing unstructured data and as backup for archiving or disaster recovery. It can also function as an email, media database, or print server for a small business. Higher-end NAS devices can hold enough disks to support RAID, a storage technology that combines multiple hard disks into one logical unit to provide better performance, redundancy, and high availability.

      Data on NAS systems is often mirrored (replicated) to another NAS system, and backups or snapshots of the footprint are kept on the NAS for weeks or months. This leads to at least three or more copies of the data being kept on expensive NAS storage.  NAS storage does not need to be used for disaster recovery and backup copies as this can be very costly.   By finding and archiving cold data from NAS, you can eliminate the extra copies of cold data and cut cold data costs by over 70%. Check out our video on NAS storage savings to get a more detailed explanation of how this concept works in practice.

      Network Attached Storage (NAS) Data Tiering and Data Archiving

      Since NAS storage is typically designed for higher performance and can be expensive, data on NAS is often tiered and moved to less expensive storage classes.  NAS vendors offer some basic data tiering at the block-level to provide limited savings on storage costs, but not on backup and DR costs.  Unlike the proprietary block-level tiering, file-level tiering or archiving provides a standards-based, non-proprietary solution to maximize savings by moving cold data to cheaper storage solutions. This can be done transparently so users and applications do not see any difference when cold files are archived.  Read this white paper to learn more about the differences between file tiering and block tiering.

      Network Attached Storage FAQ

      These are some of the most commonly asked questions we get about network attached storage systems.

      How are NAS drives different from typical data storage hardware?

      NAS drives are specifically designed for constant 24×7 use with high reliability, built-in vibration mitigation, and optimized for use in RAID setups. Network attached storage systems also benefit from an abundance of health management systems designed to keep them running smoothly for longer than a standard hard drive would.

      Which features are the most important ones to have in a NAS device?

      The ideal NAS device has multiple (2+) drive bays, hardware-level encryption acceleration, support for widely used platforms such as AWS Glacier and S3, and a moderately powerful multicore CPU paired with at least 2GB of RAM. If you’re looking for these types of features, Seagate and Western Digital are some of the most reputable brands in the NAS industry.

      Are there any downsides to using NAS storage?

      NAS storage systems can be quite expensive when they’re not optimized to contain the right data, but this can be remedied with analytics-driven NAS data management software, like Komprise Intelligent Data Management.

      Using NAS Data Management Tools to Substantially Reduce Storage Costs

      One of the biggest issues organizations are facing with NAS systems is trouble understanding which data they should be storing on their NAS drives and which should be offloaded to more affordable types of storage. To keep storage costs lower, an analytics-based NAS data management system can be implemented to give your organization more insight into your NAS data and where it should be optimally stored.

      Of the thousands of data-centric companies we’ve worked with, most needed less than 20% of their total data stored on high-performance NAS drives. With a more thorough understanding of their NAS data, organizations often realize that their NAS storage needs are much lower than they originally thought, leading to substantial storage savings, often greater than 50%, in the long run.

      Komprise makes it possible for customers to know their NAS and S3 data usage and growth before buying more storage. Explore your storage scenarios to get a forecast of how much could be saved with the right data management tools. This is what Komprise Dynamic Data Analytics provides.

      Getting Started with Komprise:

    • New Technology File System (NTFS) Extended Attributes

      Properties organized in (name, value) pairs that can optionally be set on New Technology File System (NTFS) files or directories to record information that can’t be stored in the file itself.

      Getting Started with Komprise:

  • O
    • Object Storage

      Object storage, also known as object-based storage, is a way of addressing and manipulating data storage as objects. Objects are kept in a single flat repository rather than being nested in folders within other folders.

      Though object storage is a relatively new concept, its benefits are clear. Compared to traditional file systems, there are many reasons to consider an object-based system to store your data.

      Object storage is becoming popular because it acts like a private cloud and provides linear scaling without limits. This is largely because it has no hierarchies and can scale out by simply adding more capacity. As a result, object storage is also very cost-efficient and is a good option for cheap, deep, scale-on-demand storage. Object storage is also resilient because it often keeps three or more copies of the data, much like public cloud storage.

      Getting Started with Komprise:

    • Observer (Komprise Observer)

      A Komprise virtual appliance running at the customer site that analyzes data across on-premises NAS storage, moves and replicates data by policy, and provides transparent file access to data that’s stored in the cloud.

      Getting Started with Komprise:

  • P
    • Policy-Based Data Management

      Policy-based data management automates data management decisions based on metrics such as data growth rates, data locations and file types, which data users regularly access and which they do not, which data is protected and which is not, and more.

      The trend to place strict policies on the preservation and dissemination of data has been escalating in recent years. Such policies allow rules to be defined for each property required for preservation and dissemination, ensuring compliance over time. For instance, to ensure accurate, reliable, and authentic data, a policy-based data management system should generate a list of rules to be enforced, define the storage locations and the storage procedures that generate archival information packages, and manage replication.

      Policy-based data management is becoming critical as the amount of data continues to grow while IT budgets remain flat. By automating movement of data to cheaper storage such as the cloud or private object storage, IT organizations can rein in data sprawl and cut costs.

      Other things to consider are how to secure data from loss and degradation by assigning an owner to each file, defining access controls, verifying the number of replicas to ensure integrity of the data, as well as tracking the chain of custody. In addition, rules help to ensure compliance with legal obligations, ethical responsibilities, generating reports, tracking staff expertise, and tracking management approval and enforcement of the rules.

      As data footprints grow, managing billions and billions of files manually becomes untenable. Using analytics to define governing policies for when data should move and to where, and having data management solutions that automate based on these policies, becomes critical. Policy-based data management systems rely on consensus around the governing rules: validation of these policies is typically done through automatic execution, and the policies should be periodically re-evaluated to ensure the continued integrity of your data.
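
      A minimal sketch of how such a policy might be expressed in code is shown below; the thresholds, tier names, and use of last-access time are illustrative assumptions, not a description of any particular product.

      import os
      import time

      # Illustrative policy: move files untouched for 180 days to an "archive" tier,
      # files untouched for 30 days to a "warm" tier; everything else stays "hot".
      POLICY = [
          (180 * 86400, "archive"),
          (30 * 86400, "warm"),
      ]

      def target_tier(path, now=None):
          """Return the tier a file should live in, based on its last access time."""
          now = now if now is not None else time.time()
          age_seconds = now - os.stat(path).st_atime
          for threshold, tier in POLICY:
              if age_seconds >= threshold:
                  return tier
          return "hot"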

      Getting Started with Komprise:

    • POSIX ACLs

      Fine-grained access rights for files and directories. An Access Control List (ACL) consists of entries specifying access permissions on an associated object.

      Getting Started with Komprise:

    • Primary Storage

      Often implemented as Network Attached Storage (NAS), primary storage is the main area where data is stored for quick access. It is faster and more expensive than secondary storage, so it shouldn’t hold cold data.

      Getting Started with Komprise:

  • R
    • Rehydration

      The process of fully reconstituting files so the transferred data can be accessed and used. Block-level tiering requires rehydrating archived data before it can be used, migrated, or backed up. No rehydration is needed with Komprise, which uses file-based tiering.

      Getting Started with Komprise:

    • REST (Representational State Transfer)

      REST (Representational State Transfer) is a software architectural style for distributed hypermedia systems, used in the development of Web services. Distributed file systems send and receive data via REST. Web services using REST are called RESTful APIs or REST APIs.

      There are several benefits to using REST APIs: the interface is uniform, so you don’t have to know the inner workings of an application to use it; its operations are well defined, so data in different storage formats can be acted upon by the same REST APIs; and it is stateless, so each interaction does not interfere with the next. Because of these benefits, REST APIs are fast, easy to build against, and easy to use. As a result, REST has gained wide adoption.

      6 guiding principles for REST:

      1. Client–server – Separating the user interface from data storage improves portability and scalability.
      2. Stateless – Each request is wholly self-contained, so session state is kept entirely on the client.
      3. Cacheable – A client cache is given the right to reuse response data for later, equivalent requests.
      4. Uniform interface – The overall REST system architecture is simplified and made uniform by the following constraints: identification of resources; manipulation of resources through representations; self-descriptive messages; and hypermedia as the engine of application state.
      5. Layered system – The system is composed of hierarchical layers, and each component cannot “see” beyond the immediate layer with which it interacts.
      6. Code on demand (optional) – REST allows client functionality to be extended by downloading and executing code in the form of applets or scripts.

      The REST architecture and lighter-weight communication between producer and consumer make REST popular for use in cloud-based APIs such as those authored by Amazon, Microsoft, and Google. REST is often used in social media sites, mobile applications, and automated business processes.

      Advantages of REST over SOAP

      REST is often preferred over SOAP (Simple Object Access Protocol) because REST uses less bandwidth, making it preferable for use over the Internet. SOAP also requires writing or using a server program and a client program.

      RESTful Web services are easily leveraged using most tools, including those that are free or inexpensive. REST is also much easier to scale than SOAP services. Thus, REST is often chosen as the architecture for services available via the Internet, such as Facebook and most public cloud providers. Also, development time is usually reduced using REST over SOAP. The downside to REST is that it has no direct support for generating a client from server-side-generated metadata, whereas SOAP supports this with Web Service Description Language (WSDL).

      Data management software using REST APIs

      Open APIs and a REST-based architecture are the keys to Komprise integrations. Using REST APIs gives customers the greatest amount of flexibility. Here are some things customers can do with the Komprise Intelligent Data Management software via its REST API:

      • Get analysis results and reports on all their data
      • Run data migrations, data archiving and data replication operations
      • Search for data across all their storage by any metadata and tags
      • Build virtual data lakes to export to AI and Big Data applications

      A REST API is a very powerful, lightweight and fast way to interact with data management software.
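
      As a minimal sketch of how lightweight a REST interaction is, the example below issues an authenticated GET request with Python’s requests library. The endpoint, resource path, token, and query parameters are placeholders for illustration, not the actual Komprise API.

      import requests

      BASE_URL = "https://api.example.com/v1"   # hypothetical REST endpoint
      TOKEN = "REPLACE_WITH_API_TOKEN"          # hypothetical bearer token

      response = requests.get(
          f"{BASE_URL}/reports/analysis",                 # resource identified by a URI
          headers={"Authorization": f"Bearer {TOKEN}"},   # stateless: credentials sent per request
          params={"share": "finance", "format": "json"},  # query parameters refine the request
          timeout=30,
      )
      response.raise_for_status()
      print(response.json())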

      Getting Started with Komprise:

  • S
    • S3

      The S3 protocol is used in a URL that specifies the location of an Amazon S3 (Simple Storage Service) bucket and a prefix to use for reading or writing files in the bucket. See S3 Intelligent Tiering.
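
      For example, a URL such as s3://example-bucket/projects/2023/ names a bucket (example-bucket) and a prefix (projects/2023/). The short boto3 sketch below lists objects under that prefix; the bucket and prefix names are hypothetical.

      import boto3

      s3 = boto3.client("s3")

      # List objects under a prefix within a bucket (names are hypothetical).
      response = s3.list_objects_v2(Bucket="example-bucket", Prefix="projects/2023/")
      for obj in response.get("Contents", []):
          print(obj["Key"], obj["Size"])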

      Getting Started with Komprise:

    • S3 Intelligent Tiering

      S3 Intelligent Tiering is an Amazon storage class. Amazon S3 offers a range of storage classes for different uses. S3 Intelligent Tiering is a storage class aimed at data with unknown or unpredictable data access patterns. It was introduced in 2018 by AWS as a solution for customers who want to optimize storage costs automatically when their data access patterns change.

      Instead of requiring users to choose among the other Amazon S3 storage classes and move data across them based on need, Amazon S3 Intelligent Tiering is a distinct storage class with embedded access tiers; data automatically moves across the four access tiers when access patterns change.

      To fully understand what S3 Intelligent Tiering offers it is important to have an overview of all the classes available through S3:

      Classes of AWS S3 Storage

      1. Standard (S3) – Used for frequently accessed data (hot data)
      2. Standard-Infrequent Access (S3-IA) – Used for infrequently accessed, long-lived data that needs to be retained but is not being actively used
      3. One Zone Infrequent Access – Used for infrequently accessed data that’s long-lived but not critical enough to be covered by storage redundancies across multiple locations
      4. Intelligent Tiering – Used for data with changing access patterns or uncertain need of access
      5. Glacier – Used to archive infrequently accessed, long-lived data (cold data); retrieval from Glacier takes a few hours
      6. Glacier Deep Archive – Used for data that is hardly ever or never accessed and for digital preservation purposes for regulatory compliance


      What is S3 Intelligent Tiering?

      S3 Intelligent Tiering is a storage class that has multiple tiers embedded within it, each with its own access latencies and costs. It is an automated service that monitors your data access behavior and moves your data on a per-object basis to the appropriate tier within the S3 Intelligent Tiering storage class. If an object has not been accessed for 30 consecutive days, it automatically moves to the Infrequent Access tier; if the object is not accessed for 90 consecutive days, it moves to the Archive Access tier, and after 180 consecutive days to the Deep Archive Access tier. Retrieval from the Archive Access tier can take 3 to 5 hours, and from the Deep Archive Access tier up to 12 hours. When an archived object is subsequently accessed, it is moved back to the Frequent Access tier.
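
      Objects can also be written directly into the S3 Intelligent-Tiering storage class at upload time, as in the boto3 sketch below; the bucket name, object key, and local file name are hypothetical.

      import boto3

      s3 = boto3.client("s3")

      # Upload an object directly into the S3 Intelligent-Tiering storage class;
      # AWS then handles tier transitions based on the object's access pattern.
      s3.upload_file(
          Filename="results.csv",                            # hypothetical local file
          Bucket="example-bucket",                           # hypothetical bucket
          Key="analytics/results.csv",                       # hypothetical object key
          ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
      )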

      What are the costs of AWS S3 Intelligent Tiering?

      You pay for monthly storage, requests, and data transfer. When using Intelligent-Tiering you also pay a monthly per-object fee for monitoring and automation. While there is no retrieval fee in S3 Intelligent-Tiering and no fee for moving data between tiers, you do not manipulate each tier directly; S3 Intelligent-Tiering behaves as a single storage class, and objects move through the tiers within it. Objects in the Frequent Access tier are billed at the same rate as S3 Standard, objects in the Infrequent Access tier are billed at the same rate as S3 Standard-Infrequent Access, objects in the Archive Access tier are billed at the same rate as S3 Glacier, and objects in the Deep Archive Access tier are billed at the same rate as S3 Glacier Deep Archive.

      What are the advantages of S3 Intelligent tiering?

      The main advantage of S3 Intelligent Tiering is that savings are automatic. There is no operational overhead and there are no retrieval fees. Objects are assigned a tier upon upload and then move between tiers based on access patterns. There is no impact on performance, and the storage class is designed for 99.999999999% durability and 99.9% availability over a given year.

      What are the disadvantages of S3 Intelligent tiering?

      The main disadvantage of S3 Intelligent Tiering is that it acts as a black box – you move objects into it and cannot transparently access different tiers or set different versioning policies for the different tiers. You have to treat the whole of S3 Intelligent Tiering as a single unit. For example, if you want to transition an object that has versioning enabled, you have to transition all of its versions. Also, when objects move to the archive tiers, the latency of access is much higher than in the access tiers, and not all applications may be able to deal with the higher latency.

      S3 Intelligent Tiering is not suitable for companies with predictable data access patterns or companies that want transparent control over data access, versioning, and similar settings. Other disadvantages are that it is limited to objects and cannot tier from files to objects, the minimum storage duration is 30 days, objects smaller than 128KB are never moved out of the Frequent Access tier, and, because it is an automated system, you cannot configure different policies for different groups.

      S3 Data Management with Komprise

      Komprise is an AWS Advanced Tier partner and offers intelligent data management with visibility, transparency, and cost savings on AWS file and object data. How is this done? Komprise enables analytics-driven intelligent tiering across EFS, FSx, S3, and Glacier storage classes in AWS so you can maximize price performance across all your data on Amazon. The Komprise mission is to radically simplify data management through intelligent automation.

      Getting Started with Komprise:

    • Scale-Out Grid

      Traditional approaches to managing data have relied on a centralized architecture – using either a central database to store information, or requiring a primary-replica architecture with a central primary server to manage the system. These approaches do not scale to address the modern scale of data because they have a central bottleneck that limits scaling. A scale-out architecture delivers unprecedented scale because it has no central bottlenecks. Instead, multiple servers work together as a grid without any central database or master and more servers can be added or removed on-demand.

      Scale-out grid architectures are harder to build because they need to be designed from the ground up not only to distribute the workload across a set of processes but also to provide fault tolerance, so that if any of the processes fails, the overall system is not impaired. Below is a screenshot of the Komprise elastic grid architecture. Read the Komprise Architecture Overview white paper to learn more.

      Komprise Scale-Out Architecture

      Getting Started with Komprise:

    • Scale-Out Storage

      Scale-out storage is a type of storage architecture in which devices in connected arrays are added to expand disk storage space, allowing storage capacity to increase only as the need arises. Scale-out storage architectures add flexibility to the overall storage environment while lowering initial storage setup costs.

      With data growing at exponential rates, enterprises will need to purchase additional storage space to keep up. This data growth comes largely from unstructured data, like photos, videos, PowerPoints, and Excel files. Another factor adding to the expansion of data is that the rate of data deletion is slowing, resulting in longer data retention policies. For example, many organizations are now implementing “delete nothing” data policies for all kinds of data. With storage demands skyrocketing and budgets shrinking, scale-out storage can help manage these growing costs.

      Getting Started with Komprise:

    • Secondary Storage

      Secondary storage devices are storage devices that operate alongside the computer’s primary storage, RAM, and cache memory. Secondary storage can hold any amount of data, from a few megabytes to petabytes, and stores almost all types of programs and applications, including the operating system, device drivers, applications, and user data. Examples of secondary storage devices include hard disk drives, tape drives, and optical disc drives.

      Some key facts about secondary storage:

      • It is typically designed for long-term storage – it uses non-volatile media such as solid-state devices, optical media, or magnetic storage such as tape.
      • It is typically orders of magnitude cheaper than primary storage – it is designed for capacity rather than performance.
      • It can either be hosted on-premises at data centers or in the cloud.
      • It can use file formats (network-attached storage via the NFS and SMB/CIFS protocols), block-based storage area network (SAN) formats, or object formats. Object-based secondary storage is extremely popular today, especially in the cloud.
      • Examples include: Amazon Simple Storage Service (S3), Amazon Glacier, Azure Blob, Google Cloud ColdLine Storage, and on-premises object stores such as IBM Cloud Object Storage.
      • Use cases include: Cold data storage, Cold data archiving, backup and Disaster Recovery storage.
      • Data management software is used to find the right data to place on secondary storage and move data to secondary storage without user disruption.

      Secondary storage typically archives inactive cold data and backs up primary storage through data replication or other data backup methods. This replication or data backup process ensures there is a second copy of the data. In an enterprise environment, the storage of secondary data can be in the form of a network-attached storage (NAS) box, storage-area network (SAN), or tape. In addition, to lessen the demand on primary storage, object storage devices may also be used for secondary storage. The growth of organizational data has prompted storage managers to move data to lower tiers of storage to reduce the impact on primary storage systems. Furthermore, by moving data from more expensive primary storage to less expensive tiers of storage, storage managers are able to save money. This keeps the data easily accessible in order to satisfy both business and compliance requirements.

      When archiving cold data to secondary storage, it is important that the archiving/tiering solution does not disrupt users by requiring them to rewrite applications to find the data on the secondary storage. Transparent archiving is key to ensuring that data moved to secondary storage still appears to reside on the primary storage and continues to be accessed from the primary storage without any changes for users or applications. Transparent move technology solutions use file-level tiering to accomplish this.

      Getting Started with Komprise:

    • Shadow IT

      Shadow IT is a term used in information technology to describe systems and solutions built or used without explicit organizational approval. This often means typical internal compliance practices, such as documentation, security, and reliability requirements, are not followed.

      However, shadow IT can be an important source of innovation, and can also be in compliance, even when not under the control of an IT organization.

      An example of shadow IT is when business subject-matter experts use their own systems and the cloud to manipulate complex datasets without having to request work from the IT department. IT departments must recognize this in order to improve the technical control environment, or select enterprise-class data analysis and management tools that can be implemented across the organization without stifling business experts from innovating.

      Ways IT teams can cope with shadow IT include:

      • Reduce IT evaluation times for new applications
      • Consider cloud applications
      • Provide ways to safely identify and move relevant data to the cloud
      • Clearly document and communicate business controls
      • Approve shadow IT in the short term
      • Get involved with teams across your organization to help stay informed of upcoming needs

      Getting Started with Komprise:

    • Shared-Nothing Architecture

      A distributed-computing architecture in which each update request is handled by a single node, which eliminates single points of failure, allowing continuous overall system operation despite individual node failure. Komprise Intelligent Data Management is based on a shared-nothing architecture.

      Getting Started with Komprise:

    • Showback (“Shameback”)

      A method of tracking data center utilization rates of an organization’s business units or end users. Similar to IT chargeback, the metrics for showback are for informational purposes only; no one is billed.

      Getting Started with Komprise:

    • SMB format (Server Message Block)

      A network communication protocol for providing shared access to files, printers, and serial ports between nodes on a network (also known as the Common Internet File System, or CIFS).

      Getting Started with Komprise:

    • Storage Pool

      What is a storage pool?

      Storage pools are collections of storage volumes exported to a shared storage environment. Traditionally, storage pools were limited to storage volumes from a single vendor – for instance, you may have Flash and Disk storage volumes in a storage pool.

      Storage pools may be homogeneous – that is, all the storage volumes are SSD/Flash, or all the storage volumes are disk, etc.  or they may be heterogeneous – the storage volumes are different classes of storage e.g. Flash, Disk, etc.  Storage tiering is an integral solution to handling heterogeneous storage pools.

      What is storage tiering within a storage pool and why is it needed?

      Storage tiering is a technique whereby the file metadata and the frequently-accessed blocks are stored in the highest tier and less-accessed blocks are downgraded to lower, cheaper tiers within a storage pool. This automated storage tiering approach allows the vendor to reduce costs by using smaller, faster tiers while still providing good performance. Storage tiering is often touted as a storage efficiency technique for customers to save on storage costs.  But a key thing to remember is that the bulk of the cost of data is not in the storage but in the active management and backups of the data.  Storage efficiency impacts the storage cost but not the active data management costs.

      What is cloud tiering and how does it relate to storage pools?

      Storage array vendors are now using their tiering technologies to tier data to the cloud. This is not what the technology was originally designed for, since the storage pool is no longer under a single vendor’s control and no longer local to a network.

      Storage array vendors like NetApp and Dell EMC have created “pool” solutions to externally tier data to less expensive storage such as the cloud. These solutions can reduce the cost of fast, expensive flash-based storage by migrating non-critical data sets – such as archival and compliance data, and “cold” data that hasn’t been accessed for a designated period of time – to lower-cost storage in the cloud.

      Read the blog post: What you need to know before jumping into the cloud tiering pool


      What are the challenges and considerations for cloud storage pools?

      While these solutions work well for tiering secondary data such as snapshot copies to the cloud, they result in unnecessary costs and lock-in when tiering and archiving files. In addition, the pool approach tiers data as proprietary blocks rather than as files that all applications can understand. This presents the following challenges:

      • Policies to specify the blocks to be tiered are limited, resulting in much higher access rates to the cloud and higher egress costs.
      • Block tiering to the cloud can reduce the performance of the storage array. Given the vast quantities of data most enterprises are dealing with today, block tiering is not suited for general data tiering to a public cloud across high latency channels.
      • Block tiering locks you into your storage vendor. Since the cold data is tiered to the cloud in a proprietary format, when it is time to decommission your storage array and replace it with a new one you must stay with the same vendor.
      • Proprietary lock-in. You cannot directly use native cloud services to access your data in the cloud; access has to go through the proprietary storage filesystem itself. This creates unnecessary licensing costs that customers must pay forever to access their data.

      Getting Started with Komprise:

    • Stubs

      Placeholders of the original data after it has been migrated to the secondary storage. Stubs replace the archived files in the location selected by the user during the archive. Because stubs are proprietary and static, if the stub file is corrupted or deleted, the moved data gets orphaned. Komprise does not use stubs, which eliminates this risk of disruption to users, applications, or data protection workflows.

      Getting Started with Komprise:

    • Symbolic Link


      What is a Symbolic Link?

      Symbolic Links, also known as symlinks, are file-system objects that point toward another file or folder. These links act as shortcuts with advanced properties that allow access to files from locations other than their original place in the folder hierarchy by providing operating systems with instructions on where the “target” file can be found.
      For the operating system, the symlink is transparent for many operations and functions in the same manner as the target file or folder would, even though it’s only a link that points to the original. For example, if a program needs to be in folder A to run, but you want to store it in folder B instead, the program’s files can be moved to folder B and a symbolic link created at folder A that points to folder B. When the program is launched, the operating system refers to folder A, finds the symbolic link to folder B, and runs the program from folder B as if it were still in its original place in folder A.
      This method is widely used in the storage industry in programs such as OneDrive, Google Drive, and Dropbox to sync files and folders across different platforms of storage or in the cloud.
      These types of links began to appear in operating systems such as RDOS in the late 1970s. In modern computing, symbolic links are present in most Unix-like operating systems supported by the POSIX standard, such as Linux, macOS, and Tru64. This feature was also added to Microsoft Windows starting with Windows Vista.

      Symbolic Links vs Hard Links

      Both soft and hard links allow seamless and mostly transparent targeting of a file, but they do so in different ways.

      Soft links, also referred to as symbolic links by Microsoft, work similarly to a normal shortcut in the sense that they point directly to the file or folder itself. These types of links also use less memory overall.
      On the other hand, hard links point to the storage space designated to hold the contents of the file or folder.
      If the location or the name of the file changes, a soft link no longer works, since it pointed to the original file itself. With a hard link, any changes made to the original file or to the hard link’s contents are mirrored in the other, because both point to the same location on the storage.
      Hard links act as a secondary entrance to the same file or folder which they are linked to, but they can only be used to connect two entities within the same file system, whereas soft links can bridge the gap between different storage devices and file systems.
      Hard symbolic links also have more restrictive requirements than soft links:

      • Hard links may not be able to link to directories.
      • The target file or folder for a hard link must exist.
      • Hard links cannot point to targets that are located on different partitions, volumes, or file systems.

      Junctions

      A Junction is a lesser-used, third type of symbolic link that combines aspects from both hard and soft links. The target file must exist for the junction to be created, but if the target file or folder is erased afterward, the link will still be there but will no longer be functional.

      How are Soft and Hard Symbolic Links Commonly Used?

      Hard links are used to create “backups” on filesystems without using any additional storage space. This is a benefit as it is often easier to manage a single directory with multiple references pointing to it rather than managing multiple instances of the same directory. If the file or folder is no longer accessible from its original location, then the hard link can be used as a backup to regain access to those files.
      The Time Machine feature on macOS uses hard symbolic links to create images to be used for backup.
      Soft links are used more heavily to enable access for files and folders on different devices or filesystems. These types of symbolic links are also used in situations where multiple names are being used to link to the same location.

      Types of Businesses that Make Use of Symbolic Links

      Symbolic links are leveraged in nearly every industry that uses computers, but some industries make heavier use of these links than others.

      Creating Symbolic Links

      The process used to create symbolic links is different on each type of operating system. Below are brief instructions on how a soft or hard link can be set up in Linux and Windows.

      How to Create a Soft Link in Linux

      To create a soft symbolic link in Linux, the ln command-line utility can be used as such:
      ln -s [OPTIONS] FILE LINK
      The FILE argument is the existing file or directory the link will point to, and the LINK argument is the name (and location) of the symbolic link to create.
      When the command is successful, it produces no output and returns a zero exit status.

      How to Create a Hard Link in Linux

      For creating hard links in Linux, a similar version of the ln command is used but without the -s:
      ln [OPTIONS] FILE LINK
      The FILE argument is still the existing file being linked to, and the LINK argument is the name of the hard link to create.

      Creating a Windows Soft Link

      The mklink command can be used to create soft links in Windows Vista and later through a Command Prompt or PowerShell session with elevated permissions. By default, this command with no options produces a soft link.
      mklink command:
      mklink Link Target

      The Link argument is the name of the link to create, and the Target argument is the existing file or directory the link points to.
      For creating a soft link pointing to a directory, this command is used instead:
      mklink /D Link Target

      Creating a Windows Hard Link

      Similarly to creating a soft link in Windows, the mklink command can also be used to create hard links when /H is included as an option:
      mklink /H Link Target
      For creating a junction, the /J option is used instead of /H:
      mklink /J Link Target
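
      Links can also be created programmatically. The short Python sketch below creates a soft link and a hard link; the paths are hypothetical, and on Windows creating symbolic links may require elevated privileges or Developer Mode.

      import os

      # Soft (symbolic) link: "report-latest.txt" points to an existing file by path.
      os.symlink("reports/2024-06.txt", "report-latest.txt")

      # Hard link: both names now refer to the same data on disk.
      os.link("reports/2024-06.txt", "report-archive.txt")

      # Inspect the soft link's target without following it.
      print(os.readlink("report-latest.txt"))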

      Getting Started with Komprise:

  • T
    • Tagging data

      The often-lengthy process of annotating or labeling data (such as text or objects in videos and images) so it can be detected and recognized by computer vision and used to train AI models through machine learning algorithms for predictions. Creating Virtual Data Lakes with Komprise Deep Analytics makes this process much faster.

      Getting Started with Komprise:

    • Transparent Move Technology
      Transparent Move Technology refers to an approach for Data Archiving that archives cold files transparently such that:
      1. The archived files can still be viewed and opened from the original location so users and applications do not need to change their data access.
      2. The archived files can be accessed via the original file protocols even if they are archived on an object repository.
      3. There is no change to the data path for the hot data that is not archived.  So there are no server or client side agents or static stubs.
      4. Accessing archived files does not cause the data to be brought back or rehydrated.  The approach is transparent to backup software and other applications.

      Getting Started with Komprise:

  • U
    • Unstructured Data

      What is Unstructured Data?

      Data can be of two broad types: structured data and unstructured data.

      • Structured Data: Structured data is data that can be organized by structured categories, such as rows and columns in an Excel spreadsheet or a database. For example, accounting records are structured data because you can organize them by customer, by geography, by product, etc. Structured data is typically stored in a database and can be queried using query languages such as Structured Query Language (SQL). Most data was predominantly structured until 2000 but since then we have seen an explosion of unstructured data. Today, structured data accounts for less than twenty percent of the world’s data.
      • Unstructured Data: Unstructured data is data that doesn’t fit neatly in a traditional database and has no identifiable internal structure. This is the opposite of structured data, which is data stored in a database. Up to 80% of business data is considered unstructured, with this number increasing year over year. Examples of unstructured data are text documents, e-mail messages, photos, videos, presentations, social media posts, and more.

      Graph showing unstructured data growing faster than structured data over time

       

      Unstructured data usually does not include a predefined data model, and it does not map well to relational tables. Text-heavy unstructured data may include numbers, dates, and facts as well. This makes it difficult to identify and interpret this data using conventional software programs.

      Unstructured data is the predominant data type that is generated by most applications today – from self-driving cars, to Internet of Things (IOT) devices, to genome sequencers, to video and audio files, most of the data we generate and use today is unstructured.

      Why is Unstructured Data Growing so Fast?

      The analyst firm IDC predicts that we will generate over 175 zettabytes of data by 2025 (one zettabyte is roughly a billion 1-terabyte drives!). They also predict that in the next three years we will generate more data than we created over the past 30 years, and this growth trend will continue.

      Most of the data we generate today is unstructured because unstructured data has several advantages over structured data:

      • Wider Use Cases for Unstructured Data: Structured data has a rigid, pre-defined structure and can only be used for its intended purpose. This narrows the number of use cases for structured data – while it is useful for transactional applications like revenue tracking or catalogs, it is not a good fit for applications that generate data that is not easy to categorize, such as video or genomics.
      • Various Formats: Unstructured data can be stored in a variety of formats – from an mp4 video to a genomics BAM file to a .log diagnostics file to an X-ray image stored in a digital PACS format, all of these are types of unstructured data. So an accurate way to describe unstructured data is that it has a variety of formats rather than just one. This means more applications can generate unstructured data and tailor the format to their use.
      • Various Sizes: Unlike a cell in a database, unstructured data does not have to be a specific size or character limit. For example, you can have small video files for short snippets and large video files for full length movies. This also increases flexibility in how unstructured data is generated and used.

      Since unstructured data is easier to create and use, more applications and users are working with unstructured data.

      Unstructured Data Management

      Managing the growing volumes of unstructured data generated within an organization is leading to higher expenses.

      What to know about unstructured data:

      • Volume: The sheer quantity of data continues to grow at an incomprehensible rate
      • Velocity: Data is arriving at a continually faster rate
      • Variety: The types of data continue to become more varied

      These 3 Vs of unstructured data, originally defined by former Meta Group / Gartner industry analyst Doug Laney, mean that managing unstructured data growth is critical for organizations as they find their budgets and resources stretched to their limits.

      Unstructured data management requires an understanding of what data is hot and actively used, and what data is cold and rarely accessed. In most enterprises, over 80% of unstructured data becomes cold within a year of creation – yet it continues to be managed on the most expensive storage and it continues to consume expensive backup resources. Analytics-driven data management of unstructured data can change this by identifying hot and cold data across storage and managing hot data on expensive environments while offloading cold data to lower cost passive management. Unstructured data management should be done without restricting access to the cold data – so users and applications continue to see and access the cold data exactly as before, while the organization saves on cold data storage and backups. To understand how Komprise enables enterprise IT organizations to analyze, move, and manage unstructured data and save costs on storage, backup and cloud infrastructure read the white paper: Komprise Intelligent Data Management Architecture Overview.
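
      A minimal sketch of this kind of hot/cold analysis is shown below; the one-year threshold, the reliance on last-access time, and the mount point in the usage comment are assumptions for illustration.

      import os
      import time

      COLD_AFTER_SECONDS = 365 * 86400  # assume data is "cold" after a year without access

      def hot_cold_summary(root):
          """Walk a directory tree and total hot vs. cold bytes by last access time."""
          now = time.time()
          totals = {"hot": 0, "cold": 0}
          for dirpath, _dirnames, filenames in os.walk(root):
              for name in filenames:
                  try:
                      st = os.stat(os.path.join(dirpath, name))
                  except OSError:
                      continue  # skip files that vanish or cannot be read
                  bucket = "cold" if now - st.st_atime >= COLD_AFTER_SECONDS else "hot"
                  totals[bucket] += st.st_size
          return totals

      # Example usage with a hypothetical mount point:
      # print(hot_cold_summary("/mnt/nas/projects"))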

       

      Getting Started with Komprise:

  • V
    • Virtual Data Lakes

      Provides the main storage area and execution ability to enable Big Data, AI, and ML projects.

      Research has shown that with Big Data projects, up to 80% or more of the time is spent on finding the right data and getting it out of data centers and cloud infrastructure. Powerful metadata-based search and indexing technology automates the process of finding unstructured data based on your specific criteria. This capability allows organizations to dynamically build virtual data lakes across storage silos on the fly so they can better manage and reuse their data for AI and ML.

      Komprise Deep Analytics lets you build specific queries to find the files you need and tag them to build real-time virtual data lakes that the entire organization can use, without having to first move the data.

      Getting Started with Komprise:


Cloud Data Storage

Cloud data storage is a service for individuals or organizations to store data through a cloud computing provider such as AWS, Azure, Google Cloud, IBM or Wasabi. Storing data in a cloud service eliminates the need to purchase and maintain data storage infrastructure, since infrastructure resides within the data centers of the cloud IaaS provider and is owned/managed by the provider. Many organizations are increasing data storage investments in the cloud for a variety of purposes including: backup, data replication and data protection, data tiering and archiving, data lakes for artificial intelligence (AI) and business intelligence (BI) projects, and to reduce their physical data center footprint. As with on-premises storage, you have different levels of data storage available in the cloud. You can segment data based on access tiers: for instance, hot and cold data storage.


Types of Cloud Data Storage

Cloud data storage can either be designed for personal data and collaboration or for enterprise data storage in the cloud. Examples of personal cloud data storage are Google Drive, Box, and Dropbox.

Increasingly, corporate data storage in the cloud is gaining prominence – particularly around taking enterprise file data that was traditionally stored on Network Attached Storage (NAS) and moving that to the cloud.

Cloud file storage and object storage are gaining adoption as they can store petabytes of unstructured data for enterprises cost-effectively.

Enterprise Cloud Data Storage for Unstructured Data (Cloud File Data Storage and Cloud Object Data Storage)

Enterprise unstructured data growth is exploding – whether it’s genomics data, video and media content, log files, or IoT data. Unstructured data can be stored as files on file data storage or as objects on cost-efficient object storage. Cloud storage providers now offer a variety of file and object storage classes at different price points to accommodate unstructured data. Amazon EFS, Amazon FSx, and Azure Files are examples of cloud data storage for enterprise file data, while Amazon S3, Azure Blob, and Amazon Glacier are examples of object storage.

Advantages of Cloud Data Storage

There are many benefits of investing in cloud data storage, particularly for unstructured data in the enterprise. Organizations gain access to unlimited resources, so they can scale data volumes as needed and decommission instances at the end of a project or when data is deleted or moved to another storage resource. Enterprise IT teams can also reduce dependence on hardware and have a more predictable storage budget. However, without proper cloud data management, cloud egress costs and other cloud costs are often cited as challenges.

In summary, cloud data storage allows organizations to:

  • Reduce capital expenses (CAPEX) on data center hardware, along with savings in energy, facility space, and staff hours spent maintaining and installing hardware.
  • Deliver vastly improved agility and scalability to support rapidly changing business needs and initiatives.
  • Develop an enterprise-wide data lake strategy that would otherwise be unaffordable.
  • Lower the risks of storing important data on aging physical hardware.
  • Leverage cheaper cloud storage for archiving and tiering purposes, which can also reduce backup costs.

Cloud Data Storage Challenges and Considerations

  • Cloud data storage can be costly if you need to frequently access the data for use outside of the cloud, due to egress fees charged by cloud providers.
  • Using cloud tiering methodologies from on-premises storage vendors may result in unexpected costs, due to the need for restoring data back to the storage appliance prior to use. Read the white paper Cloud Tiering: Storage-Based vs. Gateways vs. File-Based
  • Moving data between clouds is often difficult, because of data translation and data mobility issues with file objects. Each cloud provider uses different standards and formats for data storage.
  • Security can be a concern, especially in some highly regulated sectors such as healthcare, financial services and e-commerce. IT organizations will need to fully understand the risks and methods of storing and protecting data in the cloud.
  • The cloud creates another data silo for enterprise IT. When adding cloud storage to an organization’s storage ecosystem, IT will need to determine how to attain a central, holistic view of all storage and data assets.

For these reasons, cloud optimization and cloud data management are essential components of an enterprise cloud data storage strategy. Komprise has strategic alliance partnerships with hybrid and cloud data storage technology leaders.

Want To Learn More?

Getting Started with Komprise:
