GUIDE TO UNSTRUCTURED DATA TIERING
“We love how Komprise incorporates dynamic links that work on any storage, which means we still have access to the moved files on our network. The fact that we could drastically reduce the footprint on our file servers was the other main selling point of Komprise.” – Kevin Rhode, CIO, District Medical Group
Overview: The need for unstructured data tiering
As unstructured data growth has accelerated in the past few years, so has the cost of storage, backups and disaster recovery for enterprises. In fact, these costs constitute on average 30% or more of the total IT budget, according to the Komprise State of Unstructured Data Management survey.
Traditionally, IT organizations have dealt with data growth by purchasing more on-premises storage, but this is no longer financially viable or necessary. On average, 60-70% of data is rarely accessed or “cold,” yet it sits on expensive storage and consumes the same backup resources as active or “hot” data.
Data tiering for unstructured data can entail moving cold data to a cheaper, secondary storage tier in the data center or in the cloud. It creates a live, online archive for this stale data which can save dramatically on annual storage costs.
With cloud tiering, organizations get the added benefit of being able to use cloud-based AI and analytics services on their data. Data tiering also allows users to easily access data again in the future, which is not the case with an offline archive such as tape. Most often, tiered data is never accessed or recalled again, yet many departmental leaders require that access remain available if needed.
Storage vendors approach data tiering differently than independent data management vendors such as Komprise. We will explore the differences and considerations in this guide below with links to detailed content throughout.
Read: Why Storage-Agnostic Unstructured Data Management
Understanding the different types of data tiering
Block Level Tiering
Block-level tiering is a traditional, proprietary tiering method used by storage vendors. These vendors use an efficient block-based storage system in which each file is represented by a set of blocks, with file metadata often stored separately from the data blocks. Block-level tiering moves individual blocks (pieces of files, not whole files) from hot to cold (lower-cost) tiers to free up space on the high-performing, expensive storage appliance.
Storage vendors promote their own block-level tiering to move data out of the file server and into a cloud object storage tier for archiving purposes. If a user needs to access a tiered file, the request must go through the original file server. The moved blocks cannot be read from their new location, such as the cloud, because they are meaningless without the other data blocks and the file metadata, which remain on the on-premises NAS.
This automated storage-based tiering makes sense within a vendor’s storage array: it reduces costs by shifting less-active data blocks to lower-cost tiers, and it works well for managing data placement inside the array and for tiering snapshots to the cloud. But using block-level tiering to move cold files to a destination outside the storage array, such as the cloud, leads to excessive recalls, egress fees, defragmentation and unexpected rehydration, which drives up costs and creates lock-in.
For example, antivirus scans or storage defragmentation can recall cold blocks. These recalls may trigger egress charges and rehydration that consumes additional capacity on the on-premises NAS. Ironically, even when you want to migrate away from the storage vendor, you may first need to buy more of their storage, which is undesirable lock-in.
File-Level Tiering
File-level tiering is an alternative to proprietary, block-level tiering. Unlike traditional storage vendor tiering solutions which use block tiering, file-level tiering moves the entire file including all the attributes, permissions and metadata.
This approach ensures that you retain full file fidelity even when you are moving a file to a different storage architecture such as object storage or cloud. Therefore, applications and users can access the moved file from the original location as before. They can also directly open the file natively in the secondary location without requiring any third-party software or storage operating system.
File-level tiering delivers a non-disruptive user experience with maximum cost savings: on average 70-80% of the annual storage and backup budget. Your organization has the flexibility to move files again when needed and avoid unnecessary rehydration costs. We will explore other details and benefits of file-level tiering later in the guide.
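As an illustration of what “full file fidelity” means in practice, here is a minimal sketch (not Komprise’s implementation; the function name and paths are hypothetical) of a file-level move that keeps the file’s contents, permission bits and timestamps intact at the destination, using only the Python standard library:

```python
import shutil
from pathlib import Path

def tier_file(source: Path, cold_tier: Path) -> Path:
    """Move a whole file to a secondary tier, keeping its metadata.

    shutil.copy2 copies file contents plus permission bits and
    timestamps; because the entire file travels together, it is
    directly readable at the destination with no proprietary layer.
    """
    cold_tier.mkdir(parents=True, exist_ok=True)
    target = cold_tier / source.name
    shutil.copy2(source, target)   # contents + mode + mtime/atime
    source.unlink()                # free space on the primary tier
    return target
```

This is the essential contrast with block-level tiering: the object at the destination is a complete, self-describing file, not a fragment that only the source array can reassemble.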
Read: The Benefits of File-Level vs. Block-Level Tiering
Top Considerations for Data Tiering
1) How much data do you have and what are the growth predictions?
How much total data do you have and what are your growth forecasts for the next year? What are your total storage, backup and DR costs, overall and by vendor? Most companies face double-digit data growth and need a way to cut ongoing costs. Do you take snapshots of your NAS? Retaining 30-90 days of snapshots adds at least 30% to your overall data footprint. Is your NAS replicated? Most customers forget about this cost. With snapshots and replication, 1PB of data can mean managing 2.6PB. Gathering all these details is a crucial first step before you move any data.
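The footprint arithmetic above can be sketched in a few lines of Python (the function name and the 30% snapshot-overhead default are illustrative assumptions drawn from the example):

```python
def effective_footprint(primary_pb: float,
                        snapshot_overhead: float = 0.30,
                        replicated: bool = True) -> float:
    """Estimate total managed footprint for a given primary capacity.

    snapshot_overhead: fractional growth from retained snapshots
    (30-90 days of snapshots typically adds at least 30%).
    replicated: a replica doubles everything, snapshots included.
    """
    footprint = primary_pb * (1 + snapshot_overhead)
    if replicated:
        footprint *= 2
    return footprint

# 1PB of primary data with snapshots and replication:
print(effective_footprint(1.0))  # 2.6 (PB) under these assumptions
```

Running the same estimate against your own capacity, snapshot retention and replication settings gives a quick sense of the true cost baseline before any tiering decision.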
2) Tier or migrate?
We generally hear about organizations “moving to the cloud,” but in practice there are many ways to do that. Cloud data migration is ideal if your goal is to reduce on-premises storage capacity, adopt new storage technologies and take advantage of the more flexible, on-demand nature of cloud storage.
When it comes to data, you’ll want to migrate data to the cloud to:
- Leverage cloud file systems and run applications in the cloud. This delivers data performance and availability similar to on-prem, typically with greater scalability.
- Use the cloud as an offline archive, leveraging low-cost object storage such as Azure Blob or Amazon S3 Glacier and Glacier Instant Retrieval.
However, if you want to create an online archive in the cloud for easy access and retrieval of your data later, you want to leverage cloud data tiering. This way files still appear to be on-prem and can be accessed by simply double-clicking on them. Data tiering is better in cases where you want to lower storage costs and capacity for data that you access infrequently — but which you may still need to recall on-premises in the future.
Inside the data center, migrations occur largely when moving to new storage technologies, while tiering occurs for similar cost-savings and accessibility reasons as cloud data tiering. As described earlier, be sure to understand the differences between block-level and file-level tiering regardless.
Read the article: Cloud Data Migration or Cloud Data Tiering?
3) How much flexibility do you need for data tiering?
Some tiering solutions only allow you to tier based on the age of data or impose limits on how much data you can tier altogether. If data tiering is a core strategy for cost optimization, you need flexibility to create custom queries. That way you can search for data to tier based on file types and sizes, last access or last modification date, directories and shares or project/owner. Make sure that your data tiering solution can tier from anywhere to anywhere. After all, in the hybrid cloud age, there are always new choices for data storage and you do not want to be locked into a particular pathway.
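As a sketch of what such a custom query might look like, the hypothetical function below filters files by age, size and extension using only the Python standard library (it keys on last-modified time, since last-access time is often unreliable on volumes mounted with noatime):

```python
import time
from pathlib import Path

def find_cold_files(root: Path, min_age_days: int = 365,
                    min_size_bytes: int = 0,
                    extensions: tuple = ()) -> list:
    """Return files matching a custom tiering query.

    A file qualifies if it was last modified more than min_age_days
    ago, is at least min_size_bytes, and (when extensions are given)
    carries one of the listed suffixes.
    """
    cutoff = time.time() - min_age_days * 86400
    matches = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        if st.st_mtime > cutoff or st.st_size < min_size_bytes:
            continue
        if extensions and path.suffix not in extensions:
            continue
        matches.append(path)
    return matches
```

A production tiering engine would evaluate such policies continuously and at scale across shares, but the filtering logic (age, size, type, location) is the same shape.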
4) What are your user and departmental requirements?
It’s vital to maintain healthy communication with departmental stakeholders to determine their needs and how to optimize storage and data management. Using analytics to show time of last access can reduce conflict: in most enterprises, at least 60% of data has not been accessed in over a year, which makes a strong case for tiering that rarely-accessed data to archival storage. An unstructured data management system with built-in analytics can sharpen your decision-making.
5) How aggressive is your cloud strategy?
If your organization wants to move faster to the cloud, the roadmap is not always clear. Since 80% of unstructured data is typically cold, by offloading cold data to the cloud, you get a large chunk of your data in the cloud without disrupting existing users and applications. This creates an easy, frictionless path to the cloud. Later you can extract value from the cold data using artificial intelligence and machine learning tools available in the cloud. One other benefit of the right cloud tiering strategy is that processing the cold data does not put stress on your high-performance, on-premises storage.
Know Your Cloud Tiering Options
Will you need native access to data at the destination?
If you anticipate needing native access to tiered data at the target, block-based tiering will not work for you. This is because data is in proprietary form and the entire file won’t be on the target. Users can only read the tiered data from the source. With the plethora of AI and ML tools in the cloud, native data access is a key factor to consider.
Why Cloud Native Data Access Matters
How often will users need to recall data that has been tiered?
Some solutions work by immediately rehydrating data upon first access. This automatically increases your costs as you will need to retain extra storage for rehydration. Ideally, you will want a tiering solution that allows you to configure when data can be rehydrated. This helps control costs by eliminating excessive or unnecessary recall of data.
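One configurable approach is to rehydrate a file only after it has been accessed repeatedly within a time window, and otherwise serve reads straight from the cold tier. The toy class below (all names hypothetical, not any vendor’s API) illustrates the idea:

```python
import time
from collections import defaultdict, deque
from typing import Optional

class RehydrationPolicy:
    """Recommend rehydration only after `threshold` accesses within
    `window_seconds`; a single stray read (for example from a virus
    scan) is served pass-through and never consumes primary capacity."""

    def __init__(self, threshold: int = 3, window_seconds: float = 3600):
        self.threshold = threshold
        self.window = window_seconds
        self._accesses = defaultdict(deque)

    def should_rehydrate(self, path: str,
                         now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        q = self._accesses[path]
        q.append(now)
        # Drop accesses that fell out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        return len(q) >= self.threshold
```

Tuning the threshold and window lets you trade a little recall latency for a large reduction in unnecessary rehydration and egress cost.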
Watch the video
Komprise Data Tiering Use Cases
Hybrid Cloud Tiering
A simple way to adopt the cloud without disrupting users and applications is to tier cold data from on-premises NAS to low-cost object storage such as Azure Blob or Amazon S3. Before embarking on a cloud migration project, take the time to first analyze data and tier rarely-used data to the archival storage target before migrating warm or hot data to cloud file storage. This will reduce the time, effort and cost of your data migration project and deliver a far better ROI overall from your new storage and/or cloud investments.
Private Cloud Tiering
Enterprise IT organizations with infrastructure primarily hosted within their data centers usually have a variety of technologies including NAS storage, an object storage solution, and a cheaper third-level solution such as tape storage. Proprietary storage-based tiering solutions do not work with a multi-vendor mix of solutions. An open, standards-based file-level tiering solution can double your savings and optimize all storage, even in a fully on-premises scenario.
All Cloud Tiering
Enterprise IT organizations have many options for file storage in the cloud today, such as Amazon EFS, Amazon FSx and Azure Files. These cloud NAS solutions are a great way to move file workloads to the cloud because they require no rewriting of applications. However, cloud NAS may not be cheaper over time than an on-premises NAS, so moving data between cloud tiers is still important! When tiering data from a cloud NAS to a cloud object storage class, a file-level tiering solution moves the entire file from the cloud file storage to S3 or Blob storage classes. This also eliminates the full cost of the cold files from the cloud file system and its replication region.
Three Komprise Customer Cloud Tiering Use Cases
Komprise Solutions for File-Based Tiering
Komprise Intelligent Data Management brings granular analysis, high-performance migrations, transparent tiering, and full data lifecycle management of file and object data, built on our Global File Index. Our patented tiering technology, Transparent Move Technology (TMT), delivers all the benefits of no-lock-in, file-level tiering discussed above, with the added benefits of a comprehensive, independent data management platform.
How it works:
Komprise replaces the file on the source storage array with a Komprise Dynamic Link (KDL), a symbolic link that redirects the user request to the new destination. When a user clicks on the link, Komprise serves the file and metadata of the original file exactly as it originally existed on the primary storage.
When a file is tiered to the cloud or other secondary storage, it is written in the format recognized by the destination. For example, if the file is tiered to Amazon S3, it is stored as an object and can be read by a standard S3 browser without any third-party software, including Komprise. If the file is tiered to cloud file storage, Komprise retains the original file protocols such as NFS and SMB. Komprise works across multi-vendor NAS because it follows standard NFS and SMB constructs. In all of these cases, full file fidelity is preserved, so you can retrieve the file through the original protocols from the original location even after it is tiered to the cloud.
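Komprise’s Dynamic Link implementation is proprietary, but the redirection idea can be illustrated with a plain POSIX symbolic link: the file body lives on the secondary tier while a link at the original path keeps reads working (a hypothetical sketch only; a real dynamic link resolves through the data-management layer rather than a filesystem symlink):

```python
import shutil
from pathlib import Path

def tier_with_link(source: Path, cold_tier: Path) -> Path:
    """Move a file to the cold tier and leave a link at the
    original path, so reads from the original location still work.

    Illustration only: this stands in for the general pattern of
    transparent tiering, not for any vendor's actual mechanism.
    """
    cold_tier.mkdir(parents=True, exist_ok=True)
    target = cold_tier / source.name
    shutil.move(str(source), str(target))  # whole file, metadata intact
    source.symlink_to(target)              # redirect the original path
    return target
```

After the move, opening the original path transparently returns the file from the secondary tier, which is the user experience the text above describes.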
Another benefit of moving a file with TMT as opposed to proprietary blocks with storage tiering is that TMT reduces backup and DR costs by shrinking the data footprint on primary storage whereas storage tiering only provides storage efficiency.
Benefits of Komprise Transparent Data Tiering
Komprise TMT delivers an optimal user and IT experience. Komprise intelligent data tiering maximizes cost savings for data management and creates a flexible, no lock-in strategy for your overall storage environment.
- Transparent access to moved files from original NAS. Users and apps can open and access the moved files from their original location exactly as before, without any changes.
- File-object duality and cloud native access. With TMT, employees can access tiered data natively in the cloud. Komprise moves the entire file including all metadata, file attributes and permissions. This means you can directly access your data and extract more value from it, such as by using it in cloud AI tools.
- No obstruction of hot data. Storage tiering solutions move data in a proprietary form and sit in the hot data path, which can affect performance. Komprise moves data using standard protocol constructs, so it is not in the hot data path; it is only invoked when users access cold files, which happens rarely.
- No proprietary interfaces. Proprietary interfaces such as stubs or agents are brittle and difficult to manage. If a user or application accidentally deletes the stubs, they can leave data orphaned, creating havoc for users and applications. Komprise Dynamic Links contain all the file system information that a user or application requires in a simple, scalable format: no stubs, no agents.
- Minimize rehydration. Komprise TMT moves data at the file level with all the metadata fully preserved at the target. Since Komprise enables the file to be used directly from the cloud without requiring rehydration, and since it moves data outside the storage platform, Komprise does not incur the unnecessary recall, egress and rehydration costs inherent to storage tiering solutions.
- Systematic, continuous tiering for maximum savings. Since TMT does not disrupt users and applications, it allows IT to deploy tiering across their storage devices using automated policies. IT can “set it and forget it”. This not only saves IT from significant manual overhead but ensures that the cold data is continuously tiered to maximize savings. Most customers save 70-80% on their annual storage and backup costs using Komprise.
- Ransomware defense. Komprise helps organizations protect file data from ransomware at 80% lower cost through a combination of transparent cloud tiering and replication of data to an immutable, object-locked location.
- Flexible configurations. Unlike storage tiering solutions, Komprise allows you to configure cold data age and tiering, such as at the departmental level and for different types of data.
- Ongoing analytics. Komprise, with its Global File Index, delivers rich analytics across storage that aren’t available in storage tiering products. Showback reports, for instance, can surface data held past its retention period and its costs by geography, improving departmental transparency and communication. You can also use Komprise for cost modeling of new storage and to report on hot topics such as orphaned data and potential duplicates. Learn more about Komprise Reports.
Komprise Customer Stories for Data Tiering
Komprise customers across multiple industries are saving 60% or more of their annual storage budgets through strategic data tiering. Here are two stories that exemplify the benefits they are achieving.
Pfizer
Pfizer has offices on 6 out of 7 continents, works with several of the leading storage vendors and has an installed base of many different generations of data management products. The company’s active acquisition strategy means that it is regularly acquiring additional data storage technologies, increasing overall complexity. Pfizer needs to keep historical data for future R&D: recall that SARS data was useful to researchers in 2020 when developing vaccines and treatments for Covid-19. Yet keeping petabytes of data on top-grade, on-premises storage isn’t a sound financial decision if the data is not accessed regularly. Working with Komprise and AWS, Pfizer has achieved the following results:
- Saving 75% on storage by using Komprise to analyze and continuously move petabytes of cold data to Amazon S3 as it ages.
- Storage managers and researchers both are finding additional benefits from this new analytics-based data management strategy, including zero user disruption and a foundation for data lakes.
Learn More Here
District Medical Group of Arizona
District Medical Group of Arizona (DMG) is using Komprise to identify and then tier all data that is two years old or older to Wasabi, freeing up half the space on its Windows file servers. In Wasabi, the storage team segments archived data into buckets by year, up to 10 years. Once data has exceeded DMG’s 10-year retention policy, it is deleted.
Results
- An analytics approach to unstructured data management, allowing DMG to be more cost-effective and strategic with its data assets spanning all parts of the business including financial and clinical data.
- An estimated savings of $100,000 over three years from moving cold data off Windows servers to less expensive Wasabi cloud storage.
- 5.5TB reduction of backup data and 75% faster backup processes.
- Reclaimed 50%+ of on-premises storage capacity.
- Zero user disruption from moving files to Wasabi.
Read More Here
Komprise Partners for Data Tiering
At Komprise, we work closely with our partners to ensure our customers optimize their data storage and maximize unstructured data value.
For example, you can migrate from any NAS to IBM Cloud Object Storage and IBM Spectrum Scale solutions. Simply pick your IBM target in Komprise, and the solution automatically moves data by policy. Moved data is accessed exactly as before from your source.
Komprise Intelligent Data Management
The Komprise analytics-first approach to unstructured data management and mobility, with built-in tools for cost modeling and planning, helps customers continuously right-place their data for savings and long-term value according to business needs. Intelligent data tiering means organizations can create policies that transparently move data, as it ages, to low-cost secondary storage in the data center or in the cloud. Komprise Transparent Move Technology ensures that employees always find their data in the same place as before, while IT minimizes rehydration costs through native access to data at the destination. TMT also means IT can move data again, from cloud to cloud or from cloud back to on-premises, without rehydrating it to the original storage.