Data Management Glossary
What is Storage Tiering?
Storage tiering is the technique of moving less frequently used data, also known as cold data, from higher-performance storage such as SSD to cheaper storage tiers such as spinning disk or cloud. The term “storage tiering” originally described moving data between tiers or classes of storage within a single storage system, but it has expanded to include tiering or archiving data from a storage system to other storage systems and clouds. Storage tiering is now considered a core feature of modern storage systems and has recently become part of the default configuration for next-generation storage such as Amazon FSx for NetApp ONTAP.
Storage-agnostic data management and data tiering have emerged as more and more enterprise organizations adopt hybrid, multi-cloud, and edge IT infrastructure strategies. See also cloud tiering and choices for cloud data tiering.
Storage Tiering Cuts Costs Because 70%+ of Data is Cold
As data grows, data storage costs grow. It is easy to think the solution is more efficient storage, or simply buying more storage. But data management is the real solution. Typically over 70% of data is cold and has not been accessed in months, yet it sits on expensive storage hardware or cloud infrastructure and consumes the same backup resources as hot data. As a result, data storage costs rise, backups slow down, disaster recovery (DR) becomes unreliable, and the sheer bulk of this data makes it difficult to leverage newer options like Flash and Cloud.
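The cost argument above can be worked through with simple arithmetic. The sketch below is illustrative only: the function name and the per-TB prices are hypothetical, not vendor pricing, and it ignores backup, retrieval, and egress costs.

```python
def monthly_storage_cost(total_tb, cold_fraction, primary_price, cold_price):
    """Return (cost_without_tiering, cost_with_tiering) per month.

    primary_price and cold_price are hypothetical prices per TB per month.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    # Without tiering, every TB sits on the expensive primary tier.
    untiered = total_tb * primary_price
    # With tiering, only the hot share stays on the primary tier.
    hot_tb = total_tb * (1 - cold_fraction)
    cold_tb = total_tb * cold_fraction
    tiered = hot_tb * primary_price + cold_tb * cold_price
    return untiered, tiered


# Assumed inputs: 1,000 TB, 70% cold, 100 vs. 10 currency units per TB.
untiered, tiered = monthly_storage_cost(1000, 0.7, 100.0, 10.0)
print(f"without tiering: {untiered:.0f}, with tiering: {tiered:.0f}")
```

With these assumed prices, the tiered bill comes to roughly 37,000 against 100,000. Real savings depend on actual prices and on how often cold data has to be brought back.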
Data Tiering Was Initially Used within a Storage Array
Data Tiering was initially a technique used by storage systems to reduce the cost of data storage by tiering cold data within the storage array to cheaper but less performant options – for example, moving data that has not been touched in a year or more from an expensive Flash tier to a low-cost SATA disk tier.
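As a rough illustration of the "not touched in a year" rule, the following sketch lists files whose last access and last modification are both older than a threshold. The function name and the 365-day default are assumptions made for this example. Real tiering products track data activity in their own ways.

```python
import os
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def find_cold_files(root, min_idle_days=365, now=None):
    """Return sorted paths under root that have been idle for min_idle_days."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.stat()
            # Take the later of access and modification time, because
            # access times are not updated reliably on every filesystem.
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                cold.append(path)
    return sorted(cold)
```

Calling `find_cold_files("/some/share")` (a placeholder path) returns the candidates for a lower tier. Moving them is a separate step that this sketch leaves out.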
Typical storage tiers within a storage array or on-premises storage device include:
- Flash or SSD: A high-performance storage class but also very expensive. Flash is usually used on smaller data sets that are being actively used and require the highest performance.
- SATA Disks: High-capacity disks with lower performance that offer better price per GB vs SSD.
- Secondary Storage, often Object Storage: Usually a good choice for capacity storage – to store large volumes of cool data that is not as frequently accessed, at a much lower cost.
Increasingly, enterprise IT organizations are looking at another option: tiering or archiving data to a public cloud.
- Public Cloud Storage: Public clouds currently have a mix of object and file storage options. The object storage classes such as Amazon S3 and Azure Blob (Azure Storage) provide tremendous cost efficiency and all the benefits of object storage without the headaches of setup and management.
- Cloud NAS has also become increasingly popular, but if unstructured data is not well managed, data storage costs will be prohibitive.
Cloud Storage Tiering is now Popular
Tiering and archiving less frequently used data, or cold data, to public cloud storage classes has become more popular. This is because customers can leverage the lower-cost storage classes within the cloud to keep the cold data and promote it to the higher-cost storage classes when needed. For example, data can be archived or tiered from on-premises NAS to Amazon S3 Infrequent Access or Amazon Glacier for low ongoing costs, and then promoted to Amazon EFS or Amazon FSx when you want to operate on it and need performance.
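A tiering policy of this kind can be sketched as a small set of age-based rules. In the example below, `STANDARD_IA` and `GLACIER` are Amazon S3 storage class identifiers, while the `TierRule` type, the `choose_target` function, and the 90-day and 365-day thresholds are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierRule:
    min_idle_days: int  # minimum days without access before the rule applies
    target: str         # storage class the data should be tiered to


# Hypothetical policy: thresholds are illustrative, not recommendations.
RULES = (
    TierRule(min_idle_days=90, target="STANDARD_IA"),
    TierRule(min_idle_days=365, target="GLACIER"),
)


def choose_target(idle_days, rules=RULES):
    """Return the coldest matching storage class, or None to stay on primary NAS."""
    # Check the longest idle thresholds first so the coldest tier wins.
    for rule in sorted(rules, key=lambda r: r.min_idle_days, reverse=True):
        if idle_days >= rule.min_idle_days:
            return rule.target
    return None
```

Under this assumed policy, a file idle for 120 days maps to `STANDARD_IA`, and one idle for 400 days maps to `GLACIER`. A file idle for 30 days stays where it is.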
Cloud isn’t just low-cost data storage
The cloud offers more than low-cost data storage. Advanced security features such as immutable storage can help defeat ransomware, and cloud-native services from analytics to machine learning can drive value from your unstructured data.
But in order to take advantage of these capabilities, and to ensure you’re not treating the cloud as just a cheap storage locker, data that is tiered to the cloud needs to be accessible natively in the cloud without requiring third-party software. This requires the right approach to storage tiering, which is file-tiering, not block-tiering.
Block Tiering Creates Unnecessary Costs and Lock-In
Block-level storage tiering was first introduced as a technique within a storage array to make the storage box more efficient by leveraging a mix of technologies such as more expensive SSD disks as well as cheaper SATA disks.
Block storage tiering breaks a file into various blocks – metadata blocks that contain information about the file, and data blocks that are chunks of the original file. Block-tiering or Block-level tiering moves less used cold blocks to lower, less expensive tiers, while hot blocks and metadata are typically retained in the higher, faster, and more expensive storage tiers.
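The mechanism can be modeled in a few lines. This is a deliberately simplified toy, not any vendor's implementation: the block size, the key format, and all function names are assumptions. It shows that the block map (the metadata) is what makes the file readable, and that the cold tier on its own holds only fragments.

```python
BLOCK_SIZE = 4  # bytes; unrealistically small, chosen for illustration


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a file's bytes into fixed-size data blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def tier_blocks(blocks, hot_indexes):
    """Place each block on the hot or cold tier and record where it went.

    Returns (hot_tier, cold_tier, block_map); block_map plays the role of
    the metadata that stays on the primary storage system.
    """
    hot_tier, cold_tier, block_map = {}, {}, []
    for index, block in enumerate(blocks):
        tier_name = "hot" if index in hot_indexes else "cold"
        key = f"blk-{index}"
        (hot_tier if tier_name == "hot" else cold_tier)[key] = block
        block_map.append((tier_name, key))
    return hot_tier, cold_tier, block_map


def read_file(block_map, hot_tier, cold_tier):
    """Reassemble the file; this needs the block map and both tiers."""
    tiers = {"hot": hot_tier, "cold": cold_tier}
    return b"".join(tiers[name][key] for name, key in block_map)


data = b"0123456789abcdef"
hot, cold, block_map = tier_blocks(split_into_blocks(data), hot_indexes={0})
assert read_file(block_map, hot, cold) == data   # full read works via the map
assert b"".join(cold.values()) != data           # cold tier alone is incomplete
```

In this model, anything that wants the whole file has to go through `read_file`, which stands in for the storage system's own filesystem.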
Block tiering is a technique used within the storage operating system or filesystem and is proprietary. Storage vendors offer block tiering as a way to reduce the cost of their storage environment. Many storage vendors are now expanding block tiering to move data to the public cloud or on-premises object storage.
But because block storage tiering (often called CloudPools; examples are NetApp FabricPool and Dell EMC Isilon CloudPools) is done inside the storage operating system as a proprietary solution, it has several limitations in how efficiently the data can be reused and how much storage cost is actually saved. First, with block tiering, the proprietary storage filesystem must be involved in all data access, since it retains the metadata and holds the “map” for reassembling the file from its blocks. Second, the cold blocks that are moved to a lower tier or the cloud cannot be accessed directly from the new location without the proprietary filesystem, because the cloud has neither the metadata map nor the remaining data blocks, file context, and attributes needed to put the file together. So block tiering is a proprietary approach that often results in unnecessary rehydration of the data and treats the cloud as a cheap storage locker rather than as a powerful way to use data when needed.
With block storage tiering, the only way to access data in the cloud is to run the proprietary storage file system in the cloud, which adds to costs. Also, many third-party applications that operate at the file level, such as backup software, require the cold blocks to be brought back, or rehydrated, which defeats the purpose of tiering to lower-cost storage and erodes the potential savings. For more details, read the white paper: Block vs. File-Level Tiering and Archiving.