

Rehydration

What is rehydration?

Rehydration is the process of fully reconstituting files so that the transferred data can be accessed and used. Block-level tiering requires tiered or archived data to be rehydrated before it can be used, migrated, or backed up. No rehydration is needed with Komprise, which uses file-based tiering.

Rehydration and the Cloud

In this post, Komprise CEO Kumar Goswami answers the question: “Will I lose storage efficiencies such as de-dupe by not using a storage tiering solution in the cloud?” He notes:

The overhead of keeping blocks in the cloud due to high egress costs, high data rehydration costs and high defragmentation costs significantly overshadows any potential de-dupe savings. When data is moved at the block level to the cloud, you are really not saving on any third-party backups and other applications because block tiering is a proprietary solution – read this white paper for more background on block-level vs file-based data tiering and cloud tiering. So if you consider all the additional backup licensing costs, cloud egress costs, cloud retrieval costs plus the fact that you are now locked-in and have to pay file system costs forever in the cloud to access your data (learn more about the benefits of cloud native unstructured data access), then the small savings you may get from dedupe are significantly overshadowed by overall costs and the loss of flexibility.

Komprise provides a custom data rehydration policy that users can configure to meet their needs. Data need not be rehydrated on first access. Komprise also provides a bulk recall feature if needed. Learn more about file-based cloud tiering with Komprise.

What is the rehydration penalty?

The rehydration penalty occurs when tiered or archived data must be fully copied back (“rehydrated”) into expensive primary storage before it can be accessed by users or applications. This process consumes bandwidth, increases storage costs, and delays access. Komprise avoids rehydration penalties by keeping data in its native format on the target storage, enabling transparent access without requiring recall into primary storage. Read the Komprise Data Experience paper for more information.


Why does rehydration create vendor lock-in and what are the true costs?

Rehydration is not just a performance problem. It is a financial and strategic trap. When storage-based tiering solutions move data using proprietary stub files or block-level formats, the data cannot be read, moved, or accessed by any system other than the originating vendor’s hardware. To use the data, you must first rehydrate it back to the original storage, paying egress fees to bring it out of the cloud, bandwidth costs to move it, and potentially additional licensing fees to the storage vendor for the recall operation.

The true cost of rehydration includes:

  • cloud egress fees charged per gigabyte retrieved;
  • storage vendor retrieval fees for data held in proprietary formats;
  • the cost of bandwidth consumed by bulk recall operations;
  • the cost of the additional primary storage capacity needed to hold rehydrated data;
  • the ongoing file system licensing fees required to maintain access to vendor-proprietary tiered data in the cloud.

For organizations considering changing storage vendors, rehydration forces them to recall all tiered data before migrating, effectively making the total cost of switching prohibitively high. This is by design. Komprise eliminates this entirely by never using stubs or proprietary formats. Data tiered via Transparent Move Technology is stored in its native format on open, standards-based object storage, accessible by any authorized system without any vendor dependency.
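To see how these cost components add up, the sketch below totals them for a single rehydration event. It is a back-of-the-envelope model, not a Komprise or cloud-provider pricing tool: the `RecallPrices` fields and the `rehydration_cost` function are hypothetical, all unit prices are placeholders you would replace with your own contract rates, and it assumes the licensing fee is counted for the same period the rehydrated data sits on primary storage.

```python
from dataclasses import dataclass


@dataclass
class RecallPrices:
    """Hypothetical unit prices; substitute your own cloud and vendor rates."""
    egress_per_gb: float            # cloud egress fee, $ per GB retrieved
    vendor_retrieval_per_gb: float  # storage vendor recall fee, $ per GB
    bandwidth_per_gb: float         # network transfer cost, $ per GB moved
    primary_per_gb_month: float     # primary storage capacity, $ per GB-month
    fs_license_per_month: float     # ongoing cloud file system license, $ per month


def rehydration_cost(gb: float, months_held: float, prices: RecallPrices) -> float:
    """Estimate the cost of rehydrating `gb` of tiered data and keeping it on
    primary storage for `months_held` months, plus licensing over that period."""
    # One-time, per-GB charges incurred by the recall itself.
    per_gb_fees = gb * (prices.egress_per_gb
                        + prices.vendor_retrieval_per_gb
                        + prices.bandwidth_per_gb)
    # Primary capacity consumed by the rehydrated copy.
    capacity = gb * prices.primary_per_gb_month * months_held
    # File system licensing paid over the same period (modeling assumption).
    licensing = prices.fs_license_per_month * months_held
    return per_gb_fees + capacity + licensing


# Example with placeholder prices: 50 TB (50,000 GB) held for 6 months.
prices = RecallPrices(egress_per_gb=0.09, vendor_retrieval_per_gb=0.01,
                      bandwidth_per_gb=0.0, primary_per_gb_month=0.10,
                      fs_license_per_month=500.0)
print(round(rehydration_cost(50_000, 6, prices)))
```

With these placeholder numbers the recall fees are about $5,000, but holding the rehydrated copy on primary storage for six months costs $30,000, so the capacity term tends to dominate the longer the data stays recalled.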

How does eliminating rehydration benefit AI data pipelines?

AI models, RAG pipelines, and analytics platforms need to read data directly from storage. When tiered data requires rehydration before it can be accessed, AI workflows face two problems.

  • First, the rehydration step introduces latency that can range from minutes to hours depending on the volume of data and the storage tier it occupies, making tiered data practically inaccessible for real-time or scheduled AI workflows.
  • Second, the rehydration process copies data back to expensive primary storage, increasing costs every time an AI pipeline needs to access historical or reference data that has been tiered.
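The "minutes to hours" range in the first point follows from simple arithmetic: transfer time is data volume divided by link throughput, plus any fixed wait the storage tier imposes before data starts flowing. The helper below is a hypothetical sketch that assumes one sustained throughput figure and an optional fixed retrieval delay; real recalls also contend with concurrency limits, request overhead, and shared links.

```python
def estimate_recall_hours(volume_bytes: float,
                          throughput_bits_per_s: float,
                          first_byte_delay_s: float = 0.0) -> float:
    """Rough time, in hours, to rehydrate `volume_bytes` before a pipeline
    can read it. Assumes a single sustained link throughput and an optional
    fixed retrieval delay (for example, an archive tier's restore wait)."""
    if throughput_bits_per_s <= 0:
        raise ValueError("throughput must be positive")
    transfer_s = volume_bytes * 8 / throughput_bits_per_s  # bytes -> bits
    return (first_byte_delay_s + transfer_s) / 3600


# 10 TB over a fully utilized 1 Gbit/s link: 8e13 bits / 1e9 bit/s = 80,000 s.
print(f"{estimate_recall_hours(10e12, 1e9):.1f} h")
# 100 GB over 10 Gbit/s: 80 s of transfer, i.e. a little over a minute.
print(f"{estimate_recall_hours(100e9, 10e9) * 60:.1f} min")
```

Even at full line rate, rehydrating a 10 TB reference corpus takes most of a day, which is why a pipeline that depends on recall cannot treat tiered data as readily available.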

Komprise solves both problems. Because data is always stored in its native format with no proprietary wrapping, AI pipelines can access tiered data directly from object storage via Dynamic Links, with no rehydration step and no delay. A research file tiered to AWS S3 three years ago is as directly accessible to an AI ingestion workflow today as it was on the day it was tiered. This makes the entire unstructured data estate, not just recently created hot data, available as a potential AI data source without any cost or latency penalty for accessing tiered content.

What is the difference between Komprise Dynamic Links and traditional stub files?

Traditional tiering solutions replace a tiered file on primary storage with a stub file, a small placeholder that points to the data’s new location. When a user or application accesses the stub, the original data is recalled from its tiered location and copied back to primary storage before the request is fulfilled. This recall is the rehydration process. Stubs are typically proprietary to the storage vendor, meaning only that vendor’s software can read and recall the data correctly.

Komprise Dynamic Links work differently. When Komprise tiers a file, it leaves a Dynamic Link at the original path that transparently redirects access requests to the file’s actual location on object storage. The key difference is that the data is served directly from object storage in its native format without any recall or copy back to primary storage. The access is seamless for users and applications. There is no bandwidth cost, no delay, and no rehydration charge. The file remains in its native format at its tiered location and is accessed as if it were still on primary storage. If the organization ever stops using Komprise, the data is still fully accessible because it was never wrapped in a proprietary format.
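The behavioral difference between the two approaches can be shown with a toy in-memory model. This is an illustration only, not Komprise's or any vendor's implementation: `Stub`, `DynamicLink`, and `Storage` are hypothetical names, the two dictionaries stand in for a primary file system and an object store, and the model captures only the access path (recall and copy back versus serve in place), not proprietary on-disk formats.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Stub:
    key: str  # where the tiered data lives in the object store


@dataclass
class DynamicLink:
    key: str  # where the tiered data lives in the object store


Entry = Union[bytes, Stub, DynamicLink]


@dataclass
class Storage:
    primary: Dict[str, Entry] = field(default_factory=dict)
    object_store: Dict[str, bytes] = field(default_factory=dict)
    bytes_recalled: int = 0  # data copied back to primary (rehydration)

    def tier(self, path: str, use_stub: bool) -> None:
        """Move a file's bytes to the object store, leaving a pointer behind."""
        data = self.primary[path]
        if not isinstance(data, bytes):
            raise ValueError(f"{path} is already tiered")
        key = "tiered/" + path.lstrip("/")
        self.object_store[key] = data  # stored as the original bytes
        self.primary[path] = Stub(key) if use_stub else DynamicLink(key)

    def read(self, path: str) -> bytes:
        entry = self.primary[path]
        if isinstance(entry, Stub):
            # Stub access: recall the data and copy it back to primary first.
            data = self.object_store[entry.key]
            self.primary[path] = data
            self.bytes_recalled += len(data)
            return data
        if isinstance(entry, DynamicLink):
            # Link access: serve directly from the object store, no copy back.
            return self.object_store[entry.key]
        return entry  # untiered file on primary storage


s = Storage()
s.primary["/proj/a.dat"] = b"hello"
s.primary["/proj/b.dat"] = b"world!"
s.tier("/proj/a.dat", use_stub=True)
s.tier("/proj/b.dat", use_stub=False)
s.read("/proj/a.dat")
s.read("/proj/b.dat")
print(s.bytes_recalled)  # only the stubbed file was rehydrated
```

After both reads, the stubbed file occupies primary storage again, while the linked file's entry is still just a pointer.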

How does Komprise handle situations where data genuinely needs to be recalled to primary storage?

While Komprise eliminates the need for rehydration in most access scenarios through Dynamic Links, there are situations where an organization may want to recall tiered data back to primary storage, such as during a storage vendor migration, a policy change, or a bulk workload that benefits from local access. Komprise provides a configurable data recall policy and a bulk recall feature specifically for these situations.

Administrators can configure a custom rehydration policy that defines when tiered data should be recalled, for example after a defined number of accesses within a time window, when a specific application triggers recall, or on a scheduled basis for specific directories. The bulk recall feature allows IT teams to recall large volumes of data in a controlled, scheduled way without disrupting production access. These recall operations are tracked and reportable, giving IT full visibility into what has been recalled, when, and why. This flexibility means Komprise customers can benefit from zero-rehydration transparent access in day-to-day operations while retaining full control over recall when their specific use case requires it.
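As an illustration of the first example trigger, "a defined number of accesses within a time window", the sketch below implements a sliding-window access counter that also keeps an audit log of the recalls it triggers. It is hypothetical: the class name, parameters, and log format are assumptions made for this example and do not describe Komprise's configuration interface.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


class AccessCountRecallPolicy:
    """Hypothetical rule: recall a tiered file to primary storage once it has
    been accessed `threshold` times within `window_s` seconds."""

    def __init__(self, threshold: int, window_s: float):
        self.threshold = threshold
        self.window_s = window_s
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        # (path, time, reason) for every recall, so recalls are reportable.
        self.recall_log: List[Tuple[str, float, str]] = []

    def record_access(self, path: str, now: float) -> bool:
        """Log an access at time `now`; return True if the file should be recalled."""
        hits = self._hits[path]
        hits.append(now)
        # Drop accesses that have fallen outside the sliding window.
        while hits and now - hits[0] > self.window_s:
            hits.popleft()
        if len(hits) >= self.threshold:
            hits.clear()  # reset so one burst triggers one recall
            reason = f"{self.threshold} accesses within {self.window_s:g}s"
            self.recall_log.append((path, now, reason))
            return True
        return False


policy = AccessCountRecallPolicy(threshold=3, window_s=3600)
for t in (0, 100, 5000, 5100, 5200):
    policy.record_access("/proj/results.csv", t)
print(policy.recall_log)
```

In this run the accesses at 0 s and 100 s age out of the one-hour window before a third access arrives, so the recall fires only at 5200 s, after three accesses fall inside the same hour.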
