Archiving vs Transparent Archiving

All Data Archiving is Not the Same

Data tiering and archiving are popular for their cost-saving potential, but how much you actually save varies greatly with the method used. Not all tiering and archiving choices are equal. When you know what to look for, you can avoid disruption to end users and data access, prevent vendor lock-in, and maximize savings.

Read this white paper, Archiving vs. Transparent Archiving, to understand the differences between archiving techniques, make the best decision for your organization, and learn more about Komprise Transparent Move Technology (TMT).

Read about:

  • The 5 keys to good archiving
  • Popular archiving and tiering approaches
  • How they compare

FAQs

Why do most traditional tiering approaches fail to deliver their promised savings, and what is the real cost of an approach that disrupts users?

Many vendors offer data archiving, but the solutions differ vastly. Some vendors say their archiving is transparent to users, apps, and workflows, but a closer look at how the solution archives cold data reveals a big difference, and lots of disruption, in data access, hidden costs, and vendor lock-in. The gap between what traditional tiering promises and what it actually delivers comes down to one behavioral reality: when users cannot access their data without filing a support ticket, they stop cooperating with tiering programs entirely.

  • Traditional tiering breaks the user experience and breaks the savings model — in this scenario, end users can literally wake up and find their data gone; because, at its most basic, tiering simply moves cold data from the primary storage onto another medium; this is a lot easier than transparent tiering, which is why so many vendors offer it, but this simplicity comes at a cost; if users need to access a cold file or run an older application that requires accessing a cold file that’s been tiered, they must file a support ticket; IT administrators do not enjoy these retrieval requests any more than users enjoy making them; both are unproductive time sinks that compound across thousands of files and hundreds of users
  • The manual approval process eliminates most of the available savings — traditional archiving requires a manual approval process between users and IT: first to gain permission, then to painstakingly review which files can be archived, and then to repeat this process on an ongoing basis to identify cold data to offload from primary storage; not only is this highly inefficient, but it results in archiving less than 10% of the 70% of data that is cold, a tremendous loss of savings; paying for a tiering solution that captures less than 10% of the available savings is not a cost optimization but a budget line that delivers minimal return
  • The 70% cold data opportunity requires automation to capture — manual, project-based, batch tiering processes can only address the cold data IT teams know about and can get department approval to move; the continuous accumulation of cold data between approval cycles means the primary storage refills faster than manual processes can keep pace with; automated, policy-driven transparent tiering is the only approach that captures the full 70% cold data opportunity on an ongoing basis
  • User disruption is a compounding organizational cost — beyond the direct cost of IT retrieval time, user disruption from traditional tiering creates a secondary cost: departments that experience disruption stop cooperating with future tiering initiatives; the organizational resistance that builds from one bad tiering experience can set a tiering program back by years, during which cold data continues accumulating on expensive primary storage at current hardware prices
  • The Flash Stretch Assessment reveals how much available savings traditional tiering is leaving on the table — for qualified enterprises managing 500TB or more, the Komprise Flash Stretch Assessment identifies the full cold data opportunity across the storage estate and models what transparent, automated, policy-driven tiering would save versus whatever current approach is in place; for organizations with existing traditional tiering programs, this assessment consistently shows that less than 10% of available savings are being captured
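
The capture-rate figures above can be made concrete with a short calculation. The sketch below is illustrative only: the 1,000 TB estate size is a hypothetical number, the 70% cold share and 10% manual capture rate are the figures cited in the text, and the 100% automated capture rate is an assumption standing in for "the full cold data opportunity".

```python
def captured_cold_tb(total_tb: float, cold_fraction: float, capture_rate: float) -> float:
    """Return the capacity (TB) of cold data actually moved off primary storage."""
    if not (0.0 <= cold_fraction <= 1.0 and 0.0 <= capture_rate <= 1.0):
        raise ValueError("fractions must be between 0 and 1")
    return total_tb * cold_fraction * capture_rate


TOTAL_TB = 1000.0        # hypothetical estate size, not from the source
COLD_FRACTION = 0.70     # cold share cited in the text
MANUAL_CAPTURE = 0.10    # upper bound cited for manual approval processes
AUTOMATED_CAPTURE = 1.0  # assumption: policy-driven tiering reaches all cold data

manual_tb = captured_cold_tb(TOTAL_TB, COLD_FRACTION, MANUAL_CAPTURE)
automated_tb = captured_cold_tb(TOTAL_TB, COLD_FRACTION, AUTOMATED_CAPTURE)

print(f"manual: {manual_tb:.0f} TB, automated: {automated_tb:.0f} TB")
```

Under these assumptions, a manual process offloads about 70 TB of a 1,000 TB estate, while roughly 700 TB of cold data is available to offload.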

What are proprietary stubs, why did they fail as a transparent tiering mechanism, and what replaced them?

Hierarchical Storage Management introduced the concept of transparent archiving decades ago using proprietary stubs — placeholder files left on primary storage that pointed to the tiered location and triggered retrieval when accessed. The idea was correct; the implementation created problems that persisted for as long as stub-based systems were in production. Understanding why stubs fail explains exactly what transparent tiering requires to work correctly:

  • Stubs are brittle by design — stubs are proprietary placeholders used after data is migrated to secondary storage, but they introduce risks such as orphaned data, latency, and limited scalability; because stub-based systems rely on static and brittle mechanisms, they can disrupt users and applications when data is moved or modified; a stub is a static pointer to a specific location; if that location changes, if the storage system is reorganized, or if a migration moves the destination, the stub breaks and the underlying data becomes orphaned with no recovery path
  • Rehydration adds latency and risk — when users access stubbed data, the storage management system intercepts the request, retrieves the data from its secondary location — whether file, object, cloud, or tape — and rehydrates it back to primary storage; while this makes tiered data appear to reside on primary storage, the transparency ends there; the retrieval and rehydration process adds latency and increases the risk of data loss or corruption; a user experience that requires seconds to retrieve a file that should open instantly is not transparent — it is noticeable disruption with a slower mechanism than filing a support ticket
  • Stubs create proprietary lock-in at the data level — stub-based tiering requires the specific tiering software to be running and available for any stubbed file to be accessible; if the tiering vendor changes licensing terms, discontinues a product, or is acquired, every stubbed file in the estate becomes inaccessible; proprietary transparent archiving is the method behind HSM, which is cumbersome and includes brittle and unreliable stubs and agents; this is the original lock-in problem that modern transparent tiering was built to solve
  • Dynamic Links replace stubs with standards-based constructs — by using standards-based symbolic links rather than static proprietary stubs, the risk of orphaned data and broken access paths is eliminated; this approach maintains transparency without introducing latency, disruption, or the brittleness associated with traditional stub-based systems; Komprise Dynamic Links are built on the industry-standard symbolic link construct available natively in both NFS and SMB file systems; they require no agents, no proprietary software, and no dependency on Komprise being available for user access to tiered files
  • If the Dynamic Link is deleted, data is not orphaned — this is the critical operational difference from stubs; because the Dynamic Link itself does not contain the context of the moved file, Komprise can repopulate it; a stub that is accidentally deleted creates an orphaned file with no recovery; a Dynamic Link that is deleted can be recreated by Komprise because the file exists at its destination independently of the link; this resilience eliminates the most dangerous operational risk that made stub-based tiering so problematic in production environments
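
The difference between a link and the data it points to can be illustrated with ordinary symbolic links. The sketch below is not Komprise's implementation: it uses two local temporary directories as stand-ins for primary storage and the tiering destination, and the function name `demo_link_resilience` and the file names are hypothetical. It shows only the general property described above: deleting a symbolic link leaves the target file intact, so the link can be recreated.

```python
import os
import tempfile


def demo_link_resilience() -> dict:
    """Move a file to a second directory, leave a symlink, then delete and recreate the link."""
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "primary")    # stands in for primary storage
        capacity = os.path.join(root, "capacity")  # stands in for the tiering destination
        os.makedirs(primary)
        os.makedirs(capacity)

        original_path = os.path.join(primary, "report.txt")
        with open(original_path, "w") as f:
            f.write("cold data")

        # "Tier" the file: move it, then leave a symbolic link at the original path.
        tiered_path = os.path.join(capacity, "report.txt")
        os.replace(original_path, tiered_path)
        os.symlink(tiered_path, original_path)

        # Access through the original path still returns the file contents.
        with open(original_path) as f:
            read_via_link = f.read()

        # Deleting the link removes only the link, not the file it points to.
        os.remove(original_path)
        survives_link_deletion = os.path.isfile(tiered_path)

        # Because the file still exists at its destination, the link can be recreated.
        os.symlink(tiered_path, original_path)
        with open(original_path) as f:
            read_after_relink = f.read()

    return {
        "read_via_link": read_via_link,
        "survives_link_deletion": survives_link_deletion,
        "read_after_relink": read_after_relink,
    }


if __name__ == "__main__":
    print(demo_link_resilience())
```

Creating symbolic links may require elevated privileges on some Windows configurations; the sketch assumes a POSIX-style environment.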

Why is transparent tiering the only approach that delivers its full promised savings, and how does the backup footprint reduction compound those savings?

There’s no reason data tiering should present any disruption at all; cold data that’s been moved to a cheaper capacity storage should look the same to users and apps — as if it’s still on your fast, expensive primary storage; the ability for both users and applications to still access files exactly where they were before without having to rehydrate the file is possible with a true transparent tiering solution. The savings case for transparent tiering is not just about primary storage — it is about the multiplier effect across backup, DR, and migration costs that most tiering evaluations fail to model:

  • Transparency drives adoption, and adoption drives savings — transparent tiering solutions enable cold data to be tiered to cloud storage without any change to user and application access, which is key to gaining user adoption; an organization with zero user disruption from tiering can tier continuously and aggressively, capturing the full 70% cold data opportunity across the entire estate; an organization with disruptive tiering tiers cautiously and partially, capturing a fraction of available savings while managing a constant stream of retrieval complaints
  • Backup footprint reduction is the compounding multiplier — the most significant savings that most tiering evaluations undercount is the reduction in backup footprint; some solutions claim the ability to transparently archive but are unable to reduce the backup footprint and make it nearly impossible to switch primary vendors, thus eroding savings and imposing vendor lock-in; Komprise transparent tiering removes entire files from primary storage, eliminating them from backup jobs immediately; backup windows shrink, backup licensing costs fall, and DR replication costs drop as a direct byproduct of tiering; the total storage savings across primary, backup, and DR is typically 3x to 4x the primary storage savings alone
  • No rehydration for backup operations preserves the backup savings — proprietary transparent tiering via stubs requires backup software to rehydrate tiered data before it can back it up; this rehydration defeats the backup savings and generates egress fees simultaneously; Komprise transparent tiering with Dynamic Links allows backup software to see the Dynamic Link as the file it represents and skip it, treating the tiered file as already protected at its cloud destination; no rehydration, no egress fee, no backup cost for cold data
  • Transparent tiering eliminates rehydration before storage vendor migration — organizations that tier cold data with proprietary block tiering or stub-based systems face a mandatory full rehydration before they can migrate to a new storage vendor or change cloud providers; at current data volumes this is a significant capital and time cost; Komprise intelligent tiering stores files as native objects at the destination, so storage vendor migrations proceed against the active hot data only — the tiered cold data remains accessible at its destination throughout the migration without a single rehydration event
  • Customers typically reduce storage and backup costs by 50% or more — the compounding savings across primary storage, backup licensing, DR replication, and eliminated egress fees is what drives the 50 to 70% total cost reduction that Komprise customers achieve; transparent tiering that captures the full cold data opportunity, eliminates backup footprint, and removes rehydration penalties is mathematically different from approaches that achieve some storage efficiency while leaving the backup multiplier and egress costs intact
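
The "see the link and skip it" behavior described above can be sketched with a toy backup routine. This is an assumption-laden illustration, not the behavior of any particular backup product: real backup software handles links in vendor-specific ways. The functions `backup_tree` and `demo_backup` and the file names are hypothetical. The routine copies regular files and records symbolic links as skipped, so linked (tiered) files are never read back from their destination.

```python
import os
import shutil
import tempfile


def backup_tree(source: str, target: str) -> dict:
    """Copy regular files from source to target, skipping symbolic links."""
    copied, skipped = [], []
    # followlinks=False: do not descend into directories reached through links.
    for dirpath, _dirnames, filenames in os.walk(source, followlinks=False):
        rel_dir = os.path.relpath(dirpath, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel_dir)), exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if os.path.islink(src):
                # A link marks a tiered file; skip it instead of reading the target.
                skipped.append(rel)
                continue
            shutil.copy2(src, os.path.join(target, rel))
            copied.append(rel)
    return {"copied": sorted(copied), "skipped": sorted(skipped)}


def demo_backup() -> dict:
    """Build a tiny tree with one hot file and one linked (tiered) file, then back it up."""
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "primary")
        capacity = os.path.join(root, "capacity")
        backup = os.path.join(root, "backup")
        os.makedirs(primary)
        os.makedirs(capacity)

        with open(os.path.join(primary, "hot.txt"), "w") as f:
            f.write("active data")
        with open(os.path.join(capacity, "cold.txt"), "w") as f:
            f.write("tiered data")
        os.symlink(os.path.join(capacity, "cold.txt"), os.path.join(primary, "cold.txt"))

        return backup_tree(primary, backup)


if __name__ == "__main__":
    print(demo_backup())
```

In this sketch only `hot.txt` is copied, which mirrors the claim that tiered files drop out of the backup set without being rehydrated.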

How has intelligent tiering evolved from a cost-saving discipline into an AI data preparation strategy, and why do both goals require the same transparent approach?

The white paper on archiving versus transparent archiving was written when tiering was evaluated purely as a cost management decision. The most significant evolution since is that intelligent tiering is now simultaneously the most effective storage cost optimization and the first step in making unstructured data AI-ready. Both goals require the same transparent, file-level, standards-based approach — and both goals are undermined by the same failure modes of traditional and proprietary tiering:

  • Tiered data that is AI-inaccessible is tiered data that has lost half its value — cold data tiered by block-based storage vendor tools or stub-based systems cannot be read directly by AWS SageMaker, Azure AI, Google Vertex, Snowflake, or Databricks; it is cost-optimized but AI-blind; file archiving plays an increasingly important role in AI pipelines; the Komprise Global Metadatabase and Smart Data Workflows turn archives into AI-ready datasets; tiered data that is stored as a native object by Komprise is simultaneously cost-optimized and immediately accessible to cloud AI services — both goals achieved in a single infrastructure motion
  • The Global Metadatabase indexes every file at the point of tiering — as Komprise transparent tiering moves each file, the Komprise Global Metadatabase records its new location, access history, file type, sensitivity status, and any custom metadata attributes extracted by KAPPA data services; the tiered archive is not a cost endpoint — it is a continuously enriching metadata layer that Deep Analytics can query for AI dataset identification without moving data again; this is the capability that transforms tiering from a cost exercise into an AI data foundation
  • Transparent access from the original path is what makes the archive AI-queryable — because Komprise Dynamic Links maintain the original file path, the Global Metadatabase can reference tiered files by their logical enterprise location rather than their physical storage location; AI workflows can query by business criteria — all DICOM studies for oncology patients, all genomics files from a specific research project, all contracts from a specific client — without needing to know which storage tier or cloud destination holds each file
  • Sensitive data governance applies at the point of tiering — organizations using proprietary or traditional tiering move cold data without classifying it; PHI, PII, and IP that was ungoverned on primary storage is still ungoverned at the tiering destination; Komprise Sensitive Data Management, available in Komprise Intelligent Data Management, can scan files for sensitive content before tiering and apply automatic remediation policies; the tiering operation becomes a governance event, not just a movement event, ensuring that sensitive data does not reach cloud destinations where AI tools may access it without authorization
  • Komprise is the metadata and orchestration layer for enterprise unstructured AI data; transparent tiering is the mechanism through which cold data becomes part of that orchestration layer rather than disappearing into a cost-management dead end; the same investment that reduces storage costs today builds the AI data foundation that competitive organizations will depend on going forward

What should IT teams look for when evaluating whether a tiering solution is genuinely transparent, and how do they cut through vendor claims to find the real answer?

The word transparent is used by virtually every tiering vendor regardless of whether their solution actually delivers it. Many vendors offer data archiving or data tiering, but the solutions differ vastly. Some vendors say their data tiering is transparent to users, apps, and workflows, but a closer look at how the solution tiers cold data reveals a big difference, and lots of disruption, in what happens when tiered files are accessed, in hidden costs, and in vendor lock-in. The following questions separate genuine transparency from claimed transparency:

  • Ask: what happens when a backup job runs against tiered data? — the correct answer for genuine transparent tiering is that the backup job skips the tiered file because it is handled at the destination; the answer that reveals proprietary tiering is that the backup software rehydrates tiered files before backing them up; this single question exposes whether the backup footprint savings are real or theoretical; if rehydration happens during backup, the backup savings do not exist and egress fees are generated on every backup cycle
  • Ask: what happens to tiered data if we switch storage vendors? — genuine transparent tiering stores files as native objects at the destination, requiring no rehydration before vendor migration; stub-based and block-based tiering require full rehydration before any migration can proceed; at 5PB+ data estates this distinction represents a potential multi-month migration project and significant capital cost; the vendor’s answer to this question reveals whether the transparency claim extends to the full data lifecycle or only to day-to-day user access
  • Ask: can cloud AI services read tiered data directly without going through your software? — file-level tiering ensures that the entire file is archived as an object, so it can be accessed natively in the cloud by any standard S3 tools, without having to go back to the original file system or to the data management software itself; any tiering vendor whose answer requires routing through their own software or the source storage OS for cloud AI access is not delivering true transparency at the destination
  • Ask: what happens if the tiered file’s Dynamic Link or stub is accidentally deleted? — this question separates stub-based solutions from Dynamic Link-based solutions; a deleted stub creates an orphaned file with no path to recovery; a deleted Komprise Dynamic Link can be repopulated by Komprise because the file exists independently at its destination; this resilience distinction is not theoretical — accidental deletion of placeholder files happens in production environments, and the recovery story determines whether the organization can retrieve that data
  • Run the Flash Stretch Assessment before committing to any approach — for qualified enterprises managing 500TB or more, the Komprise Flash Stretch Assessment analyzes the current tiering approach if one exists, models the true savings being achieved versus the theoretical maximum, and identifies what transparent tiering would deliver in additional primary storage, backup, and DR cost reduction; this assessment turns the transparency evaluation from a feature checklist into a quantified financial comparison that reveals exactly what the current approach is costing in foregone savings