In case you missed the news: on Tuesday, Pure Storage announced key updates to its Purity software for FlashBlade® and FlashArray™. Among them, Komprise has partnered with Pure to provide Asynchronous Replication, giving Pure FlashArray™ file customers reliable data replication.
Let’s get into the technical details of Komprise Asynchronous Replication.
Asynchronous Replication Overview
Komprise provides asynchronous replication so that FlashArray file users can protect data by periodically copying it from a source to a destination. (For full protection, the source and destination file servers should be physically separate and, better still, geographically distant.)
Replication uses snapshots on a source to create a point-in-time copy on a destination. Using the snapshot, Komprise copies all directories, files, and links (including Pure managed directories) from a source to a destination on a user-configured schedule. Asynchronous replication is run by the Komprise Elastic Data Migration engine, which provides automated, high-performance data transfers, checks data integrity at each step, and applies all attributes, permissions, and access controls from the source at the destination.
Starting a Replication
Configuring a replication in Komprise is simple: you select the source to be protected and the destination for the replication copies, you specify the schedule, start time, and a name, and then press Start. Komprise will start replicating files at its first scheduled run time by taking a snapshot of the source share, then copying the snapshot to the destination.
In the first replication run, all the directories and files will be copied from the source to the destination. Each subsequent run will copy only the files that have changed since they were last copied to the destination. If files or directories are deleted on the source, they will be deleted on the destination in the next replication run.
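To make the first-run versus later-run behavior concrete, here is a minimal sketch in Python. It is not Komprise's implementation: the function names are hypothetical, and it assumes each file can be summarized by a simple change marker (here, a modification time) keyed by path.

```python
def plan_replication_run(source_snapshot, destination):
    """Return (paths_to_copy, paths_to_delete) for one replication run.

    Both arguments map a path to a change marker. Using a modification
    time as the marker is an assumption made for this illustration.
    """
    # Copy anything that is new or whose marker differs from the destination.
    to_copy = sorted(
        path for path, marker in source_snapshot.items()
        if destination.get(path) != marker
    )
    # Delete anything on the destination that is no longer in the snapshot.
    to_delete = sorted(path for path in destination if path not in source_snapshot)
    return to_copy, to_delete


def apply_plan(source_snapshot, destination, to_copy, to_delete):
    """Apply a plan to the in-memory destination (stand-in for real copying)."""
    for path in to_copy:
        destination[path] = source_snapshot[path]
    for path in to_delete:
        del destination[path]
```

With an empty destination, the plan contains every path in the snapshot, which matches the first run. On later runs, only new or changed paths are copied, and paths missing from the snapshot are removed.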
Ensuring Recovery Copy Availability
To ensure that a safe, consistent replication copy always exists on the destination storage, two shares on the destination—called Copy A and Copy B—are used in the replication process. The first replication run will copy the snapshot to Copy A. If all the data are copied correctly to the destination, the replication run will be marked as Succeeded and Copy A will contain this copy, called the “recovery copy.”
In the second replication run, the source snapshot will be copied to the destination – this time to Copy B. If that run succeeds, then Copy B will hold the recovery copy, and the share for Copy A will be the working copy for the next replication run.
Thereafter, Komprise will periodically perform replication runs based on the configured schedule. When the recovery copy is on Copy A, then Copy B will be the working copy, and vice versa. This ensures that the replication process never disturbs the availability of the safe, consistent recovery copy.
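The Copy A / Copy B alternation can be sketched as a small state machine. This is an illustration only: the class and method names are hypothetical, and the handling of a failed run follows the behavior this post describes under Handling Errors and Failures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicationTargets:
    """Tracks which destination share is the recovery copy and which is working."""
    recovery: Optional[str] = None  # share holding the last good copy, if any
    working: str = "Copy A"         # share the next run will write into

    def finish_run(self, succeeded: bool) -> None:
        if not succeeded:
            # Failed run: the recovery copy is untouched and the same
            # working share is reused by the next run.
            return
        # Successful run: the working share becomes the recovery copy,
        # and the other share becomes the working copy.
        self.recovery = self.working
        self.working = "Copy B" if self.working == "Copy A" else "Copy A"
```

Because a run only ever writes into the working share, the share named by `recovery` is never modified while a run is in progress.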
Opening the Hood: view the details of a specific replication.
Handling Errors and Failures
During a replication run, failures could disrupt the copy process. For example, the source or destination storage could have issues limiting availability; the destination could run out of storage capacity; the network could suffer brownouts; or a Komprise component could become unavailable.
If a replication run is in progress, such failures generally cause errors in copying the data. Komprise provides resiliency against transient failures by automatically retrying individual file errors, then collecting each replication run’s set of failures and retrying them again. These retries represent Komprise’s best effort to successfully replicate the source snapshot on the destination.
If, after all the automatic retries, any data still could not be copied correctly to the destination, the replication run will be marked as Failed. The existing recovery copy, if any, will remain undisturbed, and the next replication run will attempt to replicate a new snapshot into the same destination share that the failed run used.
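The two levels of retry described above can be sketched as follows. This is a hedged illustration, not Komprise's code: `run_replication`, `copy_file`, and the retry counts are hypothetical, and a copy error is assumed to surface as an `OSError`.

```python
def run_replication(paths, copy_file, per_file_retries=2, run_level_passes=1):
    """Copy every path; return (status, remaining_failures).

    `copy_file` is a caller-supplied function that raises OSError on failure.
    The retry counts are illustrative defaults, not documented Komprise values.
    """
    def copy_with_retries(path):
        # First level: retry an individual file a few times.
        for _ in range(1 + per_file_retries):
            try:
                copy_file(path)
                return True
            except OSError:
                continue
        return False

    failed = [p for p in paths if not copy_with_retries(p)]

    # Second level: collect the run's failures and retry them again.
    for _ in range(run_level_passes):
        if not failed:
            break
        failed = [p for p in failed if not copy_with_retries(p)]

    # Any data left uncopied marks the whole run as Failed.
    status = "Succeeded" if not failed else "Failed"
    return status, failed
```

A transient error that clears on a later attempt still yields a Succeeded run, while a file that never copies leaves the run Failed and the recovery copy untouched.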
Handling Schedule Overruns
In some situations, a replication run may still be in progress when the next scheduled run is due to start. In such cases, Komprise will allow the in-progress replication to continue, thereby overrunning the next scheduled start time. The current run will proceed, ideally to successful completion. As soon as it is finished, the next replication run will start automatically, rather than waiting until the next scheduled start time. In this way, Komprise attempts to “catch up” to the configured schedule.
The rationale for allowing an in-progress run to overrun the next scheduled start time is that a long-running replication run might be caused by issues that would also affect the next run. For example, there may be too much data on the source to copy within the configured interval, or files that failed to copy in the current run might continue to fail in a future run. So rather than stopping the current run and starting the next scheduled run, only to have that run take too long as well, Komprise believes the best course of action is to allow the current replication run to proceed to its conclusion.
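The overrun rule reduces to a single comparison, sketched below. The function name is hypothetical, and the sketch only covers when the very next run starts; how any further missed schedule slots are treated is not described in this post, so it is not modeled here.

```python
from datetime import datetime


def next_run_start(next_scheduled: datetime, current_run_end: datetime) -> datetime:
    """Return when the next replication run begins.

    If the current run finishes before the next scheduled time, the next
    run waits for the schedule. If it overruns, the next run starts as
    soon as the current run finishes.
    """
    return max(next_scheduled, current_run_end)


# Usage: an hourly schedule where the 09:00 run does not finish until 10:25.
scheduled = datetime(2024, 1, 1, 10, 0)
finished = datetime(2024, 1, 1, 10, 25)
start = next_run_start(scheduled, finished)  # begins at 10:25, not 11:00
```

The in-progress run is never interrupted; the only thing that moves is the start of the run that follows it.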
Coming Up: Failover and Failback
Watch this space for the next blog post, in which we’ll cover how Komprise provides failover and failback in the event that a replication source experiences an outage and becomes unavailable to your end users and applications.
Learn more about Komprise for Pure Storage.