Under the Hood: Failover and Failback in Komprise Asynchronous Data Replication

Last Tuesday, Pure Storage announced key updates to its Purity software for FlashBlade® and FlashArray™, including a partnership with Komprise to provide Asynchronous Replication for Pure FlashArray™ file customers.

Last week, I popped open the hood to explain how Komprise Asynchronous Replication works. I explained how Komprise ensures that a point-in-time copy is always safe and available, and how Komprise handles errors, failures, and schedule overruns during replications.

Now, let’s examine what happens in the worst case… and how Komprise can help you manage either a temporary or a permanent loss of availability of a FlashArray Files system.

Handling Failover

Let’s say that you’re using Komprise Asynchronous Replication to protect your FlashArray Files systems. Now, assume something happens to one of those FlashArrays, rendering it unavailable.

The first order of business will be to ensure continued business operations, by providing your end users and applications access to the data that had resided on the affected source FlashArray. This means failing over the end users and applications to the recovery copy, which is the latest replication copy for the unavailable FlashArray, located on the replication destination FlashArray.

(Note: Any changes that were made to the data on the replication source after the recovery copy snapshot was taken, up until the time that the source became unavailable, will not be included in the recovery copy on the replication destination. The potential number of changes lost in the failover process will depend on the replication’s schedule frequency, which essentially defines your RPO – recovery point objective.)
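To make the note above concrete, here is a minimal sketch of the arithmetic, not anything taken from Komprise itself. The function names and timestamps are hypothetical, and it assumes you know when the recovery copy’s snapshot was taken and when the source became unavailable.

```python
from datetime import datetime, timedelta


def exposure_window(recovery_snapshot_at: datetime, source_lost_at: datetime) -> timedelta:
    """Span of time whose changes are missing from the recovery copy."""
    if source_lost_at < recovery_snapshot_at:
        raise ValueError("the source cannot be lost before the recovery snapshot was taken")
    return source_lost_at - recovery_snapshot_at


def changes_at_risk(change_times, recovery_snapshot_at, source_lost_at):
    """Changes made after the snapshot and up to the outage (illustrative only)."""
    return [t for t in change_times if recovery_snapshot_at < t <= source_lost_at]


# Hypothetical timeline: snapshot at 10:00, source lost at 10:40.
snapshot = datetime(2024, 1, 9, 10, 0)
outage = datetime(2024, 1, 9, 10, 40)
changes = [
    datetime(2024, 1, 9, 9, 55),   # before the snapshot: safe in the recovery copy
    datetime(2024, 1, 9, 10, 5),   # after the snapshot: not in the recovery copy
    datetime(2024, 1, 9, 10, 30),  # after the snapshot: not in the recovery copy
]

window = exposure_window(snapshot, outage)
lost = changes_at_risk(changes, snapshot, outage)
print(window)     # 0:40:00
print(len(lost))  # 2
```

The more often the replication runs, the more recent the recovery snapshot tends to be, and the smaller this window gets.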

Once your end users and applications have been transitioned to the recovery copy, they can continue working, using the recovery copy as their primary file system. Next, you’ll log into your Komprise Director console and start “Failover” for the replication of the unavailable FlashArray. This puts the replication into the Failover state: Komprise will stop all future copy activity, since the replication source FlashArray is no longer available. In fact, you’d see an error displayed in your Komprise Director console informing you that the FlashArray is inaccessible.

During failover, your end users and applications will be creating, modifying, and deleting files and directories on the recovery copy in the course of their daily work. If the source FlashArray cannot be recovered, then the destination FlashArray Files will need to serve as the live file system for your end users and applications. Since that FlashArray is likely located far from the original source, your end users may experience increased latency. If this access latency becomes intolerable, you can use Komprise’s Elastic Data Migration capability to migrate the data from the recovery copy to a new FlashArray Files located closer to your end users and applications.

Whether the previous replication destination or a new FlashArray Files becomes the replacement file server, you’ll next want to protect the data on the replacement server by configuring a new replication in Komprise with that file server as the replication source. You’ll accomplish this in the Komprise Director console, just as you had set up the replication of the original FlashArray Files source.

Handling Failback

In a happier case, if the unavailable replication source is restored to operational condition, you can perform a failback process, where the changes that have accumulated on the recovery copy are copied back to the restored FlashArray, to enable its use again.

Failback: Copy is being made back to the source.

To perform failback, select “Failback” in the Komprise Director for each desired replication. Komprise will automatically configure the failback runs to copy data back from the recovery copy to the restored source FlashArray. You may need to perform multiple failback runs to ensure that all changes that users made during failover are copied back to the source FlashArray, especially if your end users and applications continue to use the recovery copy during failback. We recommend performing a “final failback run” while the recovery copy is set as read-only for your end users and applications, so that any remaining changes are copied over. This will ensure your recovered FlashArray has all the latest data from the recovery copy.

(Note: Performing failback will write data from the recovery copy to the source FlashArray Files. So, make sure that no data on the recovered FlashArray is required, as it may be overwritten.)
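Why might several failback runs be needed? The toy model below is one way to picture it. It is a sketch under loud assumptions, not Komprise’s implementation: each file system is a plain dictionary of path to content, `failback_run` is a hypothetical helper, and the model assumes a run makes the source mirror the recovery copy (including removing paths that no longer exist there).

```python
from collections.abc import Mapping
from types import MappingProxyType


def failback_run(recovery: Mapping, source: dict) -> int:
    """Make `source` match `recovery`; return how many paths were changed."""
    changed = 0
    for path, content in recovery.items():
        if path not in source or source[path] != content:
            source[path] = content  # overwrites whatever the source held
            changed += 1
    for path in list(source):
        if path not in recovery:
            del source[path]  # assumed mirror behavior, for illustration only
            changed += 1
    return changed


# State when the source comes back: it is stale relative to the recovery copy.
recovery = {"a.txt": "v2", "b.txt": "v1", "new.txt": "v1"}
source = {"a.txt": "v1", "b.txt": "v1", "old.txt": "v1"}

first = failback_run(recovery, source)   # updates a.txt, adds new.txt, drops old.txt

# Users keep working on the recovery copy while failback is under way.
recovery["a.txt"] = "v3"
recovery["c.txt"] = "v1"
second = failback_run(recovery, source)  # picks up the two newer changes

# One last edit lands, then the recovery copy is made read-only for the final run.
recovery["b.txt"] = "v2"
frozen = MappingProxyType(recovery)      # read-only view: no further writes via `frozen`
final = failback_run(frozen, source)

print(first, second, final)              # 3 2 1
print(source == dict(frozen))            # True
```

As long as writes continue on the recovery copy, each run can leave the source slightly behind. Only the run made against a read-only recovery copy is guaranteed to leave the two identical in this model. Note that `old.txt` on the source is gone after the first run, which is the overwrite risk the note above warns about.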

Finally, select “Complete failback” in the Komprise Director. This will signal Komprise to automatically restart replications of the source FlashArray Files, just as was occurring prior to the source becoming unavailable, reinstating protection for it.
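The whole lifecycle can be summarized as a small state machine. This is only a sketch for reasoning about the sequence: the state and action names below are hypothetical labels for the “Failover”, “Failback”, and “Complete failback” actions described in this post, not identifiers from the Komprise Director.

```python
from enum import Enum


class State(Enum):
    REPLICATING = "replicating"  # normal scheduled replication
    FAILOVER = "failover"        # copy activity stopped, users on the recovery copy
    FAILBACK = "failback"        # changes being copied back to the restored source


# Allowed (state, action) pairs and the state each one leads to.
TRANSITIONS = {
    (State.REPLICATING, "start_failover"): State.FAILOVER,
    (State.FAILOVER, "start_failback"): State.FAILBACK,
    (State.FAILBACK, "complete_failback"): State.REPLICATING,
}


def step(state: State, action: str) -> State:
    """Return the next state, or raise ValueError if the action is out of order."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not valid in state {state.value!r}") from None


state = State.REPLICATING
history = []
for action in ("start_failover", "start_failback", "complete_failback"):
    state = step(state, action)
    history.append(state)

print([s.value for s in history])  # ['failover', 'failback', 'replicating']
```

The sketch covers only the recoverable case. If the source is never restored, the replication stays in failover and a new replication is configured for the replacement file server instead.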

You can then transition your end users and applications back to the restored FlashArray, where they can find all their data and resume business as usual.

 

Learn more about Komprise for Pure Storage here.

 
