Multisite Data Replication Over a WAN for Disaster Recovery

Posted on October 18, 2023

LINBIT® and many other organizations that specialize in high-availability (HA) solutions use and support the Pacemaker cluster manager. When properly configured, Pacemaker does a great job of maintaining high availability for services, applications, and other resources within a single-site cluster on a LAN. However, as robust as Pacemaker is, it was never designed to operate across a WAN or any other high-latency network. That is where additional tools, such as Booth, DRBD®, and DRBD Proxy, can complete an HA solution and offer true disaster recovery (DR) for your services and for the data they rely on.

The Need For a Multisite Disaster Recovery Solution

While single-site high availability might be enough in some cases, there might come a time when the potential loss of revenue or customer confidence in the services that you offer justifies the expense and effort of setting up a DR solution. With a properly set up multisite DR solution (sometimes called geo-clustering), even if the site that hosts your services and data goes down, a second site is ready to take over the hosting role.

DRBD & DRBD Proxy

DRBD is high-performance data replication software that keeps data highly available in a single-site Pacemaker cluster. However, like Pacemaker, it was never designed to operate across a WAN. That is where DRBD Proxy comes in. DRBD Proxy is LINBIT-developed software that provides the compression and cache operations that complement DRBD data replication and allow for a true DR solution across a WAN.
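This article does not describe DRBD Proxy's internals, so the following is only a hypothetical Python sketch of the two ideas named above: compressing replication data before it crosses the WAN, and buffering writes locally so that a slow link does not immediately stall the writer. The class name, the use of zlib, and the drain policy are assumptions chosen for illustration, not DRBD Proxy's implementation or configuration.

```python
import zlib
from collections import deque


class WanBufferSketch:
    """Hypothetical model of compress-then-buffer WAN replication.

    Not DRBD Proxy's implementation: zlib and the drain policy below are
    stand-ins chosen only to illustrate the idea.
    """

    def __init__(self, capacity_bytes: int, level: int = 6) -> None:
        self.capacity_bytes = capacity_bytes  # size of the local buffer
        self.level = level                    # zlib compression level
        self.queue: deque[bytes] = deque()
        self.buffered_bytes = 0

    def write(self, block: bytes) -> bool:
        """Compress a block and queue it; return False if it does not fit."""
        compressed = zlib.compress(block, self.level)
        if self.buffered_bytes + len(compressed) > self.capacity_bytes:
            return False  # a real system needs a policy for a full buffer
        self.queue.append(compressed)
        self.buffered_bytes += len(compressed)
        return True

    def drain(self, wan_budget_bytes: int) -> list[bytes]:
        """Send whole queued blocks, in order, that fit in this tick's budget.

        Returns the blocks decompressed, as the remote site would apply them.
        """
        delivered: list[bytes] = []
        while self.queue and len(self.queue[0]) <= wan_budget_bytes:
            compressed = self.queue.popleft()
            wan_budget_bytes -= len(compressed)
            self.buffered_bytes -= len(compressed)
            delivered.append(zlib.decompress(compressed))
        return delivered


if __name__ == "__main__":
    buf = WanBufferSketch(capacity_bytes=64 * 1024)
    block = b"\x00" * 4096  # zero-filled blocks compress very well
    for _ in range(8):
        buf.write(block)
    print(buf.buffered_bytes, "bytes queued for", 8 * len(block), "bytes written")
    print(len(buf.drain(wan_budget_bytes=1024)), "blocks sent this tick")
```

The point of the model is that the writer only waits on the local buffer, while the WAN link works through the backlog at its own pace, carrying fewer bytes thanks to compression. How real data compresses depends entirely on the workload.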

The Booth Ticket Manager

To overcome the issue of Pacemaker not being able to orchestrate failovers between data centers and across long distances, the booth add-on for Pacemaker was conceived back in late 2011. LINBIT has been involved in the development of booth since 2013, and has been offering it as a supported solution since 2015.

Have Your Ticket Ready

Booth addresses the shortcomings of Pacemaker by introducing the concept of “tickets”. Booth constrains Pacemaker’s ability to start particular resources by issuing or revoking tickets. Pacemaker can start constrained resources only at the site that holds a valid booth ticket. This is similar in spirit to the token ring networks of days past, where only the station holding the token could transmit. If a site loses communication with the rest of the booth cluster, its ticket is not renewed, and Pacemaker stops the constrained resources within the expected time frame. To let booth ensure that the cluster never splits, that is, that two sites never hold the ticket at the same time, configure an arbitrator node to achieve quorum and set an expiration period on the tickets.
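The lease behavior described above can be modeled in a few lines. The following Python sketch is hypothetical and heavily simplified: one shared object stands in for the ticket state that booth members agree on over the network, all sites are assumed to share one clock, and the ticket name `ticket-A` and the 600-second expiry are made-up values. It shows why an expiration period matters: a site that is cut off cannot renew, so its lease runs out before any other site can acquire the ticket.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical, simplified model of a booth-style ticket lease.

    A single shared object stands in for the ticket state that booth members
    agree on over the network; the real protocol is not modeled here.
    """

    name: str
    expire: float                # lease length in seconds (assumed value)
    holder: Optional[str] = None
    expires_at: float = 0.0

    def is_valid_for(self, site: str, now: float) -> bool:
        # Pacemaker may run ticket-constrained resources only while this is True.
        return self.holder == site and now < self.expires_at

    def acquire(self, site: str, now: float, has_quorum: bool) -> bool:
        # Granting requires quorum, and a lease that is still running blocks
        # every other site, so in this model no two sites hold it at once.
        if not has_quorum:
            return False
        if self.holder is not None and now < self.expires_at:
            return self.holder == site
        self.holder, self.expires_at = site, now + self.expire
        return True

    def renew(self, site: str, now: float, has_quorum: bool) -> bool:
        # A holder cut off from the rest of the booth cluster cannot renew,
        # so its lease runs out and its resources must stop.
        if not has_quorum or not self.is_valid_for(site, now):
            return False
        self.expires_at = now + self.expire
        return True


if __name__ == "__main__":
    t = Ticket("ticket-A", expire=600)
    print(t.acquire("site-1", now=0, has_quorum=True))    # True
    print(t.renew("site-1", now=300, has_quorum=True))    # True (lease ends at 900)
    print(t.renew("site-1", now=600, has_quorum=False))   # False: site-1 cut off
    print(t.acquire("site-2", now=800, has_quorum=True))  # False: lease still running
    print(t.is_valid_for("site-1", now=900))              # False: lease expired
    print(t.acquire("site-2", now=900, has_quorum=True))  # True
```

The gap between losing contact (time 600) and the other site taking over (time 900) is the price of safety: the expiry period bounds how long failover takes, in exchange for never having two active holders.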

The Booth Arbitrator Node

The arbitrator node does not need the same specifications as the nodes that host data or services in your clusters. Its only role is to help achieve quorum in your DR solution, so it only needs to run the booth arbitrator software and have a WAN connection to your two sites. If you have the luxury of a third site, you can use a minimally set up physical machine there; if you do not, a virtual machine instance in a public cloud will do.
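Why a third voter helps can be shown with a small majority-vote model. This Python sketch is an assumption for illustration, not booth's actual election protocol: each member casts at most one vote per election, candidate sites ask for votes in a fixed order, and votes travel only over working links. The member names `site-a`, `site-b`, and `arbitrator` are hypothetical.

```python
from typing import Optional


def run_election(
    members: list[str],
    links: set[frozenset[str]],
    candidates: list[str],
) -> Optional[str]:
    """Return the site that wins the ticket, or None if nobody has a majority.

    Simplified model (an assumption, not booth's actual protocol): every
    member casts at most one vote per election, candidates ask in the given
    order, and votes travel only over working links.
    """
    voted: set[str] = set()
    for candidate in candidates:
        votes = 0
        for member in members:
            reachable = member == candidate or frozenset((member, candidate)) in links
            if reachable and member not in voted:
                voted.add(member)
                votes += 1
        if votes * 2 > len(members):  # strict majority of all members
            return candidate
    return None


def link(a: str, b: str) -> frozenset[str]:
    return frozenset((a, b))


if __name__ == "__main__":
    members = ["site-a", "site-b", "arbitrator"]
    # WAN link between the two sites fails; both still reach the arbitrator.
    split = {link("site-a", "arbitrator"), link("site-b", "arbitrator")}
    print(run_election(members, split, ["site-a", "site-b"]))  # site-a
    # site-a is cut off completely; site-b and the arbitrator form a majority.
    isolated = {link("site-b", "arbitrator")}
    print(run_election(members, isolated, ["site-a", "site-b"]))  # site-b
    # Without an arbitrator, a WAN split leaves neither site with a majority.
    print(run_election(["site-a", "site-b"], set(), ["site-a", "site-b"]))  # None
```

Because the arbitrator can give its single vote to only one site, a broken link between the two data sites still produces exactly one winner, and a completely isolated site can never reach a majority on its own.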

Redirecting Service Traffic To a Failover Site

While running Pacemaker with booth addresses the issues of high availability across a WAN for disaster recovery, one issue which has always proven difficult is redirecting client traffic to the new site.

Past demonstrations of booth have simply used round-robin DNS (such as in my Booth Geo Cluster Demo demonstration). While round-robin DNS is simple and easy to configure, it is inefficient: with two sites in the rotation, roughly every other request is directed to the site that is not currently running the service and is lost. Plenty of more specialized options exist, such as a software load balancer (for example, HAProxy), a hardware load-balancing appliance, or a dynamic DNS update solution (for example, Route 53 or DynDNS, among others).
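The round-robin inefficiency can be illustrated with a tiny simulation. The Python sketch below assumes that each request resolves to the next DNS record in the rotation and that clients do not retry another address on failure; the two IP addresses are documentation-range placeholders, not real site addresses. Under those assumptions, only requests that land on the active site succeed.

```python
from itertools import cycle


def successful_requests(records: list[str], active: str, requests: int) -> int:
    """Count requests that reach the active site under naive round-robin DNS.

    Assumptions for illustration: each request resolves to the next record in
    the rotation, and clients do not retry another address on failure.
    """
    rotation = cycle(records)
    return sum(1 for _ in range(requests) if next(rotation) == active)


if __name__ == "__main__":
    # Documentation-range addresses standing in for the two sites.
    records = ["192.0.2.10", "198.51.100.10"]
    ok = successful_requests(records, active="192.0.2.10", requests=100)
    print(f"{ok} of 100 requests reached the active site")  # 50 of 100
```

Real resolvers and clients cache answers and sometimes retry, so actual loss rates vary, but the model shows why a DNS rotation that is unaware of which site holds the ticket wastes requests.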

Fortunately, Pacemaker lets you use a virtual IP address resource to offer a single IP address through which your resources can be reached, regardless of which cluster node hosts them.
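The idea behind a virtual IP resource can be captured in a small model. This Python sketch is hypothetical and is not how Pacemaker is configured: it only shows that when the virtual IP is kept on the same node as the service (a colocation rule) and both move together on failure, clients keep using one unchanged address. The node names and the `192.0.2.100` address are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClusterModel:
    """Hypothetical model of a virtual IP colocated with the service it fronts.

    This only illustrates the idea; it is not how Pacemaker is configured.
    """

    nodes: list[str]
    vip: str  # the one address clients are given (assumed value)
    placement: dict[str, str] = field(default_factory=dict)

    def start_on(self, node: str) -> None:
        # Colocation: the service and its virtual IP always share a node.
        self.placement = {"service": node, "vip": node}

    def fail_node(self, node: str) -> None:
        self.nodes = [n for n in self.nodes if n != node]
        if self.placement.get("service") == node:
            if self.nodes:
                self.start_on(self.nodes[0])  # move both resources together
            else:
                self.placement = {}           # nowhere left to run

    def node_answering_vip(self) -> Optional[str]:
        return self.placement.get("vip")


if __name__ == "__main__":
    cluster = ClusterModel(nodes=["node-1", "node-2"], vip="192.0.2.100")
    cluster.start_on("node-1")
    print(cluster.vip, "->", cluster.node_answering_vip())  # 192.0.2.100 -> node-1
    cluster.fail_node("node-1")
    print(cluster.vip, "->", cluster.node_answering_vip())  # 192.0.2.100 -> node-2
```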

Setting Up a Multisite Disaster Recovery Solution

For step-by-step instructions on setting up the solution described in this article, download the Geo-Clustering with DRBD 9 and DRBD Proxy in RHEL 8 technical guide. The guide describes how to configure a disaster recovery solution by using Red Hat Enterprise Linux (RHEL) 8, Pacemaker, Booth, and DRBD Proxy for data replication, to offer a highly available, multisite service. As an example use case, the guide uses a MariaDB service.

A Video Overview & DRBD Proxy Video Demonstration

For a brief but highly detailed overview of this multisite HA and DR solution, and an explanation of the components used, beyond what is covered in this article, check out the Geo-Clustering with Pacemaker & DRBD Proxy video on the LINBIT YouTube channel. For a demonstration of DRBD Proxy replicating data between two sites, check out the Linux Disaster Recovery Replication with DRBD Proxy demonstration video, also on the LINBIT YouTube channel.

📝 IMPORTANT: DRBD Proxy is one of the few parts of the LINBIT software family that is not published under an open source license. If you are interested in this solution, contact LINBIT sales for a free evaluation license.

Devin Vance

First introduced to Linux back in 1996, and using Linux almost exclusively by 2005, Devin has years of Linux administration and systems engineering under his belt. He has been deploying and improving clusters with LINBIT since 2011. When not at the keyboard, you can usually find Devin wrenching on an American motorcycle or down at one of the local bowling alleys.