Load Balanced Replication with DRBD

Along with the ability to encrypt replication traffic, DRBD® version 9.2.6 gives you the ability to load balance replication traffic across multiple TCP sockets1. You can see a demo of the load balancing feature from LINBIT® founder and CEO Philipp Reisner during one of our community meetings.

DRBD version 9 has supported replicating data over multiple paths in an active-backup-like manner since its inception. With multiple paths configured, users can ensure that any single point of failure (SPOF) in a network path is mitigated by adding a second separate path for DRBD to use should the other fail. So it was only a matter of time before DRBD’s developers added the ability to load balance across multiple paths.

Load balancing DRBD’s replication traffic allows DRBD to use multiple network paths in unison for replicating writes. This allows users of a single network path to add more throughput if they find themselves approaching the limits of a single path. This might also be interesting for cloud users where bandwidth between virtual machine instances is more confined or flow controlled.

How To Enable DRBD TCP Load Balancing

Enabling TCP load balancing in DRBD is simple. If you’re a LINBIT customer, an apt or dnf update to DRBD version 9.2.6 and DRBD user space utilities (drbd-utils) version 9.26.0 or newer is all that is needed to get the supported versions installed. If you’re not a LINBIT customer, you can download and compile prepared source tar files from LINBIT’s website.

With supported versions of DRBD and DRBD utilities installed, you need only configure multiple paths and enable load-balance-paths in the network settings of the DRBD configuration.

The following is an example configuration with only the meaningful settings shown:

resource "drbd-lb-0"
{
[...]
    net
    {
        load-balance-paths      yes;
        [...]
    }

    on "linbit-0"
    {
        volume 0
        {
        [...]
        }
        node-id    0;
    }

    on "linbit-1"
    {
        volume 0
        {
        [...]
        }
        node-id    1;
    }

    on "linbit-2"
    {
        volume 0
        {
        [...]
        }
        node-id    2;
    }

    connection
    {
        path
        {
            host "linbit-0" address ipv4 192.168.220.60:7900;
            host "linbit-1" address ipv4 192.168.220.61:7900;
        }
        path
        {
            host "linbit-0" address ipv4 192.168.221.60:7900;
            host "linbit-1" address ipv4 192.168.221.61:7900;
        }
    }

    connection
    {
        path
        {
            host "linbit-0" address ipv4 192.168.220.60:7900;
            host "linbit-2" address ipv4 192.168.220.62:7900;
        }
        path
        {
            host "linbit-0" address ipv4 192.168.221.60:7900;
            host "linbit-2" address ipv4 192.168.221.62:7900;
        }
    }
        connection
    {
        path
        {
            host "linbit-1" address ipv4 192.168.220.61:7900;
            host "linbit-2" address ipv4 192.168.220.62:7900;
        }
        path
        {
            host "linbit-1" address ipv4 192.168.221.61:7900;
            host "linbit-2" address ipv4 192.168.221.62:7900;
        }
    }
}

With such configurations in place, you’re only a drbdadm up drbd-lb-0 away from balancing DRBD’s replication traffic across multiple paths.

❗ IMPORTANT: It is currently not possible to use drbdadm adjust <resource> to toggle load balancing on an active DRBD device. If you’re adding load balancing to an existing DRBD device, the device will need to be brought down and then up using drbdadm down <resource> and drbdadm up <resource>.

Testing and Verifying TCP Load Balancing

Load balancing can be verified by inspecting network interface utilization while steadily writing to a TCP load balanced DRBD device. There are quite a few utilities you could use to do this, but iptraf-ngtcpdump, or even something as simple as ss are very common.

While inspecting the output from the command of your choosing, simply unplug one of the network paths. You should see that DRBD becomes disconnected, and traffic across both the load balanced paths is reduced to DRBD’s periodic rechecking of the links. Once DRBD identifies which network paths are healthy, it will reestablish load balanced replication over the working paths. After reconnecting, a small resynchronization of any data that changed while the peers were disconnected will occur in the background to ensure the devices are UpToDate. If an unhealthy path becomes healthy, DRBD will detect this automatically, and reconnect the peer connections to include the newly repaired path.

💡 TIP: If you’re only interested in resilience, leaving load-balance-paths disabled will result in DRBD only using one path at a time instead of load balancing replicated writes across all configured paths.

Verifying Load Balancing Using IPTraf-ng

You can start verifying DRBD load balancing by using the most visual of the three networking traffic inspection tools that my brain conjured up: the iptraf-ng utility. After opening iptraf-ng on the DRBD primary node, create an IP filter that filters for TCP traffic destined for port 7900 (the port used in the example configuration above).

With such a filter applied, run an IP traffic monitor on all interfaces and verify that packets are accumulating on both of the configured paths at a comparable rate.

Verifying Load Balancing Using tcpdump

A go to for many system administrators, tcpdump, can also be used to verify TCP load balancing for DRBD. On any of the DRBD secondary nodes, filter for TCP packets destined for port 7900 on any interface. You should see the stream of replicated writes coming from the DRBD primary node.

# tcpdump -n -c 10 -i any tcp port 7900
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
23:38:40.592588 eth2  Out IP 192.168.220.61.7900 > 192.168.220.60.45677: Flags [P.], seq 4088704108:4088704140, ack 604110629, win 501, options [nop,nop,TS val 1829930048 ecr 3540804965], length 32
23:38:40.592751 eth2  In  IP 192.168.220.60.45677 > 192.168.220.61.7900: Flags [.], ack 32, win 3106, options [nop,nop,TS val 3540804969 ecr 1829930044], length 0
23:38:40.596986 eth2  In  IP 192.168.220.60.7900 > 192.168.220.61.38539: Flags [.], seq 2108668342:2108668390, ack 648750293, win 501, options [nop,nop,TS val 3540804973 ecr 1829930043], length 48
23:38:40.597031 eth1  In  IP 192.168.221.60.7900 > 192.168.221.61.36487: Flags [P.], seq 3295614837:3295617733, ack 2514536037, win 509, options [nop,nop,TS val 1980714571 ecr 4293346714], length 2896
23:38:40.597031 eth1  In  IP 192.168.221.60.7900 > 192.168.221.61.36487: Flags [P.], seq 2896:3080, ack 1, win 509, options [nop,nop,TS val 1980714571 ecr 4293346714], length 184
23:38:40.597043 eth1  Out IP 192.168.221.61.36487 > 192.168.221.60.7900: Flags [.], ack 2896, win 24560, options [nop,nop,TS val 4293346725 ecr 1980714571], length 0
23:38:40.597080 eth2  Out IP 192.168.220.61.38539 > 192.168.220.60.7900: Flags [.], ack 48, win 24576, options [nop,nop,TS val 1829930052 ecr 3540804973], length 0
23:38:40.597092 eth1  Out IP 192.168.221.61.36487 > 192.168.221.60.7900: Flags [.], ack 3080, win 24576, options [nop,nop,TS val 4293346725 ecr 1980714571], length 0
23:38:40.602247 eth2  In  IP 192.168.220.60.45677 > 192.168.220.61.7900: Flags [P.], seq 1:41, ack 32, win 3106, options [nop,nop,TS val 3540804978 ecr 1829930044], length 40
23:38:40.602258 eth2  Out IP 192.168.220.61.7900 > 192.168.220.60.45677: Flags [.], ack 41, win 501, options [nop,nop,TS val 1829930058 ecr 3540804978], length 0
10 packets captured
17 packets received by filter
0 packets dropped by kernel

TIP: The -n flag tells tcpdump to not convert IP addresses to names, the -c flag specifies how many packets tcpdump should capture before exiting, the -i flag specifies the interface to listen on, and tcp port 7900 is the tcpdump filter expression.

Again, you should see a nearly equal amount of traffic on TCP port 7900 traveling over both paths.

Verifying Load Balancing Using ss

Lastly, but the first utility I personally use for inspecting open network sockets on a system, the ss utility can be used to verify that there are established TCP sessions on all of the configured paths’ sockets.

# ss -tna dst :7900
State   Recv-Q   Send-Q       Local Address:Port          Peer Address:Port   Process   
ESTAB   0        0           192.168.221.61:36487       192.168.221.60:7900             
ESTAB   0        0           192.168.220.61:39223       192.168.220.62:7900             
ESTAB   0        0           192.168.220.61:38539       192.168.220.60:7900             
ESTAB   0        0           192.168.221.61:35837       192.168.221.62:7900

TIP: The -t flag inspects only TCP sessions, the -n flag instructs ss not to resolve service names, the -a flag displays both listening and established connections, and dst :7900 is the filter expression.

While you cannot see active traffic using ss – unless you happen to catch some data in a send or receive buffer – you will be able to see established (ESTAB) TCP sessions, which is enough to verify you’re using both paths configured for DRBD.

Conclusion

DRBD 9.2.6. is an exciting release for the new features that it brings, such as Encryption using kTLS and load balancing. Adding two major features, all in one release, is quite an update for DRBD. Hopefully this article has shown you how you can get started using the load balancing feature to increase DRBD data replication performance for your highly available storage needs. Be sure to test this out for yourself and let us know what you think!


1. At the time of writing this blog, it is not possible to use both kTLS encrpytion and TCP load balancing in the same DRBD configuration.

Matt Kereczman

Matt Kereczman

Matt Kereczman is a Solutions Architect at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT's technical team, and plays an important role in making LINBIT and LINBIT's customer's solutions great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt's hobbies.

Talk to us

LINBIT is committed to protecting and respecting your privacy, and we’ll only use your personal information to administer your account and to provide the products and services you requested from us. From time to time, we would like to contact you about our products and services, as well as other content that may be of interest to you. If you consent to us contacting you for this purpose, please tick above to say how you would like us to contact you.

You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices, and how we are committed to protecting and respecting your privacy, please review our Privacy Policy.

By clicking submit below, you consent to allow LINBIT to store and process the personal information submitted above to provide you the content requested.

Talk to us

LINBIT is committed to protecting and respecting your privacy, and we’ll only use your personal information to administer your account and to provide the products and services you requested from us. From time to time, we would like to contact you about our products and services, as well as other content that may be of interest to you. If you consent to us contacting you for this purpose, please tick above to say how you would like us to contact you.

You can unsubscribe from these communications at any time. For more information on how to unsubscribe, our privacy practices, and how we are committed to protecting and respecting your privacy, please review our Privacy Policy.

By clicking submit below, you consent to allow LINBIT to store and process the personal information submitted above to provide you the content requested.