High-Availability with DRBD Reactor & Promoter

In this blog post, we discuss the DRBD Reactor® and Promoter plug-in. Please read the overview blog post first if you’re looking for a general introduction to DRBD Reactor.

This DRBD Reactor & Promoter post first appeared in February 2022. We have updated it for technical accuracy and clarity, referencing a new chapter in the DRBD User’s Guide.

The Motivation to Develop Another Cluster Manager

With many years of experience in high availability (HA), we at LINBIT® noticed that the typical Pacemaker stack in such HA scenarios introduces considerable complexity. Many of our customer support tickets deal with misconfigured Pacemaker setups. This is neither our customers’ fault nor Pacemaker’s: HA is a complex topic, and Pacemaker is a very flexible tool. But with this flexibility, and with the separation of the Pacemaker stack into a seemingly endless number of components (Corosync/knet, LRMd, PENGINE, STONITH, CRMd, CIB, crmsh, pcs, and others), the overall complexity can be overwhelming.

The typical HA stack is also hard to build. It can take days or even weeks to build and test the complete stack on all the distributions and architectures we support.

So our goals are as follows:

  • Reduce complexity and keep things simple.
  • Shift responsibility to well-known and tested components. Don’t reinvent the wheel.
  • Keep configuration simple.
  • Keep the number of components and the interaction between them low.
  • Finally, make it easy to build.

I want to stress that Pacemaker has been an excellent component of the Linux HA stack for years and is more flexible than DRBD Reactor and its Promoter plug-in, so I don’t see it as a rival at all. Still, we think that in most scenarios (arguably 99% of HA use cases), the benefits of a simple DRBD Reactor setup make it the better tool.

Implementing High-Availability by Using DRBD Reactor

DRBD Reactor is tied entirely to HA storage that uses DRBD®, so it is not as generic as Pacemaker in that sense. The advantage, however, is that DRBD Reactor can use DRBD states (quorum and “may promote”) as its decision source. In fact, DRBD Reactor needs only these two pieces of information (you can observe both yourself, as shown after this list):

  • If the DRBD resource can be promoted (that is, the resource is not active anywhere else and has quorum), then promote the DRBD resource and start the user-defined services.
  • If quorum is lost, stop the services, demote the DRBD resource, and give another node a chance to promote the resource and start the services. Finally, if demotion fails and an action is configured, execute it, for example, power off the node, reboot it, or perform some other activity.
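
If you are curious what this information looks like on a node, you can watch the DRBD event stream that DRBD Reactor listens to. Here is a minimal example for a resource named test; the exact fields depend on your DRBD version, so treat the output as illustrative:

# drbdsetup events2 test
exists resource name:test role:Secondary suspended:no
exists device name:test volume:0 minor:1000 disk:UpToDate
...
change resource name:test role:Secondary may_promote:yes promotion_score:10103

The may_promote and promotion_score fields carry exactly the information described above; the score is also what lets a node with a better local state win the promotion race described below.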

The first consequence of this design is that DRBD Reactor does not need any further cluster communication besides listening to the DRBD events described above. There is no shared Cluster Information Base (CIB) for cluster configuration that needs synchronization, as we know it from the Pacemaker stack. The disadvantage is that configuration files need to be copied to all nodes, but overall, the benefits of keeping the daemon simple outweigh this. Usually, higher-level tools, for example, LINSTOR Gateway, distribute the configuration in the cluster anyway.

If one or multiple nodes detect that they can promote the DRBD resource, they all race to promote it to Primary. Only one node can win; when that happens, the other nodes back off. As an implementation detail: before a node tries to become Primary, it might sleep for a short time, depending on its local resource state. This gives a node with a good state a head start in the race for DRBD promotion.

DRBD Reactor & Promoter Keeps Things Simple

Relying on DRBD quorum keeps the Promoter plug-in simple: quorum is handled by a component that already implements it (DRBD). So we know when to promote a DRBD resource and start services that depend on it, but how do we start services? Again, simplicity is key. There is already a standard system component that allows us to start and stop services and even group services into “targets”: systemd. One could even think of the Promoter plug-in as a very elaborate systemd template generator. For more technical information and details of systemd template generation, see this in-depth documentation. A high-level overview will help you understand how the Promoter plug-in works. Let’s assume we want to start the services a.service, b.service, and c.service.

  • The plug-in generates an override for the drbd-promote@.service template unit included with the drbd-utils package. This unit ensures that the DRBD resource gets promoted to Primary and acts as a dependency for the other services.
  • For every listed service, the plug-in creates an override (for example, /var/run/systemd/system/a.service.d/reactor.conf) that specifies a dependency on the drbd-promote@.service unit as well as a dependency on the service preceding it in the list. So b.service will depend on a.service, and c.service will depend on b.service.
  • All services are grouped into a drbd-services@.target that acts as a handle for the Promoter plug-in. After these service overrides are generated, the plug-in starts (or stops) the drbd-services@.target instance, and systemd performs its function: starting the services. A sketch of such a generated override follows below.
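
To make this more tangible, here is a sketch of what such a generated override for a.service could look like, assuming the underlying DRBD resource is named test. The exact directives vary between DRBD Reactor versions, so consider this an illustration rather than the literal output:

# /var/run/systemd/system/a.service.d/reactor.conf (illustrative)
[Unit]
PartOf = drbd-services@test.target
Requires = drbd-promote@test.service
After = drbd-promote@test.service

With dependencies like these in place, stopping the target stops all of its services, and ordering between the services behaves the way systemd users already expect.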

This design relies on an existing, widely used, and well-tested component for service management, namely systemd. In return, we get all of the power that systemd provides, such as reliable OnFailure actions, for free.

The last widely used clustering component to mention is OCF resource agents. The Promoter plug-in uses these resource agents through a little shim we include in drbd-utils, namely ocf.ra@.service. Find a more detailed overview here.
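
For example, a Promoter configuration could start a virtual IP address through the ocf:heartbeat:IPaddr2 resource agent alongside regular systemd units. The general form of such an entry is “ocf:$vendor:$agent $instance-id $args”; the IP address below is a placeholder:

[[promoter]]
[promoter.resources.test]
start = [
  "ocf:heartbeat:IPaddr2 vip ip=192.168.122.222 cidr_netmask=24",
  "a.service",
]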

For more information, like the covered failure scenarios, see the documentation in the DRBD User’s Guide and the DRBD Reactor GitHub project page.

Creating a Highly Available File System Mount

After all the theory, let’s examine a straightforward example: A highly-available file system mount. Of course, we use LINSTOR® to create the DRBD resources to keep things simple, but that is not a strict requirement.

The first step is to create a LINSTOR resource group named promoter and spawn from it a DRBD resource named test, which, in this case, is three times redundant.

# linstor resource-group create --place-count 3 promoter 
# linstor resource-group drbd-options promoter --auto-promote no 
# linstor resource-group drbd-options promoter --quorum majority 
# linstor resource-group drbd-options promoter --on-no-quorum io-error
# linstor volume-group create promoter
# linstor resource-group spawn promoter test 20M 
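
Before continuing, you can verify that LINSTOR placed the new resource on three nodes:

# linstor resource list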

Then we want to create a file system that we can mount:

# drbdadm primary test
# mkfs.ext4 /dev/drbd1000
# drbdadm secondary test
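
After demoting the resource, it should be in the Secondary role on all nodes again, which you can verify with drbdadm. The node names below are placeholders, and the output is abridged:

# drbdadm status test
test role:Secondary
  disk:UpToDate
  node2 role:Secondary
    peer-disk:UpToDate
  node3 role:Secondary
    peer-disk:UpToDate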

On all nodes that should be able to mount the file system, we create a systemd mount unit. Note that systemd requires the unit name to be derived from the mount point, so /mnt/test becomes mnt-test.mount:

# cat << EOF > /etc/systemd/system/mnt-test.mount
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF
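
Depending on your environment, you might need to make systemd aware of the new unit file:

# systemctl daemon-reload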

Then on all nodes, we also need to create a configuration for the Promoter plug-in:

# cat << EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF

Let’s do a quick recap of what we get out of that configuration snippet:

  • [[promoter]] defines one instance of the Promoter plug-in, identified by id = "mnt-test".
  • [promoter.resources.test] ties this instance to the DRBD resource named test.
  • start lists the systemd units that the winning node starts after promoting the resource; here, that is only our mount unit.
  • on-drbd-demote-failure = "reboot" makes DRBD Reactor reboot the node if the services cannot be stopped and the resource therefore cannot be demoted, so that another node can take over.

Last but not least, we need to start (or restart/reload) DRBD Reactor on all nodes by entering the command systemctl start drbd-reactor.service.
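
For example, to start the daemon immediately and have it start at boot, and to apply later changes to plug-in snippets:

# systemctl enable --now drbd-reactor.service
# systemctl reload drbd-reactor.service # after editing plug-in snippets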

Then we can check which node is Primary and has the device mounted:

# drbd-reactorctl status mnt-test

On the Primary node, we can do a failover just for testing. A later version of drbd-reactorctl might have a dedicated command for that:

# drbd-reactorctl disable --now mnt-test
# # another node should be primary now and have the FS mounted
# drbd-reactorctl enable mnt-test # to re-enable the config again

If you want to, you can also test a failure scenario by keeping a file open on the mounted device. Connect to the Primary node and execute the following commands. This should trigger a reboot, and another node should take over the mount.

# touch /mnt/test/lock
# sleep 3600 < /mnt/test/lock &
# # ^^ this creates an opener and the mount unit will be unable to stop
# # and the DRBD device will be unable to demote
# systemctl restart drbd-services@test.target # trigger a stop/restart of the target

Conclusion

Looking back at the goals stated at the beginning of this blog post: compared to a typical Pacemaker stack, we reduced complexity (at the price of some flexibility). Apart from a small amount of our own code, we delegate functionality to well-tested and widely used software components (for example, DRBD for quorum, systemd for service management). DRBD Reactor is a simple, single-binary daemon, so we also keep the number of components low. And as the code is implemented in Rust, it is trivial to build for multiple architectures.

We use DRBD Reactor and the Promoter plug-in in our in-house infrastructure (LINSTOR + OpenNebula) to provide a highly available LINSTOR controller, which is the configuration we suggest to our customers. Finally, LINSTOR Gateway and LINBIT VSAN SDS use it as well.

Roland Kammerer

Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered real-time systems and works for LINBIT in the DRBD development team.
