In this blog post, we discuss the DRBD Reactor® and Promoter plug-in. Please read the overview blog post first if you’re looking for a general introduction to DRBD Reactor.
This DRBD Reactor & Promoter post first appeared in February 2022. We have updated it for technical accuracy and clarity, referencing a new chapter in the DRBD User’s Guide.
The Motivation to Develop Another Cluster Manager
With many years of experience in high availability (HA), we at LINBIT® have noticed that the typical Pacemaker stack in HA scenarios introduces considerable complexity. We must deal with misconfigured Pacemaker setups in many customer support tickets. This is neither our customers' fault nor Pacemaker's. HA is a complex topic, and Pacemaker is a very flexible tool. With this flexibility, and with the Pacemaker stack separated into a seemingly endless number of components (Corosync/knet, LRMd, PENGINE, STONITH, CRMd, CIB, pcs, and others), the overall complexity can be overwhelming.
The typical HA stack is also hard to build. Building and testing the complete stack on all the distributions and architectures we support takes days or even weeks.
So our goals are as follows:
- Reduce complexity and keep things simple.
- Shift responsibility to well-known and tested components. Don’t reinvent the wheel.
- Keep configuration simple.
- Keep the number of components and the interaction between them low.
- Finally, make it easy to build.
I want to stress that Pacemaker has been an excellent component of the Linux HA stack for years and is more flexible than DRBD Reactor and its Promoter plug-in, so I don't see DRBD Reactor as a rival at all. However, we think that in most scenarios (99% of the time), a simple DRBD Reactor setup is the better tool for HA use cases.
Implementing High-Availability by Using DRBD Reactor
DRBD Reactor is wholly tied to HA storage that uses DRBD®. So it is not as generic as Pacemaker in that sense. However, the advantage is that DRBD Reactor can use DRBD states (quorum and “may promote”) as its decision source. And actually, DRBD Reactor only needs these two pieces of information:
- If the DRBD resource can be promoted (that is, the resource is not active anywhere else and has quorum), then promote the DRBD resource and start the user-defined services.
- If quorum is lost, stop the services, demote the DRBD resource, and give another node a chance to promote the resource and start services. Finally, if configured and demotion fails, execute some action, for example, power off the node, reboot the node, or take other actions.
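The two decision inputs above come straight from DRBD's quorum machinery, so quorum must be enabled on the resource. For a setup managed without LINSTOR, the relevant part of a DRBD resource file would look roughly like this sketch (the resource name is illustrative, and the connection and volume sections are omitted):

```text
resource example {
    options {
        auto-promote no;        # DRBD Reactor, not DRBD itself, decides when to promote
        quorum majority;        # a partition needs a majority of nodes to have quorum
        on-no-quorum io-error;  # fail I/O on quorum loss so services stop quickly
    }
    # connection and volume sections omitted
}
```

These are the same DRBD options that the LINSTOR commands later in this post set on the resource group.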
The first consequence of this design is that DRBD Reactor does not need any further cluster communication besides listening to the events above. There is no shared Cluster Information Base (CIB) for cluster configuration that needs synchronization, as we know it from the Pacemaker stack. The disadvantage is that configuration files need to be copied to all nodes. Overall, the benefits of keeping the daemon simple outweigh this disadvantage. Usually, higher-level tools distribute the configuration in the cluster, for example, LINSTOR Gateway.
If one or multiple nodes detect that they can promote the DRBD resource, they all race to promote it to Primary. Only one can do that; when that happens, the other nodes back off. As an implementation detail: Before a node tries to become Primary, it might sleep depending on its local resource state. With that, we allow a node with a good state to win the race for DRBD promotion.
DRBD Reactor & Promoter Keeps Things Simple
Using DRBD quorum keeps the Promoter plug-in simple by relying on a component that already implements quorum (DRBD). So we know when to promote a DRBD resource and start the services that depend on it, but how do we start services? Again, simplicity is key. There is already a standard system component that allows us to start and stop services and even group services into "targets": systemd. One could even think of the Promoter plug-in as a very elaborate systemd template generator. For more technical information and details of systemd template generation, see this in-depth documentation. A high-level overview will help you understand how the Promoter plug-in works. Let's assume we want to start the services a.service, b.service, and c.service:
- The plug-in generates an override for the drbd-promote@.service template included with the drbd-utils package. This ensures that the resource gets promoted to Primary and acts as a dependency for the other services.
- For every listed service, the plug-in creates an override (for example, /var/run/systemd/system/a.service.d/reactor.conf) that specifies a dependency on the drbd-promote@.service, as well as a dependency on the service preceding it. So b.service will depend on a.service, and c.service will depend on b.service.
- All services are grouped into a drbd-services@.target that acts as a handle for the Promoter. After these service overrides are generated, the plug-in starts (or stops) the drbd-services@.target unit, and systemd performs its function: starting services.
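To make this more concrete, here is a sketch of what a generated drop-in override for b.service might look like, for a hypothetical DRBD resource named foo. The exact directives DRBD Reactor writes can differ between versions, so treat this as illustrative only:

```ini
# /var/run/systemd/system/b.service.d/reactor.conf (illustrative sketch)
[Unit]
# The service may only run while the DRBD resource is promoted on this node,
# and it is ordered after the service listed before it.
PartOf=drbd-services@foo.target
Requires=drbd-promote@foo.service
After=drbd-promote@foo.service a.service
```

Because these are ordinary systemd drop-ins, you can inspect the result with standard tooling, for example systemctl cat b.service.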
This makes use of an existing, widely used, and well-tested component for service management, namely systemd. By using it, we get all of the power that systemd provides for free, such as reliable dependency handling and service supervision.
The last widely used clustering component to mention is OCF resource agents. The Promoter plug-in uses these resource agents through a little shim that we include in the ocf.ra@.service unit. Find a more detailed overview here.
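As an illustration, an OCF resource agent can be listed directly in a Promoter start list alongside ordinary systemd units. The following sketch uses a hypothetical resource named foo, and the agent parameters (the IP address and netmask) are made up for illustration:

```toml
[[promoter]]
id = "foo-example"

[promoter.resources.foo]
# An OCF entry has the form "ocf:<provider>:<agent> <instance-id> key=value ...";
# here a virtual service IP is started before the service itself.
start = [
  "ocf:heartbeat:IPaddr2 service_ip ip=192.168.122.222 cidr_netmask=24",
  "foo.service",
]
```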
Creating a Highly Available File System Mount
After all the theory, let’s examine a straightforward example: A highly-available file system mount. Of course, we use LINSTOR® to create the DRBD resources to keep things simple, but that is not a strict requirement.
The first step is to create a DRBD resource named test from a resource group named promoter, which, in this case, is three times redundant:
# linstor resource-group create --place-count 3 promoter
# linstor resource-group drbd-options promoter --auto-promote no
# linstor resource-group drbd-options promoter --quorum majority
# linstor resource-group drbd-options promoter --on-no-quorum io-error
# linstor volume-group create promoter
# linstor resource-group spawn promoter test 20M
Then we want to create a file system that we can mount:
# drbdadm primary test
# mkfs.ext4 /dev/drbd1000
# drbdadm secondary test
On all nodes that should be able to mount the file system, we create a mount unit:
# cat << EOF > /etc/systemd/system/mnt-test.mount
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF
Then on all nodes, we also need to create a configuration for the Promoter plug-in:
# cat << EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"

[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF
Let’s do a quick recap of what we get out of that configuration snippet:
- A systemd template override for drbd-promote@test.service
- An override for the mount unit with a dependency on the promote service
- A drbd-services@test.target containing the dependencies
Last but not least, we need to start (or restart/reload) DRBD Reactor on all nodes by entering the command systemctl start drbd-reactor.service.
Then we can check which node is Primary and has the device mounted:
# drbd-reactorctl status mnt-test
On the Primary node, we can do a failover just for testing. A later version of
drbd-reactorctl might have a dedicated command for that:
# drbd-reactorctl disable --now mnt-test
# # another node should be primary now and have the FS mounted
# drbd-reactorctl enable mnt-test # to re-enable the config again
You can also test a failure scenario by keeping a file open on the mounted device if you want to. Connect to the Primary node and execute the following commands. This action should trigger a reboot, and another node should take over the mount.
# touch /mnt/test/lock
# sleep 3600 < /mnt/test/lock &
# # ^^ this creates an opener and the mount unit will be unable to stop
# # and the DRBD device will be unable to demote
# systemctl restart drbd-services@test.target # trigger a stop/restart of the target
Look back at the goals stated at the beginning of this blog post. Compared to a typical Pacemaker stack, we reduced complexity (at the price of flexibility). Besides a small amount of code, we delegate functionality to well-tested and widely used software components (for example, DRBD for quorum,
systemd for service management). DRBD Reactor is a simple, single binary daemon, so we also keep the number of components low. As the code is implemented in Rust, it is trivial to build for multiple architectures.
We use DRBD Reactor and the Promoter plug-in in our in-house infrastructure (LINSTOR + OpenNebula) to provide a highly available LINSTOR controller, which is the configuration we suggest to our customers. Finally, LINSTOR Gateway and LINBIT VSAN SDS also use it.