DRBD Reactor – Promoter

In this blog post, we discuss the promoter plugin. For a more general introduction to DRBD Reactor, please read the overview blog post first.

Motivation

With many years of experience in high availability (HA), we noticed that the typical Pacemaker stack used in such HA scenarios introduces great complexity. We have to deal with misconfigured Pacemaker setups in many of our customer tickets. This is neither our customers’ fault nor Pacemaker’s fault. HA is a complex topic, and Pacemaker is a very flexible tool. With this flexibility, and with the whole Pacemaker stack being split into a seemingly endless number of components (corosync/knet, lrmd, pengine, stonithd, crmd, cib, crm_shell, pcs, …), the overall complexity can be overwhelming.

The typical HA stack is also hard to build, and it takes days or even weeks until the complete stack is built and tested on all the distributions and architectures we support.

So our goals are as follows:

– reduce complexity, keep things simple.

– shift responsibility to well-known and tested components, don’t reinvent the wheel.

– keep configuration simple.

– keep the number of components and, therefore, the interaction between them low.

– finally, make it easy to build.

I want to stress that Pacemaker has been an excellent component of the Linux HA stack for years and is more flexible than DRBD Reactor and its promoter plugin. So I don’t see it as a rival at all. We just think there are scenarios (i.e., the 99%) where the benefits of a simple DRBD Reactor setup predominate.

Implementation

DRBD Reactor is completely tied to HA storage that uses DRBD®. So it is not as generic as Pacemaker in that sense. However, the advantage is that DRBD Reactor can use DRBD states (quorum and “may promote”) as the source for its decisions. In fact, these two pieces of information are the only input DRBD Reactor needs:

– if the DRBD resource can be promoted (i.e., not active anywhere else and has quorum), then promote the DRBD resource and start the user-defined services.

– if quorum is lost, stop the services, demote the DRBD resource, and give another node the chance to promote the resource and start the services. If configured, and demotion fails, execute some action (power off the node, reboot it, …).
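
Both pieces of state come straight from DRBD’s event stream, which you can also inspect manually. As an illustration, for a resource named `test` (the exact field names can vary slightly between DRBD versions):

$ drbdsetup events2 --now test
$ # look for "may_promote:yes" in the resource line and "quorum:yes" in the
$ # device lines; these are the two pieces of state the promoter plugin reacts to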

The first consequence of this design is that DRBD Reactor does not need any further cluster communication besides listening to the aforementioned events. There is no shared ‘cib’ for cluster configuration that needs synchronization, as we know it from the Pacemaker stack. The disadvantage is that configuration files need to be copied to all nodes. Overall, the benefits of keeping the daemon simple outweigh that. Usually, higher-level tools are used to distribute the configuration in the cluster, e.g., linstor-gateway.

If one or more nodes detect that they can promote the DRBD resource, they all race to promote the resource to Primary. Only one will be able to do that; the other nodes back off. As an implementation detail: before a node actually tries to become Primary, it might sleep a bit, depending on its local resource state. That way, a node with a good state is more likely to win the race for DRBD promotion.

Using DRBD quorum helps us keep the promoter plugin simple by using a component that already implements quorum (i.e., DRBD). So we know when we should promote a DRBD resource and start services that depend on it, but how do we start services? Again, we want to keep things as simple as possible. There is already a standard system component that allows us to start and stop services and even group services into “targets”: `systemd`. I would even go as far as to think of the promoter plugin as a very elaborate `systemd` template generator. We will not go into all details of `systemd` template generation. The technically interested user is referred to this in-depth documentation, but a high-level overview will help understand how the promoter plugin works. Let’s assume we want to start the services `a.service`, `b.service`, and `c.service`.

– the plugin generates an override for the `drbd-promote@.service` that is shipped by `drbd-utils`. This makes sure that the resource gets promoted to `Primary` and acts as a dependency for all the other services.

– for every service in the list, the plugin creates an override (e.g., `/var/run/systemd/system/a.service.d/reactor.conf`) that specifies a dependency on the `drbd-promote@.service` as well as a dependency on the service preceding it; see the sketch after this list. So `b.service` will depend on `a.service`, and `c.service` will depend on `b.service`.

– all the services are grouped into a `drbd-services@.target` instance that acts as a handle for the promoter. After all these service overrides are generated, the plugin starts (or stops) the `drbd-services@.target` unit, and `systemd` does what it is good at: starting services.
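
To make this a bit more tangible: for `b.service` of a hypothetical resource `foo`, the generated drop-in could look roughly like the following. This is a minimal sketch to illustrate the idea; the exact directives are generated by the plugin and may differ between versions:

$ cat /var/run/systemd/system/b.service.d/reactor.conf
[Unit]
# illustrative sketch of a generated drop-in, not verbatim plugin output
PartOf=drbd-services@foo.target
Requires=drbd-promote@foo.service a.service
After=drbd-promote@foo.service a.service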

This once more makes use of an existing, widely used, and well-tested component for service management, namely `systemd`. And by using it, we also get all of the power `systemd` provides for free, like reliable `OnFailure` actions.

The last widely used component for clustering I want to mention is OCF resource agents. The promoter plugin can use them via a little shim that we ship in `drbd-utils`, namely `ocf.ra@.service`. You can find a more detailed overview here.
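
For example, a service IP address managed by the `heartbeat` `IPaddr2` agent can simply be put into the start list of a promoter configuration. The resource name, instance name, and IP address below are made up for illustration; the exact syntax is described in the linked documentation:

[[promoter]]
[promoter.resources.foo]
start = [
   "ocf:heartbeat:IPaddr2 my-vip ip=192.168.122.222 cidr_netmask=24",
   "a.service",
]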

There is much more to know about the promoter plugin, like which failure scenarios are actually covered, but what we covered here is enough for a motivational blog post. The interested reader is referred to the documentation.

Example

After all the theory, let’s look at a straightforward example: A highly-available file system mount. Of course, we use LINSTOR to create the DRBD resources to keep things simple, but that is not a strict requirement.

The first step is to create a DRBD resource, which, in this case, is three times redundant.

$ linstor resource-group create --place-count 3 promoter 
$ linstor resource-group drbd-options promoter --auto-promote no 
$ linstor resource-group drbd-options promoter --quorum majority 
$ linstor resource-group drbd-options promoter --on-no-quorum io-error
$ linstor volume-group create promoter
$ linstor resource-group spawn promoter test 20M
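
After the spawn, the resource should exist on three nodes, which can be verified with either LINSTOR or DRBD tooling:

$ linstor resource list
$ drbdadm status test
$ # once the initial sync is done, all peers should be Connected and UpToDate,
$ # and every node should be Secondary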

Then we want to create a file system we can mount:

$ drbdadm primary test
$ mkfs.ext4 /dev/drbd1000
$ drbdadm secondary test

On all nodes that should be able to mount the file system, we create a mount unit:

$ cat << EOF > /etc/systemd/system/mnt-test.mount
[Unit]
Description=Mount /dev/drbd1000 to /mnt/test 

[Mount]
What=/dev/drbd1000
Where=/mnt/test
Type=ext4
EOF
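
Depending on your setup, it might be necessary to tell `systemd` to re-read its unit files so that it knows about the new mount unit:

$ systemctl daemon-reload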

Then on all nodes, we also need to create a configuration for the promoter plugin:

$ cat << EOF > /etc/drbd-reactor.d/mnt-test.toml
[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
EOF
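
Instead of typing the file on every node, you can of course just copy it around. The host names below are made up; DRBD Reactor then needs to be started (or reloaded) on every node, as described below:

$ for n in node-b node-c; do scp /etc/drbd-reactor.d/mnt-test.toml $n:/etc/drbd-reactor.d/; done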

Let’s do a quick recap of what we get out of that configuration snippet:

  • A `systemd` template override for `drbd-promote@test.service`.
  • An override for the mount unit with a dependency on the promote service.
  • A `drbd-services@test.target` containing the dependencies.
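
Once DRBD Reactor has been started (next step), everything the plugin generated can be inspected with standard `systemd` tooling:

$ systemctl list-dependencies drbd-services@test.target
$ systemctl cat mnt-test.mount
$ # 'systemctl cat' also shows the generated reactor.conf drop-in for the unit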

Last but not least, we need to start (or restart/reload) DRBD Reactor on all nodes via `systemctl start drbd-reactor.service`.

Then we can check which node is Primary and has the device mounted:

$ drbd-reactorctl status mnt-test

On the node that is Primary, we can do a switch-over, just for testing. A later version of `drbd-reactorctl` might have a dedicated command for that:

$ drbd-reactorctl disable --now mnt-test
$ # another node should be primary now and have the FS mounted
$ drbd-reactorctl enable mnt-test # to re-enable the config again

If you want to, you can also test a failure scenario by keeping a file open on the mounted device. Connect to the node that is Primary and execute the following commands. This should trigger a reboot, and another node should take over the mount.

$ touch /mnt/test/lock
$ sleep 3600 < /mnt/test/lock &
$ # ^^ this creates an opener and the mount unit will be unable to stop
$ # and the DRBD device will be unable to demote
$ systemctl restart drbd-services@test.target # trigger a stop/restart of the target

Conclusion

Let’s look back at the goals stated at the beginning of this blog post. Compared to a typical Pacemaker stack, we certainly reduced complexity (at the price of flexibility). Apart from a small amount of our own code, we delegate functionality to well-tested and widely used software components (e.g., DRBD for quorum, `systemd` for service management). DRBD Reactor is a simple, single-binary daemon, so we also keep the number of components low, and being implemented in Rust, it is trivial to build for multiple architectures.

We use DRBD Reactor and the promoter plugin for our in-house infrastructure (LINSTOR + OpenNebula) to provide a highly-available LINSTOR controller, which is the configuration we also suggest to our customers. Further, it is used in linstor-gateway and LINBIT VSAN SDS.


Roland

Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered real-time systems and works for LINBIT in the DRBD development team.
