DRBD Reactor – Recovering an Isolated DRBD Primary

This blog post shows how DRBD®’s recently added support for secondary --force can be of great use when combined with DRBD Reactor. In short, secondary --force reconfigures the DRBD device so that suspended I/O requests and newly submitted I/O requests terminate with I/O errors.

Motivation 

Imagine a three-node setup where the currently active node loses its connection to the other nodes. DRBD’s quorum feature detects this, and drbd-reactor::promoter can act, but so far its choices have been limited, and the results might be undesirable. Let’s assume the service is a simple HA file system mount, realized as a systemd mount unit, and that the mount was in active use when quorum was lost. The promoter plugin for that resource detects the quorum loss, tries to stop the services, and then tries to demote the DRBD device. Because the file system still has users, the umount command fails, and later in the chain, demoting the DRBD device fails as well.

So how do you recover a cluster from such a situation? Usually, that is done via some failure action. Indeed, for every resource managed by the promoter plugin, one can set a systemd failure action like “reboot” via the on-drbd-demote-failure configuration option. Rebooting the node can be a good recovery strategy, but what if you have hundreds of DRBD resources active on that node? Is it worth killing n-1 important resources just to recover the one unimportant resource that was blocked?

Implementation 

The implementation is relatively simple: wherever we previously called only drbdsetup secondary, we now first call drbdsetup secondary and then drbdsetup secondary --force. Keeping the first call has the advantage that the logs are nicer to read: as an admin, one sees that the usual drbdsetup secondary failed before secondary --force was called. Using secondary --force is the new default; to disable it, one must set secondary-force = false in the promoter plugin’s configuration.
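The control flow described above can be sketched as a small shell function. This is only an illustration, not DRBD Reactor’s actual code: the function name demote_with_fallback and its second argument (which mirrors the secondary-force option) are hypothetical, and the sketch assumes that the forced call is only attempted after the plain call has failed, as the remark about the logs suggests.

```shell
# Hypothetical sketch of the two-step demotion; not DRBD Reactor's real code.
# $1: DRBD resource name
# $2: "true" (default) or "false", mirroring the secondary-force option
demote_with_fallback() {
    res="$1"
    force="${2:-true}"

    # Step 1: the usual, non-forced demotion.
    if drbdsetup secondary "$res"; then
        return 0
    fi
    echo "drbdsetup secondary $res failed" >&2

    # With secondary-force disabled, the failure is handed to the caller
    # (for example, so that on-drbd-demote-failure can take over).
    if [ "$force" != "true" ]; then
        return 1
    fi

    # Step 2: forced demotion.
    echo "retrying with drbdsetup secondary --force $res" >&2
    drbdsetup secondary --force "$res"
}
```

Calling demote_with_fallback test tries the plain demotion of the resource test first and only then escalates; demote_with_fallback test false corresponds to the old behavior.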

Example 

We start with the usual example of a systemd mount unit as a service. Please read on to the end of the example, where we discuss why a mount unit isn’t a good choice in this scenario, but let’s build up our knowledge step by step. A configuration might look like this:

[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["mnt-test.mount"]
on-drbd-demote-failure = "reboot"
secondary-force = false

Note that we intentionally set secondary-force to false to simulate the old behavior. If we then open the device on the active node, in our case with a read-only opener via sleep 3600 </dev/drbd1000, and isolate the node via for i in INPUT OUTPUT; do iptables -A $i -p tcp --dport 7000 -j REJECT; done, we see that the node reboots. That is expected, but our imaginary hundred other resources are affected as well.
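The reproduction steps can also be written as small shell helpers, which makes them easier to repeat. The device /dev/drbd1000 and port 7000 are the ones used in this post; the function names are hypothetical, and the commands need root privileges. Treat this as a sketch for a test cluster, not as something to run on production nodes.

```shell
# Hypothetical helpers for reproducing the isolation test on a test cluster.

# Keep a read-only opener on the DRBD device for an hour, in the background.
hold_device_open() {
    sleep 3600 < "${1:-/dev/drbd1000}" &
}

# Reject DRBD replication traffic (TCP port 7000 in this setup) on the
# INPUT and OUTPUT chains, so that the node loses quorum.
isolate_node() {
    for chain in INPUT OUTPUT; do
        iptables -A "$chain" -p tcp --dport "${1:-7000}" -j REJECT
    done
}

# Delete exactly the rules that isolate_node added.
rejoin_node() {
    for chain in INPUT OUTPUT; do
        iptables -D "$chain" -p tcp --dport "${1:-7000}" -j REJECT
    done
}
```

A run would call hold_device_open and isolate_node on the active node, and rejoin_node once the test is over. iptables -F also lifts the isolation, but it flushes every rule in the filter table rather than only these two.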

So, what if we now delete the secondary-force setting from the configuration, reboot, and try the same again? We see that secondary --force was executed, that the device was reconfigured to return I/O errors, that it is now Secondary, and that the node did not reboot. Great, we even see that one of the nodes that still have quorum started the service. Unfortunately, our original DRBD Primary node never reintegrates (well, maybe after the 3600 seconds have elapsed).

Why is that? Usually, it is fine to use systemd mount units if a service that systemd knows about sits on top of the mount unit. A typical case would be a highly available LINSTOR® controller, which would have a start list of start = ["var-lib-linstor.mount", "linstor-controller.service"]. On quorum loss, systemd would then make sure that the controller service gets stopped (or killed), and the device could be unmounted because all its users have been stopped.

But what can we do if the mount point is the final service? Then there are all kinds of users that systemd does not know about and can’t do anything about. Our read-only sleeper is one of them: it idles around and blocks the device from being unmounted. Note that if the process did any I/O (that is, if any I/O were pending), it would receive I/O errors and hopefully terminate. So, in our edge case where the device’s opener just idles around, the answer is not to use a systemd mount unit as the top service, but to use an OCF file system resource agent instead. This agent has all kinds of tricks built in to ensure that all users of a file system mount get found and terminated. So, in our scenario, we would use a start list like this:

start = ["ocf:heartbeat:Filesystem fs_test device=/dev/drbd1000 directory=/mnt/test fstype=ext4 run_fsck=no"]

Using this, and again blocking all traffic via iptables, we see that the node that lost quorum demotes cleanly and does not reboot, and that another node takes over the service. After an iptables -F, the node is ready to be a target for the next failover or switchover.
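For reference, the complete promoter snippet for this last variant could be written out as below. It combines the configuration shown earlier with the OCF start list and leaves secondary-force unset so that the default applies. Keeping on-drbd-demote-failure as a last resort is an assumption about the final configuration, and the helper name write_promoter_snippet is hypothetical.

```shell
# Hypothetical helper: write the promoter snippet for the OCF variant to
# the file given as $1.
write_promoter_snippet() {
    cat > "$1" <<'EOF'
[[promoter]]
id = "mnt-test"
[promoter.resources.test]
start = ["ocf:heartbeat:Filesystem fs_test device=/dev/drbd1000 directory=/mnt/test fstype=ext4 run_fsck=no"]
on-drbd-demote-failure = "reboot"
EOF
}
```

A call such as write_promoter_snippet /etc/drbd-reactor.d/mnt-test.toml assumes that your DRBD Reactor installation reads its snippets from /etc/drbd-reactor.d; adapt the path if yours differs.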

Conclusion 

In this blog post, we saw two things. First, DRBD’s secondary --force feature is a significant improvement that handles isolated DRBD Primaries more gracefully. It can make the difference between rebooting a node with hundreds of essential resources and just demoting a single DRBD device. Second, there can still be edge cases where secondary --force alone does not solve all issues. It configures the device to return I/O errors in the hope that users of the device terminate when they receive them, but that does not help if, as in our example, a read-only opener idles around forever. We combined such an idle opener with a file system mount and saw that we need a component that ensures these openers vanish. In our example, we replaced the systemd mount unit with an OCF file system resource agent.

Roland Kammerer

Roland Kammerer studied technical computer science at the Vienna University of Technology and graduated with distinction. Currently, he is a PhD candidate with a research focus on time-triggered realtime-systems and works for LINBIT in the DRBD development team.
