Implementing fencing is a way to ensure the consistency of your replicated data by avoiding “split-brain” scenarios. When communication between cluster nodes breaks, fencing prevents the data from diverging among your data replicas.
DRBD® and Pacemaker each have their own fencing implementations. A Pacemaker fencing implementation usually involves a hardware outlay, and even software-based fencing, such as hypervisor or API-driven STONITH agents, can quickly become complex to configure. You also need to configure fencing agent scripts particular to your hardware to perform STONITH actions when necessary. When using DRBD in a Pacemaker-managed cluster, you can achieve fencing without the hardware expenditure and configuration time by using DRBD's built-in quorum feature. To be clear, this article only discusses how to achieve fencing of DRBD resources in a Pacemaker-managed cluster. Other resources in a Pacemaker cluster that are independent of DRBD would require their own fencing implementation, or else Pacemaker's fencing implementation if they did not have their own.
Although DRBD does have fencing (and STONITH) capabilities, you can instead use the DRBD quorum feature to achieve fencing in your cluster. Using DRBD’s quorum feature allows you to achieve what fencing and STONITH configurations do, but in an easier, more understandable way. Again, for other independent, non-DRBD resources you can use STONITH hardware and configure node-level fencing within Pacemaker to complete a fencing setup for your high-availability applications and resources.
For an overview of how you can configure Pacemaker node-level fencing, LINBIT Solutions Architect Matt Kereczman wrote an article on the topic. In the article, Kereczman uses a highly available VirtualBox hypervisor deployment as a case study.
Defining DRBD Quorum as a Majority of Nodes
From the DRBD 9 User’s Guide:
The basic idea [of DRBD’s quorum mechanism] is that a cluster partition may only modify the replicated data set if the number of nodes that can communicate is greater than half of the overall number of nodes. A node of such a partition has quorum. However, a node that does not have quorum needs to guarantee that the replicated data set is not touched, so that the node does not create a diverging data set.
Configuring DRBD’s Quorum Settings
You can configure quorum in DRBD by adding settings to an options section of a DRBD configuration file. You can configure these settings at the node level, by adding them to the DRBD global configuration file, or at the resource level, by adding them to a DRBD resource configuration file.
Your DRBD quorum configuration will consist of two basic settings. One setting defines what quorum is, and the second setting defines what action DRBD will take on a node that no longer has quorum.
Here is a basic example of DRBD quorum settings:
options {
    quorum majority;          # majority | all | <numeric_value>
    on-no-quorum suspend-io;  # suspend-io | io-error
    [...]
}
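For context, here is a minimal sketch of a complete resource configuration file with quorum settings configured at the resource level. The resource name r0, host names, IP addresses, and device paths are placeholders for illustration:

resource r0 {
    options {
        quorum majority;
        on-no-quorum suspend-io;
    }
    device /dev/drbd0;
    disk /dev/sdb1;
    meta-disk internal;
    on node-a {
        node-id 0;
        address 192.168.222.10:7789;
    }
    on node-b {
        node-id 1;
        address 192.168.222.11:7789;
    }
    on node-c {
        node-id 2;
        address 192.168.222.12:7789;
    }
    connection-mesh {
        hosts node-a node-b node-c;
    }
}

If you wanted to apply the same settings to all resources on a node instead, you could place the options section inside the common section of the DRBD global configuration file.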
Defining DRBD Quorum
In most cases, you will define DRBD quorum to be majority. This is the definition described in the passage quoted earlier from the DRBD 9 User's Guide. For a node to have quorum, it must be able to communicate with more than half of the total number of nodes in the cluster, that is, the node must be part of a majority partition of nodes in the cluster. For this to work, your cluster needs an odd number of nodes; three nodes are sufficient.
If you have a cluster with an even number of nodes, you can add a “diskless” node to give your cluster an odd number of nodes. Using a diskless node saves you the full expense of adding another node with the same storage and hardware requirements as your “diskful” nodes. See the Permanently Diskless Nodes section in the DRBD 9 User's Guide for more information.
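As a sketch, the on section for such a diskless node might look like this, where the host name node-c and its address are placeholders, and setting disk to none is what makes the node permanently diskless:

on node-c {
    node-id 2;
    volume 0 {
        device /dev/drbd0;
        disk none;  # no backing storage: this node acts as a tiebreaker only
    }
    address 192.168.222.12:7789;
}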
Setting the DRBD On-loss-of-quorum Action
DRBD has two on-loss-of-quorum actions that will protect your data from diverging. The suspend-io action suspends all queued and future I/O operations to the DRBD device. The io-error action causes I/O operations to the DRBD device to fail with I/O errors that are passed up to higher layers, such as an application running on top of or using data on the DRBD device.
In most cases, LINBIT® recommends setting DRBD's on-no-quorum action to suspend-io. This action on loss of quorum protects your data in cases where your user-space application might not exit cleanly upon receiving I/O errors, or where your network might be less than stable: for example, if your network experiences latency spikes or regular outages, or if spanning tree protocol is used and might cause network convergence delays.
With the suspend-io action configured, when a node loses quorum, DRBD will suspend I/O operations and you can reboot the node manually. LINBIT also recommends configuring DRBD's on-suspended-primary-outdated force-secondary option, to improve node recovery after a failover. This setting makes automatic reintegration possible when a previous primary node connects to the new primary node after a failover. Upon returning to the cluster, a primary node that lost quorum while its I/O operations were suspended will be demoted to a secondary role, and all suspended and future I/O operations will terminate with I/O errors.
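Putting these recommendations together, an options section following this approach might look like this:

options {
    quorum majority;
    on-no-quorum suspend-io;
    on-suspended-primary-outdated force-secondary;
}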
Alternatively, if you have a rock-solid network and your application terminates gracefully upon receiving an I/O error from its backing storage, then you might prefer to set the DRBD on-no-quorum action to io-error rather than suspend-io.
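In that case, the options section would look something like this:

options {
    quorum majority;
    on-no-quorum io-error;
}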
⚠️ WARNING: Sending I/O errors to clients accessing storage shared from the cluster, such as in an iSCSI or NFS cluster, will result in I/O errors reaching client systems, which could require clients to reattach or remount cluster storage.
Configuring Handler Scripts for DRBD Quorum Loss
You can also configure settings within a handlers section of a DRBD configuration file, so that upon losing quorum, DRBD will trigger an action, for example, a Pacemaker CRM script. The action should be one that gets Pacemaker to react in a way that protects the integrity of your cluster and data.
For example, if the user-space application that uses the DRBD resource exits cleanly upon receiving I/O errors, you can configure cluster resource manager (CRM) scripts so that Pacemaker unmounts the file system and demotes the DRBD resource to a secondary role on that node, preserving the integrity of your replicated data.
In a “last resort” case, you can configure a handler to reboot the node when it has lost quorum, by adding the following section to your DRBD configuration file:
handlers {
    quorum-lost "echo b > /proc/sysrq-trigger"; # reboot the node
}
It is important to configure Pacemaker to handle the node properly when it comes back up after rebooting. You can do this by disabling DRBD's auto-promote feature, configuring STONITH devices and their corresponding Pacemaker agents, and setting up node-level fencing and monitoring within Pacemaker.
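For example, you can disable DRBD's auto-promote feature by adding the following to the options section of your DRBD configuration, so that only the cluster resource manager decides when to promote a resource:

options {
    auto-promote no;  # let Pacemaker, not DRBD, control promotion
}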
The details of these Pacemaker configurations will vary depending on the STONITH devices you use, your cluster environment, and the resource types that your high-availability applications use.
Before deploying any quorum settings within a production environment, it is important that you test how your application and file system will behave upon receiving I/O errors, so that you can configure DRBD quorum settings appropriately.
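For example, on a test cluster you might simulate a loss of quorum on one node by disconnecting it from its peers, then observe how your application and file system behave. The resource name r0 is a placeholder, and you should only run this in a test environment, because it interrupts replication:

# On the node under test:
drbdadm disconnect r0       # isolate this node from all of its peers
drbdsetup events2 --now r0  # the device state should report quorum:no
drbdadm connect r0          # reconnect the node so that it regains quorum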
If you have questions about configurations particular to your environment and applications, you can contact the experts at LINBIT.