Implement Fencing Within a Pacemaker-Managed Cluster Using DRBD 9’s Quorum Feature

Implementing fencing properly is a way to ensure the consistency of your replicated data by avoiding “split-brain” scenarios. When communication between cluster nodes breaks, fencing prevents the data from diverging among your replicas.

DRBD® and Pacemaker each have their own implementations of fencing. To be clear, this article only talks about how to achieve fencing of DRBD resources in a Pacemaker-managed cluster.

Although DRBD 9 does have fencing and STONITH capabilities, you can instead use DRBD 9’s “quorum” feature to achieve DRBD fencing in your cluster. DRBD’s quorum feature achieves what DRBD fencing and STONITH configurations do, but in a way that is easier to configure and understand. You can then use STONITH devices and node-level fencing within Pacemaker to complete the fencing setup for your high-availability applications and resources.

From the DRBD 9 User’s Guide:

“The basic idea [of DRBD’s quorum mechanism] is that a cluster partition may only modify the replicated data set if the number of nodes that can communicate is greater than half of the overall number of nodes. A node of such a partition has quorum. However, a node that does not have quorum needs to guarantee that the replicated data set is not touched, so that the node does not create a diverging data set.”

Configuring DRBD’s Quorum Settings

You configure quorum in DRBD by adding settings to an “options” section of a DRBD configuration file. You can configure these settings at a node level, by adding settings to DRBD’s global configuration file, or at a resource level, by adding settings to a DRBD resource configuration file.

Your DRBD quorum configuration will consist of two basic settings. One setting defines what quorum is and the second setting defines what action DRBD will take on a node that no longer has quorum.

Here is a basic example of DRBD quorum settings:

options {
    quorum majority;          # majority | all | <numeric_value>
    on-no-quorum suspend-io;  # suspend-io | io-error
    [...]
}
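
To illustrate the two placement options described earlier, here is a minimal sketch. It assumes the conventional /etc/drbd.d/ layout and a hypothetical resource named r0. Settings in the “common” section apply to every resource on the node, and a resource-level “options” section overrides them for that resource only:

```
# /etc/drbd.d/global_common.conf
# Node-level defaults, inherited by every resource on this node.
common {
    options {
        quorum majority;
        on-no-quorum suspend-io;
    }
}

# /etc/drbd.d/r0.res (hypothetical resource file)
# Resource-level settings override the inherited defaults for r0 only.
resource r0 {
    options {
        quorum majority;
        on-no-quorum io-error;
    }
    # device, disk, and "on <host>" sections omitted for brevity
}
```

Which level you choose depends on whether all of your resources should react to quorum loss in the same way.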

Defining DRBD Quorum

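In most cases, you will define DRBD quorum to be “majority”. This is the definition described in the quoted passage from the DRBD 9 User’s Guide above. For a node to have quorum, it must be part of a partition that contains more than half of the total number of nodes in the cluster. For example, in a three-node cluster, two nodes that can communicate with each other have quorum, while a single isolated node does not. For this reason, use an odd number of cluster nodes, so that the cluster cannot split into two equal halves where neither has quorum. Three nodes are sufficient.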

If you have a cluster with an even number of nodes, you can add a “diskless” node to give your cluster an odd number of nodes. Using a diskless node saves you the full expense of adding another node with the same storage and hardware requirements as your “diskful” nodes. See the Permanently Diskless Nodes section in the DRBD 9 User’s Guide for more information.
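
As a rough sketch of such a setup, the following resource definition has two diskful nodes and one diskless node that acts only as a tiebreaker for quorum. The resource name, host names, IP addresses, and backing device path are all hypothetical placeholders, so adapt them to your environment:

```
resource r0 {
    device    /dev/drbd0;
    disk      /dev/vg0/r0;     # hypothetical backing device on diskful nodes
    meta-disk internal;

    options {
        quorum majority;
        on-no-quorum suspend-io;
    }

    on node-a {                # diskful node
        address 10.0.0.1:7789;
        node-id 0;
    }
    on node-b {                # diskful node
        address 10.0.0.2:7789;
        node-id 1;
    }
    on node-c {                # diskless node, counts toward quorum only
        disk    none;
        address 10.0.0.3:7789;
        node-id 2;
    }

    connection-mesh {
        hosts node-a node-b node-c;
    }
}
```

With three nodes in total, any two nodes that can still communicate form a majority, even though only two of the nodes store data.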

Setting DRBD’s On-loss-of-quorum Action

DRBD has two on-loss-of-quorum actions that will protect your data from diverging. The suspend-io action suspends all queued and future I/O operations to the backing DRBD device. The io-error action causes I/O operations to the backing DRBD device to result in I/O errors.

In most cases, LINBIT® recommends defining DRBD’s on-no-quorum action to be suspend-io. This action on loss of quorum protects your data in cases where your user-space application might not exit cleanly upon receiving I/O errors, or when your networking may be less than stable, for example, if your network experiences latency spikes or regular outages, or if Spanning Tree Protocol (STP) is in use and can cause network convergence delays.

With the suspend-io action configured for when a node loses quorum, DRBD will suspend I/O operations and you can reboot the node manually. LINBIT also recommends configuring DRBD’s on-suspended-primary-outdated force-secondary option, to improve node recovery after a failover. This setting makes automatic reintegration possible in the situation where a previous primary node connects to the new primary node after a failover. Upon returning to the cluster, a primary node that lost quorum with suspended I/O operations will be demoted to a secondary role and all suspended and future I/O operations will terminate with I/O errors.
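
A minimal sketch of an “options” section that combines these recommendations might look like the following. The resource name r0 is a hypothetical placeholder, and the rest of the resource definition is omitted:

```
resource r0 {
    options {
        quorum majority;
        on-no-quorum suspend-io;
        # Demote a returning primary that lost quorum, so that it can
        # rejoin the cluster as a secondary without manual intervention.
        on-suspended-primary-outdated force-secondary;
    }
    # device, disk, and "on <host>" sections omitted for brevity
}
```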

Alternatively, if your application terminates gracefully upon receiving an I/O error from its storage, defining DRBD’s on-no-quorum action to be io-error might be preferable to suspend-io. Be aware, however, that if clients access storage shared from the cluster, such as in an iSCSI or NFS cluster, the I/O errors will reach the client systems, which could require clients to reattach or remount the cluster storage.

Configuring Handler Scripts for DRBD Quorum Loss

You can also configure settings within a “handlers” section of a DRBD configuration file, so that upon losing quorum, DRBD will trigger an action, for example, a Pacemaker CRM script. The action should be one that will get Pacemaker to react in a way that protects the integrity of your cluster and data.

For example, if the user-space application that uses the DRBD resource exits cleanly upon receiving I/O errors, you can configure cluster resource manager (CRM) scripts so that Pacemaker unmounts the file system and demotes the DRBD resource to a Secondary role on that node, preserving the integrity of your replicated data.

In a “last resort” case, you can configure a handler to reboot the node when it has lost quorum, by adding this section to your DRBD configuration file:

handlers {
    quorum-lost "echo b > /proc/sysrq-trigger"; # reboot node
}

It is important to configure Pacemaker to handle the node properly when it comes back up, after rebooting. You can do this by disabling DRBD’s “auto-promote” feature, configuring STONITH devices and their corresponding Pacemaker agents, and setting up node-level fencing and monitoring within Pacemaker.
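
On the DRBD side, disabling auto-promote is a one-line setting in the “options” section. The following is only a sketch with a hypothetical resource named r0. The Pacemaker and STONITH parts of the setup are not shown, because they depend on your devices and environment:

```
resource r0 {
    options {
        # Let Pacemaker alone decide which node is promoted to primary.
        auto-promote no;
        quorum majority;
        on-no-quorum suspend-io;
    }
    # device, disk, and "on <host>" sections omitted for brevity
}
```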

The details of these Pacemaker configurations will vary depending on the STONITH devices you use, your cluster environment, and the resource types that your high-availability applications use.

Before deploying any quorum settings within a production environment, it is important that you test how your application and file system will behave upon receiving I/O errors, so that you can configure DRBD quorum settings appropriately.

If you have questions about configurations particular to your environment and applications, you can contact the experts at LINBIT.


Michael Troutman

Michael Troutman has an extensive background working in systems administration, networking, and technical support, and has worked with Linux since the early 2000s. Michael's interest in writing goes back to a childhood filled with avid reading. Somewhere he still has the rejection letter from a publisher for a choose-your-own-adventure style novella, set in the world of a then-popular science fiction role-playing game, cowritten with his grandmother (a romance novelist and travel writer) at the tender age of 10. In the spirit of the open source community in which LINBIT thrives, Michael works as a Documentation Specialist to help LINBIT document its software and its many uses so that it may be widely understood and used.
