Creating Highly Available NFS Targets with DRBD and Pacemaker

This blog post explains how to configure a highly available (HA) active/passive NFS server on a three-node Linux cluster by using DRBD® and Pacemaker. You can implement this solution on DEB-based or RPM-based Linux distributions, for example, Ubuntu or Red Hat Enterprise Linux (RHEL). The instructions in this article were tested on RHEL 9 and Ubuntu 24.04 clusters.

NFS is well-suited for many use cases because it:

  • Enables many computers to access the same files, so everyone on the network can use the same data.
  • Reduces storage costs by having computers share applications rather than needing local disk space for each user application.

The system preparation requirements for this use case are:

  • Two diskful nodes for data replication, and one diskless node for quorum purposes.
  • A separate network link for DRBD replication. (This is best practice, not mandatory.)
  • Pacemaker and Corosync are installed on all nodes and their services are enabled to start at system boot time.
  • Open Cluster Framework (OCF) resource agents are installed on all nodes. (If you are a LINBIT® customer, install the resource-agents and drbd-pacemaker packages to install the resource agents used in this article.)
  • A virtual IP address, required for the NFS server. (In this article, the OCF IPaddr2 resource agent is used to automatically set this to 172.16.16.102 in the Pacemaker configuration.)
  • The latest version of DRBD is installed on all nodes and loaded into the kernel. (Available from GitHub, or through LINBIT customer repositories. See the DRBD 9.0 User’s Guide for more details.)
  • An NFS server is installed on all nodes for redundancy. (The NFS server service should not be enabled to start as Pacemaker will start the server when necessary.)
  • All cluster nodes can resolve each other’s hostnames. (Check /etc/hosts or your local DNS server. Example /etc/hosts entries follow this list.)
  • SELinux and any firewalls in use are configured to allow appropriate traffic on all nodes. (Refer to the DRBD 9.0 User’s Guide for more information.)
  • The crmsh and pcs CLI utilities are installed on all nodes, for editing and managing the Pacemaker configuration.
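
For example, if your nodes use the hostnames and replication network addresses shown in the DRBD configuration later in this article, the /etc/hosts entries on each node might look like the following. Adjust the hostnames and IP addresses to match your environment.

172.16.16.111   drbd1
172.16.16.112   drbd2
172.16.16.113   drbd3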

After completing these initial preparation steps, you can create your high availability NFS cluster.

Creating a Logical Volume and Directory for the NFS Share

Before creating DRBD resources in the cluster, you need to create a physical volume on top of the physical device (drive). Enter these commands as the root user, or else preface them with sudo.

To do that, enter:

pvcreate /dev/sdx

Here, “x” in sdx corresponds to the letter identifying your physical device.

Then create the volume group, named nfs_vg, by entering:

vgcreate nfs_vg /dev/sdx

Next, create the logical volumes that DRBD will consume.

The first logical volume will store NFS stateful connection information. If this information is not highly available or otherwise synchronized between cluster nodes, then in some failover cases, it might take a long time for NFS exports to become available again. This volume will not hold much data, so 20M can be a sufficient size.

The second volume will store the data that you will share by using NFS. You can replace the 300G with a size appropriate for your use, or else use the -l 100%FREE option rather than -L 300G if you want the logical volume to use all of the remaining free space in the volume group.

lvcreate -L 20M -n ha_nfs_internal_lv nfs_vg
lvcreate -L 300G -n ha_nfs_exports_lv nfs_vg
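
For example, if you want the exports volume to use all of the remaining free space in the volume group, the second command might instead be:

lvcreate -l 100%FREE -n ha_nfs_exports_lv nfs_vg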

After creating the logical volumes, create the directories that will serve as the mount points for your share and for the cluster's internal state.

mkdir -p /srv/drbd-nfs/exports/HA
mkdir -p /srv/drbd-nfs/internal

Configuring DRBD

After preparing your backing storage device and a file system mount point on your nodes, you can next configure DRBD to replicate the storage device across the nodes.

Creating a DRBD Resource File

DRBD resource configuration files are located in the /etc/drbd.d/ directory. Resource files need to be created on all cluster nodes. You can create a resource file on one node and then use the rsync command to distribute it to the other nodes, as shown after the resource file below. Each DRBD resource defined in a resource configuration file needs a different TCP port. Because this configuration defines only one resource, it uses only one TCP port (7003). Use the text editor of your choice to create the DRBD resource file as shown below. Change the host names and IP addresses to reflect your network configuration.

📝 NOTE: The third cluster node only serves a quorum function in the cluster. It is not involved in DRBD replication. In the configuration, it is identified as “diskless”.

vi /etc/drbd.d/ha_nfs.res

resource ha_nfs {
  volume 0 {
    device "/dev/drbd1002";
    disk "/dev/nfs_vg/ha_nfs_internal_lv";
    meta-disk internal;
  }
  volume 1 {
    device "/dev/drbd1003";
    disk "/dev/nfs_vg/ha_nfs_exports_lv";
    meta-disk internal;
  }
  options {
    on-no-quorum suspend-io;
    quorum majority;
  }
  connection-mesh {
    hosts "drbd1" "drbd2" "drbd3";
  }
  on "drbd1" {
    address 172.16.16.111:7003;
    node-id 0;
  }
  on "drbd2" {
    address 172.16.16.112:7003;
    node-id 1;
  }
  on "drbd3" {
    disk none;
    address 172.16.16.113:7003;
    node-id 2;
  }
}
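
After saving the resource file, distribute it to the other cluster nodes. A minimal sketch using rsync, assuming that you created the file on drbd1 and that you have SSH access to the other nodes, might be:

# copy the resource file from drbd1 to the other two nodes
rsync -av /etc/drbd.d/ha_nfs.res drbd2:/etc/drbd.d/
rsync -av /etc/drbd.d/ha_nfs.res drbd3:/etc/drbd.d/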

Initializing DRBD Resources

After creating the DRBD resource configuration file, you need to initialize the DRBD resource. To do that, enter the following commands as the root user, or else use sudo. The first two commands must be entered and run on both diskful cluster nodes.

drbdadm create-md ha_nfs
drbdadm up ha_nfs
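
Before continuing, you can verify that the resource is up and that the nodes are connected to each other by entering the following command on each node:

drbdadm status ha_nfs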

Because the backing devices are new and contain no data yet, you can skip the initial synchronization. However, use caution with this command, because skipping synchronization on a device that does hold data can result in data loss. Enter the following command on a diskful node:

drbdadm new-current-uuid --clear-bitmap ha_nfs/0

Next, enter and run the following commands on only one of the two diskful cluster nodes. DRBD will replicate the file systems and the directories that the commands create to the other diskful cluster node.

drbdadm primary --force ha_nfs
mkfs.ext4 /dev/drbd1002
mkfs.ext4 /dev/drbd1003
mount /dev/drbd1002 /srv/drbd-nfs/internal
mkdir /srv/drbd-nfs/internal/portblock_tickle_dir
mkdir /srv/drbd-nfs/internal/nfs_info_dir
umount /dev/drbd1002

Entering these commands will do a few things:

  • Force the node to become primary.
  • Create the needed file systems on the DRBD devices.
  • Mount the “internal” DRBD device to a mount point.
  • Create two “internal” informational directories. The first directory will store stateful information related to NFS connections. The second directory, also known as the “tickle” directory, will be used by a portblock OCF resource agent to store established TCP connections. Using the portblock resource agent with a “tickle” directory might allow for clients to reconnect faster after failover events. Refer to man ocf_heartbeat_portblock for more information.
  • Unmount the “internal” DRBD device.

💡 TIP: When making a new file system on a large volume, you might consider using the -E nodiscard option with the mkfs.ext4 command. Using this option might speed up the command operation.
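
For example, when creating the file system on the larger exports volume, the command might look like this:

mkfs.ext4 -E nodiscard /dev/drbd1003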

Next, verify the state of your storage by entering the drbdadm status and lsblk commands.

The drbdadm status command should show that the DRBD resource is in sync and UpToDate on both diskful nodes. If everything looks fine, use the following command to demote the node from primary back to secondary.

drbdadm secondary ha_nfs

Creating NFS Exports and Pacemaker Resources

There are two ways to create Pacemaker resources. The first way is by directly editing the Pacemaker configuration file by using the interactive crm shell. The second way is by using the pcs command-line tool. This example uses the crm shell to edit the Pacemaker configuration.

To start the crm shell, enter crm. Next, edit the Pacemaker configuration by entering configure edit. If your default editor is Vi or Vim, press the “i” key to enter insert mode so that you can edit the file.

In this mode, delete everything in the configuration file and paste in the following configuration. Remember to change hostnames, subnets, and IP addresses to match your network configuration.

node 1: drbd1
node 2: drbd2
node 3: drbd3
primitive p_virtip IPaddr2 \
    params \
        ip=172.16.16.102 \
        cidr_netmask=24 \
    op monitor interval=0s timeout=40s \
    op start interval=0s timeout=20s \
    op stop interval=0s timeout=20s
primitive p_drbd_ha_nfs ocf:linbit:drbd \
    params \
        drbd_resource=ha_nfs \
    op monitor timeout=20 interval=21 role=Slave \
    op monitor timeout=20 interval=20 role=Master
primitive p_expfs_nfsshare_exports_HA exportfs \
    params \
        clientspec="172.16.16.0/24" \
        directory="/srv/drbd-nfs/exports/HA" \
        fsid=1003 unlock_on_stop=1 options=rw \
    op monitor interval=15s timeout=40s \
    op_params OCF_CHECK_LEVEL=0 \
    op start interval=0s timeout=40s \
    op stop interval=0s timeout=120s
primitive p_fs_nfs_internal_info_HA Filesystem \
    params \
        device="/dev/drbd1002" \
        directory="/srv/drbd-nfs/internal" \
        fstype=ext4 \
        run_fsck=no \
    op monitor interval=15s timeout=40s \
    op_params OCF_CHECK_LEVEL=0 \
    op start interval=0s timeout=60s \
    op stop interval=0s timeout=60s
primitive p_fs_nfsshare_exports_HA Filesystem \
    params \
        device="/dev/drbd1003" \
        directory="/srv/drbd-nfs/exports/HA" \
        fstype=ext4 \
        run_fsck=no \
    op monitor interval=15s timeout=40s \
    op_params OCF_CHECK_LEVEL=0 \
    op start interval=0s timeout=60s \
    op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver \
    params \
        nfs_shared_infodir="/srv/drbd-nfs/internal/nfs_info_dir" \
        nfs_server_scope=172.16.16.102 \
        nfs_ip=172.16.16.102 \
    op monitor interval=10s timeout=20s \
    op start interval=0s timeout=40s \
    op stop interval=0s timeout=20s
primitive p_pb_block portblock \
    params \
        action=block \
        ip=172.16.16.102 \
        portno=2049 \
        protocol=tcp
primitive p_pb_unblock portblock \
    params \
        action=unblock \
        ip=172.16.16.102 \
        portno=2049 \
        tickle_dir="/srv/drbd-nfs/internal/portblock_tickle_dir" \
        reset_local_on_unblock_stop=1 protocol=tcp \
    op monitor interval=10s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
    meta master-max=1 master-node-max=1 \
    clone-node-max=1 clone-max=3 notify=true
group g_nfs p_pb_block p_virtip p_fs_nfsshare_exports_HA \
    p_nfsserver p_expfs_nfsshare_exports_HA p_pb_unblock
colocation co_ha_nfs inf: \
    g_nfs:Started ms_drbd_ha_nfs:Master
order o_ms_drbd_ha_nfs-before-g_nfs ms_drbd_ha_nfs:promote g_nfs:start
property cib-bootstrap-options: \
    have-watchdog=false \
    cluster-infrastructure=corosync \
    cluster-name=nfscluster \
    stonith-enabled=false

After you have finished editing the configuration, save the file and exit the editor by entering :x (if your default editor is Vi or Vim). Next, commit the changes by entering the configure commit command in the crm shell. Pacemaker will then try to start the NFS server service on one of the nodes. Enter quit to leave the crm shell. Then enter and run the following command to clean up the state of the cluster resources:

crm resource cleanup

Next, enter crm_mon to verify that all the cluster resources have started and are running.
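
If you prefer a one-shot view of the cluster state rather than the continuously updating crm_mon display, you can add the -1 option:

crm_mon -1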

With that, you have configured an NFS high availability cluster, and the NFS share is ready to be used by clients on your network. You can verify the availability of your NFS share by entering the command showmount -e 172.16.16.102 from any host that is on the 172.16.16.0/24 network.
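
To test the share from an NFS client on the same network, you can mount it by using the virtual IP address. A minimal sketch, assuming that an NFS client package is installed on the client and using a hypothetical mount point created for the test, might be:

# /mnt/ha-nfs is a hypothetical mount point for this test
mkdir -p /mnt/ha-nfs
mount -t nfs 172.16.16.102:/srv/drbd-nfs/exports/HA /mnt/ha-nfs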

With DRBD and Pacemaker running in your cluster stack, if one cluster node fails, another cluster node will take over seamlessly. This is because you have prepared redundant services and because DRBD ensures real-time, up-to-date data replication.
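
If you want to verify the failover behavior, one way to simulate a node failure is to put the node that currently runs the NFS resources into standby mode, confirm that the resources move, and then bring the node back online. A sketch of this test using crmsh, assuming that drbd1 is the currently active node, might be:

# move resources away from drbd1
crm node standby drbd1
# check where the resources are running now
crm_mon -1
# make drbd1 available to the cluster again
crm node online drbd1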

If you need more details or help from our experienced team, you can contact the experts at LINBIT.


Changelog

2022-03-09:

  • Originally published.

2023-12-04:

  • Added user-submitted suggestions and LINBIT technical review.

2024-07-15:

  • LINBIT technical review.

2024-09-23:

  • Technical improvements made to HA architecture.
  • Other improvements to technical details of instructions.
Yusuf Yıldız

After nearly 15 years of system and storage management, Yusuf started to work as a solution architect at LINBIT. Yusuf's main focus is on customer success and contributing to product development and testing. As part of the solution architects team, he is a backbone of, and key supporter to, the sales team.
