Creating Highly Available NFS Targets with DRBD and Pacemaker
This blog post explains how to configure a highly available (HA) active/passive NFS server on a three-node Linux cluster by using DRBD® and Pacemaker. You can implement this solution on DEB-based or RPM-based Linux distributions, for example, Ubuntu or Red Hat Enterprise Linux (RHEL). The instructions in this article were tested on RHEL 9 and Ubuntu 24.04 clusters.
NFS is well-suited for many use cases because it:
- Enables many computers to access the same files, so everyone on the network can use the same data.
- Reduces storage costs by having computers share applications rather than needing local disk space for each user application.
The system preparation requirements for this use case are:
- Two diskful nodes for data replication, one diskless node for quorum purposes.
- A separate network link for DRBD replication. (This is best practice, not mandatory.)
- Pacemaker and Corosync are installed on all nodes and their services are enabled to start at system boot time.
- Open Cluster Framework (OCF) resource agents are installed on all nodes. (If you are a LINBIT® customer, install the resource-agents and drbd-pacemaker packages to get the resource agents used in this article.)
- A virtual IP address, required for the NFS server. (In this article, the OCF IPaddr2 resource agent is used to automatically set this to 172.16.16.102 in the Pacemaker configuration.)
- The latest version of DRBD is installed on all nodes and loaded into the kernel. (Available from GitHub, or through LINBIT customer repositories. See the DRBD 9.0 User’s Guide for more details.)
- An NFS server is installed on all nodes for redundancy. (The NFS server service should not be enabled to start at boot, because Pacemaker will start it when necessary.)
- All cluster nodes can resolve each other’s hostnames. (Check /etc/hosts or your local DNS server. An example /etc/hosts file is shown after this list.)
- SELinux and any firewalls in use are configured to allow appropriate traffic on all nodes. (Refer to the DRBD 9.0 User’s Guide for more information.)
- The crmsh and pcs CLI utilities are installed on all nodes, for editing and managing the Pacemaker configuration.
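For example, if your nodes use the hostnames and addresses that appear in the DRBD configuration later in this article (these values are only an example; adjust them to match your environment), the /etc/hosts file on every node could contain entries such as these:
172.16.16.111 drbd1
172.16.16.112 drbd2
172.16.16.113 drbd3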
After completing these initial preparation steps, you can create your high availability NFS cluster.
Creating a Logical Volume and Directory for the NFS Share
Before creating DRBD resources in the cluster, you need to create a physical volume on top of the physical device (drive). These commands should be entered as the root user or else prefaced with sudo.
To do that, enter:
pvcreate /dev/sdx
Here, “x” in sdx corresponds to the letter identifying your physical device.
Then create the volume group, named nfs_vg, by entering:
vgcreate nfs_vg /dev/sdx
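If you want to confirm that the physical volume and the volume group were created as expected, you can list them with the standard LVM reporting commands:
pvs
vgs nfs_vg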
Next, create the logical volumes that DRBD will consume.
The first logical volume will be for storing NFS stateful connection information. If this stateful connection information is not highly available or otherwise synchronized between cluster nodes, then in some failover cases it might take a long time for NFS exports to become available again. This volume will not hold much data, so 20M is a sufficient size.
The second volume will store the data that you will share by using NFS. You can replace 300G with a size appropriate for your use, or else use the -l 100%FREE option rather than -L 300G in the command if you want the logical volume to use 100% of your physical volume.
lvcreate -L 20M -n ha_nfs_internal_lv nfs_vg
lvcreate -L 300G -n ha_nfs_exports_lv nfs_vg
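To verify that both logical volumes exist and have the expected sizes, you can list them:
lvs nfs_vg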
After creating the logical volumes, create the directories that will serve as the mount point for your share, and the mount point for the cluster internal state.
mkdir -p /srv/drbd-nfs/exports/HA
mkdir -p /srv/drbd-nfs/internal
Configuring DRBD
After preparing your backing storage device and a file system mount point on your nodes, you can next configure DRBD to replicate the storage device across the nodes.
Creating a DRBD Resource File
DRBD resource configuration files are located in the /etc/drbd.d/ directory. Resource files need to be created on all cluster nodes. You can create a resource file on one node and then use the rsync command to distribute the file to the other nodes (an example is shown after the resource file listing below). Each DRBD resource defined in the resource configuration file needs a different TCP port. Because there is only one resource defined in this configuration, only one TCP port (7003) is used here. Use the text editor of your choice to create the DRBD resource file as shown below. Change the host names and IP addresses to reflect your network configuration.
📝 NOTE: The third cluster node only serves a quorum function in the cluster. It is not involved in DRBD replication. In the configuration below, it is the diskless node, identified by the disk none line.
vi /etc/drbd.d/ha_nfs.res
resource ha_nfs {
    volume 0 {
        device "/dev/drbd1002";
        disk "/dev/nfs_vg/ha_nfs_internal_lv";
        meta-disk internal;
    }
    volume 1 {
        device "/dev/drbd1003";
        disk "/dev/nfs_vg/ha_nfs_exports_lv";
        meta-disk internal;
    }
    options {
        on-no-quorum suspend-io;
        quorum majority;
    }
    connection-mesh {
        hosts "drbd1" "drbd2" "drbd3";
    }
    on "drbd1" {
        address 172.16.16.111:7003;
        node-id 0;
    }
    on "drbd2" {
        address 172.16.16.112:7003;
        node-id 1;
    }
    on "drbd3" {
        disk none;
        address 172.16.16.113:7003;
        node-id 2;
    }
}
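After saving the file, you can distribute it to the other cluster nodes and confirm that DRBD parses it without errors. For example, by using rsync with the example hostnames from the configuration above:
rsync -av /etc/drbd.d/ha_nfs.res drbd2:/etc/drbd.d/
rsync -av /etc/drbd.d/ha_nfs.res drbd3:/etc/drbd.d/
drbdadm dump ha_nfs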
Initializing DRBD Resources
After creating the DRBD resource configuration file, you need to initialize the DRBD resources. To do that, enter the following commands as the root user or prefaced with sudo.
The first two commands must be entered and run on both diskful cluster nodes.
drbdadm create-md ha_nfs
drbdadm up ha_nfs
Because this is a new file system with no data content, you can skip the initial synchronization. However, use caution with this command because it can result in the loss of any existing data on the logical volume. Enter the following command on a diskful node:
drbdadm new-current-uuid --clear-bitmap ha_nfs/0
Next, enter and run the following commands on only one of the two diskful cluster nodes. DRBD will replicate the file systems and the directories that the commands create to the other diskful cluster node.
drbdadm primary --force ha_nfs
mkfs.ext4 /dev/drbd1002
mkfs.ext4 /dev/drbd1003
mount /dev/drbd1002 /srv/drbd-nfs/internal
mkdir /srv/drbd-nfs/internal/portblock_tickle_dir
mkdir /srv/drbd-nfs/internal/nfs_info_dir
umount /dev/drbd1002
Entering these commands will do a few things:
- Force the node to become primary.
- Create the needed file systems on the DRBD devices.
- Mount the “internal” DRBD device to a mount point.
- Create two “internal” informational directories. The first directory will store stateful information related to NFS connections. The second directory, also known as the “tickle” directory, will be used by a portblock OCF resource agent to store established TCP connections. Using the portblock resource agent with a “tickle” directory might allow clients to reconnect faster after failover events. Refer to man ocf_heartbeat_portblock for more information.
- Unmount the “internal” DRBD device.
💡 TIP: When making a new file system on a large volume, you might consider using the -E nodiscard option with the mkfs.ext4 command. Using this option might speed up the command operation.
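For example, creating the file system on the larger exports volume with this option would look like this:
mkfs.ext4 -E nodiscard /dev/drbd1003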
Next, verify the state of your DRBD devices by entering the drbdadm status and lsblk commands:
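drbdadm status ha_nfs
lsblk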
The drbdadm status command should show that DRBD is in sync and UpToDate. If everything looks fine, use the following command to change the primary DRBD node back to secondary:
drbdadm secondary ha_nfs
Creating NFS Exports and Pacemaker Resources
There are two ways to create Pacemaker resources. The first way is by directly editing the Pacemaker configuration by using the interactive crm shell. The second way is by using the pcs command-line tool. This example uses the crm shell to edit the Pacemaker configuration.
To enter the crm shell, enter crm. Next, open the Pacemaker configuration for editing by entering configure edit. If your default editor is Vi or Vim, press the “i” key to enter insert mode so that you can edit the configuration.
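For reference, the sequence of commands is:
crm
configure edit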
Once you are in insert mode, delete everything in the configuration and paste in the following configuration. Remember to change hostnames, subnets, and IP addresses to match your network configuration.
node 1: drbd1
node 2: drbd2
node 3: drbd3
primitive p_virtip IPaddr2 \
params \
ip=172.16.16.102 \
cidr_netmask=24 \
op monitor interval=0s timeout=40s \
op start interval=0s timeout=20s \
op stop interval=0s timeout=20s
primitive p_drbd_ha_nfs ocf:linbit:drbd \
params \
drbd_resource=ha_nfs \
op monitor timeout=20 interval=21 role=Slave \
op monitor timeout=20 interval=20 role=Master
primitive p_expfs_nfsshare_exports_HA exportfs \
params \
clientspec="172.16.16.0/24" \
directory="/srv/drbd-nfs/exports/HA" \
fsid=1003 unlock_on_stop=1 options=rw \
op monitor interval=15s timeout=40s \
op_params OCF_CHECK_LEVEL=0 \
op start interval=0s timeout=40s \
op stop interval=0s timeout=120s
primitive p_fs_nfs_internal_info_HA Filesystem \
params \
device="/dev/drbd1002" \
directory="/srv/drbd-nfs/internal" \
fstype=ext4 \
run_fsck=no \
op monitor interval=15s timeout=40s \
op_params OCF_CHECK_LEVEL=0 \
op start interval=0s timeout=60s \
op stop interval=0s timeout=60s
primitive p_fs_nfsshare_exports_HA Filesystem \
params \
device="/dev/drbd1003" \
directory="/srv/drbd-nfs/exports/HA" \
fstype=ext4 \
run_fsck=no \
op monitor interval=15s timeout=40s \
op_params OCF_CHECK_LEVEL=0 \
op start interval=0s timeout=60s \
op stop interval=0s timeout=60s
primitive p_nfsserver nfsserver \
params \
nfs_shared_infodir="/srv/drbd-nfs/internal/nfs_info_dir" \
nfs_server_scope=172.16.16.102 \
nfs_ip=172.16.16.102 \
op monitor interval=10s timeout=20s \
op start interval=0s timeout=40s \
op stop interval=0s timeout=20s
primitive p_pb_block portblock \
params \
action=block \
ip=172.16.16.102 \
portno=2049 \
protocol=tcp
primitive p_pb_unblock portblock \
params \
action=unblock \
ip=172.16.16.102 \
portno=2049 \
tickle_dir="/srv/drbd-nfs/internal/portblock_tickle_dir" \
reset_local_on_unblock_stop=1 protocol=tcp \
op monitor interval=10s timeout=20s
ms ms_drbd_ha_nfs p_drbd_ha_nfs \
meta master-max=1 master-node-max=1 \
clone-node-max=1 clone-max=3 notify=true
group g_nfs p_pb_block p_virtip p_fs_nfsshare_exports_HA \
p_nfsserver p_expfs_nfsshare_exports_HA p_pb_unblock
colocation co_ha_nfs inf: \
g_nfs:Started ms_drbd_ha_nfs:Master
order o_ms_drbd_ha_nfs-before-g_nfs ms_drbd_ha_nfs:promote g_nfs:start
property cib-bootstrap-options: \
have-watchdog=false \
cluster-infrastructure=corosync \
cluster-name=nfscluster \
stonith-enabled=false
After you have finished editing the configuration, save the file and exit the editor by entering :x (if your default editor is Vi or Vim). Next, commit the changes by entering the configure commit command in the crm shell. Pacemaker will then try to start the NFS server service on one of the nodes. Enter quit to leave the crm shell. Then enter and run the following command to clean up the state of the cluster resources:
crm resource cleanup
Next, enter crm_mon to see if everything is working fine.
With that, you have configured an NFS high availability cluster, and the NFS share is ready to be used by clients on your network. You can verify the availability of your NFS share by entering the command showmount -e 172.16.16.102 from any host that is on the 172.16.16.0/24 network.
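To go a step further, you could mount the NFS export from a client on that network and write a test file to it. The mount point /mnt/ha-nfs below is only an example; the export path matches the exportfs resource configured earlier:
mkdir -p /mnt/ha-nfs
mount -t nfs 172.16.16.102:/srv/drbd-nfs/exports/HA /mnt/ha-nfs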
Having DRBD and Pacemaker running on your cluster stack means that if one cluster node fails, another cluster node will take over seamlessly. This is because you have prepared redundant services and because DRBD ensures real-time, up-to-date data replication.
If you need more details or help from our experienced team, you can contact the experts at LINBIT.
Changelog
2022-03-09:
- Originally published.
2023-12-04:
- Added user submitted suggestions and LINBIT technical review.
2024-07-15:
- LINBIT technical review.
2024-09-23:
- Technical improvements made to HA architecture.
- Other improvements to technical details of instructions.