Create Highly Available NFS Targets with DRBD and Pacemaker
This blog post explains how to configure a highly available (HA) active/passive NFS server on a two-node Linux cluster using DRBD and Pacemaker. NFS is preferred for many use cases because it:
- Enables multiple computers to access the same files, so everyone on the network can use the same data.
- Reduces storage costs by letting computers share applications and data over the network instead of requiring local disk space for every user application.
This use case requires that your system include the following components:
- Two diskful nodes for data replication and one diskless node for quorum purposes.
- A separate network link for replication (a best practice, but not mandatory).
- A virtual IP address, required for the NFS server (An IP address in the 172.16.16.0/24 subnet is used here.)
- Pacemaker and resource agents are installed on all nodes and enabled to start as a service.
- The latest version of DRBD is installed on all nodes and loaded into the kernel (available from GitHub or through LINBIT customer repositories. See the DRBD 9.0 User’s Guide for more details.)
- An NFS server is installed on all nodes (but not enabled to start at boot, because Pacemaker will start the service when necessary)
- All cluster nodes can resolve each other’s hostnames (Check /etc/hosts or your local DNS server; see the example entries after this list.)
- SELinux and any firewalls in use are configured to allow appropriate traffic on all nodes (See the DRBD 9.0 User’s Guide for more information, and the example firewall commands after this list.)
- The `crmsh` and `pcs` CLI utilities are installed on all nodes (for editing and managing the Pacemaker configuration)
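For reference, here is a minimal sketch of the name resolution and firewall preparation. It assumes the example host names (drbd1, drbd2, drbd3) and addresses in the 172.16.16.0/24 subnet used later in this post, and that firewalld is the firewall in use. Adjust names, addresses, and ports to your environment.

# Example /etc/hosts entries on every node (addresses are examples)
172.16.16.111   drbd1
172.16.16.112   drbd2
172.16.16.113   drbd3

# Example firewalld rules on every node, entered as root or with sudo
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --permanent --add-service=nfs
firewall-cmd --permanent --add-service=mountd
firewall-cmd --permanent --add-service=rpc-bind
firewall-cmd --permanent --add-port=7003/tcp
firewall-cmd --reload

The TCP port 7003 corresponds to the DRBD resource port used in the resource configuration later in this post.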
After completing these initial preparation steps, you can create your cluster.
Create a Logical Volume and Directory for the NFS Share
Before creating DRBD resources in the cluster, you need to create a physical volume on top of the physical device (drive). These commands should be entered as the root user or else prefaced with `sudo`.
To do that, enter:
pvcreate /dev/sdx
where “x” in “sdx” corresponds to the letter identifying your physical device.
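If you are unsure which device name to use, the `lsblk` command lists the block devices attached to the node, along with their sizes and any existing partitions or mount points, which can help you identify the correct drive.

# List block devices to identify the drive to use for DRBD
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT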
Then create the volume group, named ‘ha_vg,’ by entering:
vgcreate ha_vg /dev/sdx
Next, create the logical volume that DRBD will consume. You can replace the “300G” with a size appropriate for your use.
lvcreate -L 300G -n ha_HA_lv ha_vg
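Optionally, verify the LVM setup before continuing. The following standard LVM reporting commands should list the physical volume you created, the ‘ha_vg’ volume group, and the ‘ha_HA_lv’ logical volume.

# Verify the physical volume, volume group, and logical volume
pvs
vgs ha_vg
lvs ha_vg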
After creating the logical volume, create the directory that will serve as the NFS share and recursively give the directory appropriate access permissions.
mkdir -p /nfsshare/exports/HA
chmod -R 777 /nfsshare
Create a DRBD Resource File
DRBD resource configuration files are located in the */etc/drbd.d/* directory. Resource files need to be created on all cluster nodes. You can create the resource file on one node and then use the `rsync` command to distribute it to the other nodes, as shown in the example after the resource file below. Each DRBD resource defined in a resource configuration file needs its own TCP port. Because there is only one resource defined in this configuration, only one TCP port (7003) is used here. Use the text editor of your choice to create the DRBD resource file as shown below, changing the host names and IP addresses to reflect your network configuration. Note that the third cluster node only serves a quorum function in the cluster. It is not involved in DRBD replication, and in the configuration it is identified as “diskless”.
vi /etc/drbd.d/ha_HA_lv.res

resource ha_HA_lv {
    device "/dev/drbd1003";
    disk "/dev/ha_vg/ha_HA_lv";
    meta-disk internal;
    options {
        on-no-quorum suspend-io;
        quorum majority;
    }
    net {
        protocol C;
        timeout 10;
        ko-count 1;
        ping-int 1;
    }
    connection-mesh {
        hosts "drbd1" "drbd2" "drbd3";
    }
    on "drbd1" {
        address 172.16.16.111:7003;
        node-id 0;
    }
    on "drbd2" {
        address 172.16.16.112:7003;
        node-id 1;
    }
    on "drbd3" {
        disk none;
        address 172.16.16.113:7003;
        node-id 2;
    }
}
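As mentioned above, rather than creating the file by hand on every node, you can copy it from the first node. A minimal example using `rsync`, assuming the example host names drbd2 and drbd3 and SSH access as root, is:

# Distribute the resource file to the other cluster nodes
rsync -av /etc/drbd.d/ha_HA_lv.res drbd2:/etc/drbd.d/
rsync -av /etc/drbd.d/ha_HA_lv.res drbd3:/etc/drbd.d/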
Initialize DRBD Resources
After creating the DRBD resource configuration file, you need to initialize the DRBD resource. To do that, enter the following commands as the root user or use `sudo`. The first two commands must be entered and run on both cluster nodes.
drbdadm create-md ha_HA_lv
drbdadm up ha_HA_lv
Because this is a new device with no existing data, you can skip the initial synchronization by using the following command. However, use caution with this command, because it can destroy any existing data on the logical volume.
drbdadm new-current-uuid --clear-bitmap ha_HA_lv/0
Enter and run the following commands on only one of the two cluster nodes. They force the node to become primary and create the filesystem on the DRBD device. DRBD will then replicate the filesystem to the other cluster node.
drbdadm primary --force ha_HA_lv
mkfs.ext4 /dev/drbd1003
Now check the state of the device by using the `drbdadm status` and `lsblk` commands.
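On the node where you ran the previous commands, the `drbdadm status` output should look roughly like the following. The exact formatting can vary between DRBD versions, and the peer names reflect the example host names used in this post.

ha_HA_lv role:Primary
  disk:UpToDate
  drbd2 role:Secondary
    peer-disk:UpToDate
  drbd3 role:Secondary
    peer-disk:Diskless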
`drbdadm status` should show that DRBD is in sync and “UpToDate” on the diskful nodes. If everything looks fine, use the following command to demote the DRBD resource from primary back to secondary.
drbdadm secondary ha_HA_lv
Create NFS Exports and Pacemaker Resources
There are two ways to create Pacemaker resources. The first is to edit the Pacemaker configuration directly by using the interactive CRM Shell (`crmsh`). The second is to use the `pcs` command-line tool. In this example, we’ll use the CRM Shell to edit the Pacemaker configuration.
To edit the Pacemaker configuration, enter `crm conf edit`. This opens the configuration in a text editor (typically vi, unless you have configured a different editor), so press “i” to enter insert mode. In the editor, delete the existing configuration and paste in the configuration below. Do not forget to change the host names, subnets, and IP addresses to match your network configuration.
node 1: drbd1
node 2: drbd2
node 3: drbd3
primitive p_drbd_attr ocf:linbit:drbd-attr
primitive p_HA_lv_nfs ocf:linbit:drbd \
    params drbd_resource=ha_HA_lv \
    op monitor interval=11s timeout=20s role=Master \
    op monitor interval=13s timeout=20s role=Slave
primitive p_nfs_HA_fs Filesystem \
    params device="/dev/drbd1003" directory="/nfsshare/exports/HA" \
    fstype=ext4 run_fsck=no \
    op monitor interval=15 timeout=40 \
    op start timeout=40 interval=0 \
    op stop timeout=40 interval=0
primitive p_nfs_HA_exp exportfs \
    params fsid=10003 unlock_on_stop=1 options=rw directory="/nfsshare/exports/HA" \
    clientspec="172.16.16.0/24" \
    op monitor interval=15 timeout=40 \
    op start timeout=40 interval=0 \
    op stop timeout=40 interval=0
primitive p_nfs_nfs_ip IPaddr2 \
    params ip=172.16.16.102 cidr_netmask=32 \
    op monitor interval=15 timeout=40 \
    op start timeout=40 interval=0 \
    op stop timeout=40 interval=0
primitive p_nfs_server nfsserver
primitive pb_b portblock \
    params action=block ip=172.16.16.102 portno=2049 protocol=tcp
primitive pb_u portblock \
    params action=unblock ip=172.16.16.102 portno=2049 \
    tickle_dir="/srv/drbd-nfs/nfstest/.tickle" \
    reset_local_on_unblock_stop=1 protocol=tcp \
    op monitor interval=10s timeout=20s
ms ha_HA_lv_clone p_HA_lv_nfs \
    meta clone-max=3 notify=true master-max=1
clone c_drbd_attr p_drbd_attr
colocation co_nfs_nfstest inf: pb_b p_nfs_nfs_ip ha_HA_lv_clone:Master \
    p_nfs_server p_nfs_HA_fs p_nfs_HA_exp pb_u
location lo_nfs_nfstest { p_nfs_nfstest_fs } resource-discovery=never \
    rule -inf: #uname ne drbd1 and #uname ne drbd2 and #uname ne drbd3
order o_nfs_nfstest pb_b p_nfs_nfs_ip ha_HA_lv_clone:promote p_nfs_server \
    p_nfs_HA_fs p_nfs_HA_exp pb_u
property cib-bootstrap-options: \
    have-watchdog=false \
    cluster-infrastructure=corosync \
    cluster-name=nfscluster \
    stonith-enabled=false
After you have finished editing the configuration, save the file. Next, commit the changes by entering the `commit` command in the CRM Shell. Pacemaker will then try to start the NFS server service on one of the nodes. Enter `exit` to leave the CRM Shell, then enter and run the following command to clean up the state of the cluster resources and clear any failures from the initial startup.
pcs resource cleanup
Next, enter `pcs status` to verify that all resources have started and are running on the expected node.
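To confirm that clients can reach the share, you can run a quick test from an NFS client in the 172.16.16.0/24 subnet. The mount point /mnt/nfstest below is only an example; any empty directory will do, and the export path might differ if you adapted the configuration.

# Show the exports offered by the cluster's virtual IP address
showmount -e 172.16.16.102

# Mount the HA export on a test directory
mkdir -p /mnt/nfstest
mount -t nfs 172.16.16.102:/nfsshare/exports/HA /mnt/nfstest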
With that, you have configured an NFS HA cluster, and the NFS share is ready to be used by clients on your network. Having DRBD and Pacemaker running on your cluster stack ensures that should one cluster node fail, the other cluster node will take over with an up-to-date copy of your data.
If you need more details and help from our experienced team, please contact the experts at LINBIT Support.