*UPDATE MAY 2022: This article was originally published back in 2018, three years before our DRBD-Reactor project. With the release of the DRBD-Reactor software, we now suggest using it to make a highly available LINSTOR® Controller; instructions for this can be found in our User’s Guide. However, using Pacemaker is still a valid option, so this post will remain listed for now.
Part of the design of LINSTOR is that if the central LINSTOR Controller goes down, all the storage remains up and accessible. This should allow ample time to repair the downed system hosting the LINSTOR Controller. Even so, in most cases it is preferable to run the LINSTOR Controller in a container within your cloud or as a VM on your hypervisor platform. However, there may be situations where you want to keep the LINSTOR Controller up and highly available but have no container or VM platform in place to rely upon. For situations like this, we can easily leverage DRBD® and the Pacemaker/Corosync stack.
If you are familiar with Pacemaker, setting up a clustered LINSTOR Controller should be pretty straightforward. The only really tricky bit is that we first need to install LINSTOR in order to create the DRBD resource that will hold LINSTOR’s own database. Sounds a little bit chicken-and-egg, I know, but this allows LINSTOR to be aware of, and manage, all DRBD resources.
The example below uses only two nodes, but it can easily be adapted for more. Make sure to install both the LINSTOR Controller and LINSTOR Satellite software on both nodes. These instructions are by no means a step-by-step guide, but rather just the “special sauce” needed for an HA LINSTOR Controller cluster.
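As a rough sketch, installing both components on each node from the LINBIT repositories might look like the following (the exact package names assume the LINBIT repository layout for RHEL/CentOS; adjust for your distribution):

```bash
# On both nodes (linstora and linstorb), with LINBIT repos enabled
yum install -y linstor-controller linstor-satellite linstor-client

# The satellite should run everywhere; the controller is deliberately NOT
# enabled here, since Pacemaker will manage it later. Start the controller
# manually on one node for the initial setup steps below.
systemctl enable --now linstor-satellite
systemctl start linstor-controller
```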
If you are using the LINBIT® provided package repositories, an Ansible playbook is available to entirely automate the deployment of this cluster on a RHEL 7 or CentOS 7 system.
Create a DRBD resource for the LINSTOR database
We’ll name this resource linstordb and use the already configured pool0 storage pool.
[root@linstora ~]# linstor resource-definition create linstordb
[root@linstora ~]# linstor resource-definition drbd-options --on-no-quorum=io-error linstordb
[root@linstora ~]# linstor resource-definition drbd-options --auto-promote=no linstordb
[root@linstora ~]# linstor volume-definition create linstordb 250M
[root@linstora ~]# linstor resource create linstora linstordb --storage-pool pool0
[root@linstora ~]# linstor resource create linstorb linstordb --storage-pool pool0
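Before moving on, it is worth a quick sanity check that the resource deployed to both nodes. A minimal sketch (output will vary with your cluster):

```bash
# Both nodes should list the linstordb resource
linstor resource list

# DRBD's own view: both peers should report UpToDate disks
drbdadm status linstordb
```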
Stop the LINSTOR Controller and move the database to the DRBD device
Temporarily move the database aside, mount the DRBD device where LINSTOR expects the database, and move it back. Make sure to execute `chattr +i /var/lib/linstor` on all nodes that may potentially run a linstor-controller.
[root@linstora ~]# systemctl stop linstor-controller
[root@linstora ~]# rsync -avp /var/lib/linstor /tmp/
[root@linstora ~]# drbdadm primary linstordb
[root@linstora ~]# mkfs.xfs /dev/drbd/by-res/linstordb/0
[root@linstora ~]# rm -rf /var/lib/linstor/*
[root@linstora ~]# chattr +i /var/lib/linstor # only if on LINSTOR >= 1.14.0
[root@linstora ~]# mount /dev/drbd/by-res/linstordb/0 /var/lib/linstor
[root@linstora ~]# rsync -avp /tmp/linstor/ /var/lib/linstor/
[root@linstora ~]# umount /var/lib/linstor
[root@linstora ~]# drbdadm secondary linstordb
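Before handing control over to Pacemaker, a quick hedged sanity check is worthwhile — confirm the immutable bit is set on the mount point and that both replicas are in sync:

```bash
# The immutable bit prevents writes to the unmounted mount point
# (relevant on LINSTOR >= 1.14.0, as noted above)
lsattr -d /var/lib/linstor

# Both peers should show UpToDate disks and an established connection
drbdadm status linstordb
```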
Cluster everything up in Pacemaker
Please note that we strongly encourage you to utilize tested and working STONITH in all Pacemaker clusters. This example omits it simply because these VMs did not have any fencing devices available.
primitive p_drbd_linstordb ocf:linbit:drbd \
    params drbd_resource=linstordb \
    op monitor interval=29 role=Master \
    op monitor interval=30 role=Slave \
    op start interval=0 timeout=240s \
    op stop interval=0 timeout=100s
primitive p_fs_linstordb Filesystem \
    params device="/dev/drbd/by-res/linstordb/0" directory="/var/lib/linstor" \
    op start interval=0 timeout=60s \
    op stop interval=0 timeout=100s \
    op monitor interval=20s timeout=40s
primitive p_linstor-controller systemd:linstor-controller \
    op start interval=0 timeout=100s \
    op stop interval=0 timeout=100s \
    op monitor interval=30s timeout=100s
ms ms_drbd_linstordb p_drbd_linstordb \
    meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
group g_linstor p_fs_linstordb p_linstor-controller
order o_drbd_before_linstor inf: ms_drbd_linstordb:promote g_linstor:start
colocation c_linstor_with_drbd inf: g_linstor ms_drbd_linstordb:Master
property cib-bootstrap-options: \
    stonith-enabled=false \
    no-quorum-policy=ignore
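The configuration above is crmsh syntax, so one way to apply it — a sketch, assuming crmsh is installed and the file name is your choice — is to load it from a file and then watch the cluster bring the resources up:

```bash
# Load the crm configuration from a file (linstor-ha.crm is a
# hypothetical name; use whatever file you saved the config to)
crm configure load update linstor-ha.crm

# One-shot cluster status: DRBD should promote on one node, then the
# filesystem should mount and linstor-controller should start there
crm_mon -1r
```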
We still usually advise leveraging the high availability features already built into your cloud or VM platform if one is available; but if not, you can always use the configuration above to make your LINSTOR Controller highly available with Pacemaker.