Configuring Multipath iSCSI Targets in Pacemaker for Higher Availability

LINBIT® has been building and supporting high-availability (HA) iSCSI clusters using DRBD® and Pacemaker for over a decade. In fact, that was the very first HA cluster I built for a client when I started working at LINBIT as a support engineer back in 2014. Searching the Internet for HA iSCSI Pacemaker clusters will return a lot of results, but none of them will show you how you can configure multipathing for iSCSI when using Pacemaker, which is why I’m writing this blog today.

❗ IMPORTANT: This blog does not describe using DRBD in dual-primary mode to multipath a client (initiator) system across separate DRBD nodes. You should never try to do that. This blog describes multipathing between an initiator and the iSCSI cluster’s single DRBD primary node. I mention this because that subject comes up A LOT.

In case you’re uninitiated, iSCSI multipathing is a technology that enables redundant and load-balanced connections between an iSCSI initiator and an iSCSI target. It allows multiple network paths to be used simultaneously for data transmission, bolstering both fault tolerance and performance. An HA iSCSI cluster without multipathing can keep your data available when there are failures on the target servers, but this does nothing for a dead switch between the target and initiator, if that switch is the only connection between them.

The Pacemaker Configuration

A basic HA iSCSI Pacemaker configuration as outlined in LINBIT’s Highly Available iSCSI Storage With DRBD And Pacemaker On RHEL 8 how-to guide will have you configure a single virtual IP (VIP) address that floats between the peers that make up the HA cluster. A client can use the VIP to attach to the active node in the cluster. This basic configuration also uses a Pacemaker resource group to order and colocate all the different primitives needed to create an HA iSCSI target.
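
For reference, the resource group from that style of configuration looks something like the following crmsh sketch. The resource names and exact member order are illustrative here, not copied from the guide, and the group would also be colocated with, and ordered after, the promoted DRBD resource:

group g_iscsi p_iscsi_portblock_on_0 p_vip_0 p_iscsi_target_0 p_iscsi_lun_0 p_iscsi_portblock_off_0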

The main differences to note in a Pacemaker configuration that can support multipathing are a second (or nth) VIP address, and all of the VIP addresses’ iSCSI sockets listed in the portals parameter on the iSCSITarget primitive. Optionally, “long form” resource colocation and ordering constraints can be used along with a resource set to logically “group” the VIP addresses. The resource set’s “grouping” of the VIP addresses will start the VIP addresses in parallel, but only require one to fully start before Pacemaker continues to start services according to their ordering constraints. I say the long form is optional because you technically can put all of your VIP addresses into a resource group, but a group starts its members sequentially, which isn’t as efficient. Also, resource sets in Pacemaker are fairly niche, at least in my experience, so maybe this will help someone searching the internet for examples in both crmsh and pcs syntax. For more background on resource sets in crmsh, see https://crmsh.github.io/man-2.0/#topics_Features_Resourcesets.

📝 NOTE: Each of the multiple VIP addresses should exist on completely separate networks that do not share any single points of failure. In the case of a complete server failure, Pacemaker and DRBD will allow services to automatically fail over to the peer server.

The network interfaces on the iSCSI cluster that will be used for the iSCSI initiator and target traffic, and where the VIP addresses will be assigned, are as follows:

# ip addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:3f:44:ef brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.32/24 brd 192.168.222.255 scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe3f:44ef/64 scope link
       valid_lft forever preferred_lft forever
# ip addr show enp0s9
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:d1:a5:bc brd ff:ff:ff:ff:ff:ff
    inet 192.168.221.32/24 brd 192.168.221.255 scope global noprefixroute enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fed1:a5bc/64 scope link
       valid_lft forever preferred_lft forever

Here is the HA iSCSI cluster configuration that supports multipathing in crmsh syntax:

primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op start interval=0s timeout=240 \
    op promote interval=0s timeout=90 \
    op demote interval=0s timeout=90 \
    op stop interval=0s timeout=100 \
    op monitor interval=29 role=Master \
    op monitor interval=31 role=Slave
primitive p_iscsi_lun_0 iSCSILogicalUnit \
    params target_iqn="iqn.2017-01.com.linbit:drbd0" implementation=lio-t \
    scsi_sn=aaaaaaa0 lio_iblock=0 lun=0 path="/dev/drbd0" \
    op start interval=0 timeout=20 \
    op stop interval=0 timeout=20 \
    op monitor interval=20 timeout=40
primitive p_iscsi_portblock_off_0 portblock \
    params portno=3260 protocol=tcp action=unblock \
    op start timeout=20 interval=0 \
    op stop timeout=20 interval=0 \
    op monitor timeout=20 interval=20
primitive p_iscsi_portblock_on_0 portblock \
    params portno=3260 protocol=tcp action=block \
    op start timeout=20 interval=0 \
    op stop timeout=20 interval=0 \
    op monitor timeout=20 interval=20
primitive p_iscsi_target_0 iSCSITarget \
    params iqn="iqn.2017-01.com.linbit:drbd0" implementation=lio-t \
    portals="192.168.222.35:3260 192.168.221.35:3260" \
    op start interval=0 timeout=20 \
    op stop interval=0 timeout=20 \
    op monitor interval=20 timeout=40
primitive p_vip_0a IPaddr2 \
    params ip=192.168.222.35 cidr_netmask=24 \
    op start interval=0 timeout=20 \
    op stop interval=0 timeout=20 \
    op monitor interval=10s
primitive p_vip_0b IPaddr2 \
    params ip=192.168.221.35 cidr_netmask=24 \
    op start interval=0 timeout=20 \
    op stop interval=0 timeout=20 \
    op monitor interval=10s
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max=1 master-node-max=1 notify=true clone-max=3 clone-node-max=1
colocation cl_p_iscsi_lun_0-with_p_iscsi_target_0 inf: p_iscsi_lun_0 p_iscsi_target_0
colocation cl_p_iscsi_portblock_off_0-with-p_iscsi_lun_0 inf: p_iscsi_portblock_off_0 p_iscsi_lun_0
colocation cl_p_iscsi_portblock_on_0-with-ms_drbd_r0 inf: p_iscsi_portblock_on_0 ms_drbd_r0:Master
colocation cl_p_iscsi_target_0-with-p_vip_0a inf: p_iscsi_target_0 [ p_vip_0a p_vip_0b ]
colocation cl_p_vips-with-p_iscsi_portblock_on_0 inf: [ p_vip_0a p_vip_0b ] p_iscsi_portblock_on_0
order o_ms_drbd_r0-before_p_iscsi_portblock_on_0 ms_drbd_r0:promote p_iscsi_portblock_on_0
order o_p_iscsi_lun_0-before_p_iscsi_portblock_off_0 p_iscsi_lun_0 p_iscsi_portblock_off_0
order o_p_iscsi_portblock_on_0-before_p_vip_0a p_iscsi_portblock_on_0 [ p_vip_0a p_vip_0b ]
order o_p_iscsi_target_0-before_p_iscsi_lun_0 p_iscsi_target_0 p_iscsi_lun_0
order o_p_vips-before-p_iscsi_target_0 [ p_vip_0a p_vip_0b ] p_iscsi_target_0
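
One convenient way to apply the configuration above is to save it to a file and load it with crmsh, then review the result. The file name here is just an example:

# crm configure load update ha-iscsi-multipath.crm
# crm configure show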

Or, if for some reason you need more angle brackets in your life, here is the same configuration in pcs XML syntax:

<cib [...]>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair name="stop-all-resources" value="false" id="cib-bootstrap-options-stop-all-resources"/>
        <nvpair name="stonith-enabled" value="false" id="cib-bootstrap-options-stonith-enabled"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="3" uname="iscsi-2"/>

With the above configuration set in Pacemaker, the cluster monitor output will look like this:

[root@iscsi-2 ~]# crm_mon -1r
Cluster Summary:
  * Stack: corosync
  * Current DC: iscsi-2 (version 2.0.5.linbit-1.0.el8-ba59be712) - partition with quorum
  * Last updated: Tue Oct  3 18:46:03 2023
  * Last change:  Sat Sep 30 06:18:10 2023 by hacluster via crmd on iscsi-1
  * 3 nodes configured
  * 9 resource instances configured

Node List:
  * Online: [ iscsi-0 iscsi-1 iscsi-2 ]

Full List of Resources:
  * p_iscsi_lun_0       (ocf::heartbeat:iSCSILogicalUnit):       Started iscsi-2
  * p_iscsi_portblock_off_0     (ocf::heartbeat:portblock):      Started iscsi-2
  * p_iscsi_portblock_on_0      (ocf::heartbeat:portblock):      Started iscsi-2
  * p_iscsi_target_0    (ocf::heartbeat:iSCSITarget):    Started iscsi-2
  * p_vip_0a    (ocf::heartbeat:IPaddr2):        Started iscsi-2
  * p_vip_0b    (ocf::heartbeat:IPaddr2):        Started iscsi-2
  * Clone Set: ms_drbd_r0 [p_drbd_r0] (promotable):
    * Masters: [ iscsi-2 ]
    * Slaves: [ iscsi-0 iscsi-1 ]

Notice the iSCSI target is currently running on the host named iscsi-2. As shown in the Pacemaker configuration, the p_vip_0a and p_vip_0b VIP resources are configured with the IP addresses 192.168.222.35/24 and 192.168.221.35/24, respectively. Those are the IP addresses the iSCSI target is listening on.
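
You can confirm that the target is listening on both VIP sockets from the currently active node, for example with ss; both 192.168.222.35:3260 and 192.168.221.35:3260 should appear in the LISTEN state:

# ss -tln | grep 3260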

Inspecting the interfaces on iscsi-2 will show the VIP addresses assigned to their respective interfaces:

# ip addr show enp0s8
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:3f:44:ef brd ff:ff:ff:ff:ff:ff
    inet 192.168.222.32/24 brd 192.168.222.255 scope global noprefixroute enp0s8
       valid_lft forever preferred_lft forever
    inet 192.168.222.35/24 brd 192.168.222.255 scope global secondary enp0s8
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe3f:44ef/64 scope link
       valid_lft forever preferred_lft forever
# ip addr show enp0s9
4: enp0s9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:d1:a5:bc brd ff:ff:ff:ff:ff:ff
    inet 192.168.221.32/24 brd 192.168.221.255 scope global noprefixroute enp0s9
       valid_lft forever preferred_lft forever
    inet 192.168.221.35/24 brd 192.168.221.255 scope global secondary enp0s9
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fed1:a5bc/64 scope link
       valid_lft forever preferred_lft forever

Connecting an iSCSI Initiator to the Target Cluster

Connecting an iSCSI initiator to an iSCSI target using multipathing is as easy as connecting a non-multipathed initiator and target. As mentioned earlier, the iSCSI target and initiator systems should be connected to two or more networks that do not share components. The network interfaces configured on the initiator system look like this:

$ ip addr show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:68:17:3d brd ff:ff:ff:ff:ff:ff
    altname enp0s8
    inet 192.168.222.254/24 brd 192.168.222.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fe68:173d/64 scope link
       valid_lft forever preferred_lft forever
$ ip addr show eth2
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:c4:e0:19 brd ff:ff:ff:ff:ff:ff
    altname enp0s9
    inet 192.168.221.254/24 brd 192.168.221.255 scope global eth2
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:fec4:e019/64 scope link
       valid_lft forever preferred_lft forever
$ ip addr show eth3
5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 08:00:27:ea:11:61 brd ff:ff:ff:ff:ff:ff
    altname enp0s10
    inet 192.168.220.254/24 brd 192.168.220.255 scope global eth3
       valid_lft forever preferred_lft forever
    inet6 fe80::a00:27ff:feea:1161/64 scope link
       valid_lft forever preferred_lft forever

Interfaces eth1 and eth2 are on the same networks as the iSCSI target cluster, and eth3 is on yet another network that is used to access services hosted on the initiator system.
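
Before discovery will work, the initiator needs the iSCSI initiator utilities and device-mapper multipath installed and running. Here is a minimal sketch for a RHEL-family initiator; package names and commands will differ on other distributions:

$ sudo dnf install -y iscsi-initiator-utils device-mapper-multipath
$ sudo mpathconf --enable --with_multipathd y
$ sudo systemctl enable --now iscsid multipathd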

Performing an iSCSI discovery from the initiator against a single target VIP will show all available targets on all of the available paths:

$ sudo iscsiadm --mode discovery -t st -p 192.168.222.35
192.168.222.35:3260,1 iqn.2017-01.com.linbit:drbd0
192.168.221.35:3260,1 iqn.2017-01.com.linbit:drbd0

Logging in to the target through each of the VIP address and port (socket) combinations shown in the discovery output will connect the initiator to the target over multiple paths:

$ sudo iscsiadm --mode node -T iqn.2017-01.com.linbit:drbd0 -p 192.168.222.35:3260 -l
Logging in to [iface: default, target: iqn.2017-01.com.linbit:drbd0, portal: 192.168.222.35,3260]
Login to [iface: default, target: iqn.2017-01.com.linbit:drbd0, portal: 192.168.222.35,3260] successful.

$ sudo iscsiadm --mode node -T iqn.2017-01.com.linbit:drbd0 -p 192.168.221.35:3260 -l
Logging in to [iface: default, target: iqn.2017-01.com.linbit:drbd0, portal: 192.168.221.35,3260]
Login to [iface: default, target: iqn.2017-01.com.linbit:drbd0, portal: 192.168.221.35,3260] successful.
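
Optionally, if you want the initiator to log back in to both portals automatically after a reboot, you can set each node record’s startup mode to automatic. This is shown here only as an example; verify the behavior against your open-iscsi configuration:

$ sudo iscsiadm --mode node -T iqn.2017-01.com.linbit:drbd0 -p 192.168.222.35:3260 --op update -n node.startup -v automatic
$ sudo iscsiadm --mode node -T iqn.2017-01.com.linbit:drbd0 -p 192.168.221.35:3260 --op update -n node.startup -v automatic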

Next, you can verify that multipathing is working by using the multipath -ll command:

$ sudo multipath -ll
mpatha (36001405aaaaaaa000000000000000000) dm-1 LIO-ORG,p_iscsi_lun_0
size=8.0G features='0' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=enabled
| `- 3:0:0:0 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=50 status=enabled
  `- 4:0:0:0 sdc 8:32 active ready running
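
In the output above, each path sits in its own path group, so I/O flows over one path at a time and only shifts to the other path on failure. If you would rather spread I/O across both paths, one option (shown as a sketch here, and not necessarily right for every workload or network) is to set the path grouping policy to multibus in /etc/multipath.conf on the initiator and tell multipathd to reconfigure:

# /etc/multipath.conf (initiator) - minimal illustrative example
defaults {
    user_friendly_names yes
    path_grouping_policy multibus
}

$ sudo multipathd reconfigure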

On the target node where the iSCSI target is currently running, you can list the TCP sessions and see an established session on each VIP address from the initiator’s corresponding IP address.

# ss -tn
State        Recv-Q        Send-Q                  Local Address:Port                    Peer Address:Port         Process
ESTAB        0             0                      192.168.222.35:3260                 192.168.222.254:51856
ESTAB        0             0                      192.168.221.35:3260                 192.168.221.254:46682

From here, the multipath device can be used on the initiator system through device mapper. The device mapper name in this example, which can be seen in the multipath -ll output above, is mpatha; it can be used as /dev/mapper/mpatha or configured further depending on your needs. When a single path fails between the target and initiator, such as a switch or network interface card (NIC) failure, the initiator system will be able to continue reading from and writing to the target cluster. If the active node in the iSCSI target cluster fails, the iSCSI target will seamlessly fail over to another node in the cluster without interruption to reads or writes from the initiator, thanks to Pacemaker and DRBD.
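
For example, if all you need is a filesystem on the multipath device, you can format and mount it like any other block device (XFS here is just an arbitrary choice):

$ sudo mkfs.xfs /dev/mapper/mpatha
$ sudo mkdir -p /mnt/iscsi
$ sudo mount /dev/mapper/mpatha /mnt/iscsi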

The Testing Environment

Testing was done using a MinIO server on the iSCSI initiator. Link failures between the iSCSI initiator and target were simulated by using a script on the hypervisor to “unplug” and “plug in” the initiator system’s network cables one at a time. Uploads of a large ISO image were looped from another system to a MinIO bucket backed by the iSCSI cluster’s HA iSCSI target volume. When a network cable between the iSCSI initiator and target cluster was “unplugged”, ISO upload throughput briefly decreased before multipathing marked the path as faulty, after which all I/O continued over the single remaining path.
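
Incidentally, the 08:00:27 MAC prefix on the interfaces above belongs to VirtualBox, so a link failure can be simulated from the hypervisor by toggling a VM’s virtual link state. The script below is only a rough sketch of that idea; the VM name and NIC numbers are hypothetical and will differ in your environment:

#!/bin/bash
# Sketch: "unplug" and "plug in" the initiator VM's NICs one at a time.
# VM name and NIC numbers are assumptions for illustration only.
VM="iscsi-initiator"
for NIC in 2 3; do
    VBoxManage controlvm "$VM" setlinkstate"$NIC" off   # unplug the virtual cable
    sleep 60                                             # give multipath time to mark the path faulty
    VBoxManage controlvm "$VM" setlinkstate"$NIC" on     # plug the cable back in
    sleep 60                                             # let the path recover before failing the next one
done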

Closing Thoughts

The upside of using multipathing between an iSCSI target and initiator whenever possible should be pretty apparent. Combining multipathing with DRBD and Pacemaker achieves even higher availability than using either approach on its own.

Whether you stumbled across this blog while looking for resources specific to HA iSCSI multipathing in Linux or some smaller tidbit of information within it, I hope you did find it helpful. If you happen to be building a storage system using DRBD and need some pointers from the creators, don’t hesitate to reach out directly, or consider joining the LINBIT Slack community where you can share any thoughts or questions with me and other users of LINBIT’s open source clustering software.

Matt Kereczman

Matt Kereczman is a Solutions Architect at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT's technical team, and plays an important role in making LINBIT and LINBIT's customers' solutions great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt's hobbies.
