Distributing Volume Replicas for High Availability & Disaster Recovery Using LINSTOR

LINSTOR® supports auxiliary node properties that let you fine-tune where in the cluster LINSTOR places DRBD® volume replicas when you don’t explicitly name nodes. These auxiliary properties, also known as “AuxProps”, are key-value pairs that you set on nodes in a LINSTOR cluster. When you combine auxiliary node properties with LINSTOR options such as --replicas-on-same and --replicas-on-different, you can influence the automatic placement of volume replicas for your DRBD devices.

Depending on the topology of a LINSTOR cluster and an organization’s high availability (HA) and disaster recovery (DR) requirements, these settings can determine whether the cluster’s replication complies with the policies that govern those practices.

A colleague, Ryan Ronnander, explained in an earlier blog post how you can use LINSTOR’s auxiliary node properties and the --replicas-on-different option to automatically place replicas across multiple Availability Zones. Ryan’s post is a must-read if you’re unfamiliar with these LINSTOR options, and it will help you understand the new option we’re about to delve into.

This blog post introduces a LINSTOR option, --x-replicas-on-different, that expands LINSTOR’s automatic placement options to give you more flexibility and control. This new option lets you effectively tell LINSTOR, “I want $X replicas of a resource created in each different $Y”, where $Y is a key-value pair you’ve assigned to your LINSTOR satellite nodes. This is useful when you’re trying to support both HA and DR strategies for a storage resource, for example, by creating two replicas in each data center (“X = 2” and “Y = ‘data center’”).

Learning Through Examples

I believe the option, --x-replicas-on-different, is easier to both describe and understand through example. I will use this section to demonstrate a practical scenario, showing the LINSTOR commands used and their resulting DRBD resource replica distribution, which helps describe this new option.

Using a 5-satellite LINSTOR cluster, with satellites named linstor-sat-0 through linstor-sat-4 spread between two data centers in two different cities, I applied an auxiliary node property to each using site as the key and the name of the city where the node is located as the value:

linstor node set-property --aux linstor-sat-0 site portland
linstor node set-property --aux linstor-sat-1 site portland
linstor node set-property --aux linstor-sat-2 site vienna
linstor node set-property --aux linstor-sat-3 site vienna
linstor node set-property --aux linstor-sat-4 site vienna

The commands above result in the node list below, with the Aux/site key set to the name of the city where the node is located:

linstor node list --show-aux-props
╭─────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node           ┊ NodeType   ┊ Addresses                    ┊ AuxProps          ┊ State  ┊
╞═════════════════════════════════════════════════════════════════════════════════════════╡
┊ linstor-ctrl-0 ┊ CONTROLLER ┊ 192.168.222.250:3370 (PLAIN) ┊                   ┊ Online ┊
┊ linstor-sat-0  ┊ SATELLITE  ┊ 192.168.222.10:3366 (PLAIN)  ┊ Aux/site=portland ┊ Online ┊
┊ linstor-sat-1  ┊ SATELLITE  ┊ 192.168.222.11:3366 (PLAIN)  ┊ Aux/site=portland ┊ Online ┊
┊ linstor-sat-2  ┊ SATELLITE  ┊ 192.168.222.12:3366 (PLAIN)  ┊ Aux/site=vienna   ┊ Online ┊
┊ linstor-sat-3  ┊ SATELLITE  ┊ 192.168.222.13:3366 (PLAIN)  ┊ Aux/site=vienna   ┊ Online ┊
┊ linstor-sat-4  ┊ SATELLITE  ┊ 192.168.222.14:3366 (PLAIN)  ┊ Aux/site=vienna   ┊ Online ┊
╰─────────────────────────────────────────────────────────────────────────────────────────╯
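In a larger cluster, typing one set-property command per node gets tedious. The following is a minimal sketch of how the same assignments could be generated from a list of node and site pairs. The helper name tag_nodes and the DRY_RUN variable are my own illustrative names, not part of LINSTOR, and the node list simply mirrors the example cluster above. By default the sketch only prints the commands so that you can review them first.

```shell
# Illustrative "node site" pairs, mirroring the example cluster in this post.
NODE_SITES='linstor-sat-0 portland
linstor-sat-1 portland
linstor-sat-2 vienna
linstor-sat-3 vienna
linstor-sat-4 vienna'

# tag_nodes is a hypothetical helper name, not a LINSTOR command.
# It prints one set-property command per node. Set DRY_RUN=0 to run
# the commands against a real cluster instead of printing them.
tag_nodes() {
  printf '%s\n' "$NODE_SITES" | while read -r node site; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "linstor node set-property --aux $node site $site"
    else
      # Redirect stdin so the command cannot swallow the remaining pairs.
      linstor node set-property --aux "$node" site "$site" </dev/null
    fi
  done
}

tag_nodes
```

Keeping the mapping in one place also makes it easier to review which site each node belongs to before you rely on it for placement.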

With these settings in place, I can now use the --x-replicas-on-different option while creating resources from my rg0 resource group to influence where LINSTOR places the resulting resource’s DRBD replicas. In the following command examples I will use LINSTOR’s rg spawn subcommand, which is shorthand for resource-group spawn-resources, to both save space and help direct attention to the new options. In each command, the --x-replicas-on-different option takes two positional arguments: first the name of the “AuxProp” (in this case, site) and second the number of replicas to place in each different site. The remaining arguments passed to the rg spawn subcommand are --place-count N, where N is the total number of replicas for the new resource, followed by the resource group’s name (rg0), the new resource’s name (res0, res1, and so on), and the size of the resource (200M will be used in all examples).

I will start with a relatively straightforward example where four “diskful” replicas are created in the cluster, while specifying that two replicas should be placed in each differently named data center:

linstor rg spawn --x-replicas-on-different site 2 --place-count 4 rg0 res0 200M

This results in four “diskful” replicas, two on nodes in Portland and two on nodes in Vienna. Also, notice that LINSTOR automatically assigned a diskless TieBreaker resource to achieve quorum:

linstor resource list --resource res0
╭─────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════╡
┊ res0         ┊ linstor-sat-0 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:22 ┊
┊ res0         ┊ linstor-sat-1 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
┊ res0         ┊ linstor-sat-2 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
┊ res0         ┊ linstor-sat-3 ┊ 7000 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-05-30 18:36:15 ┊
┊ res0         ┊ linstor-sat-4 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────╯

Without the --x-replicas-on-different site 2 option, LINSTOR could have placed three diskful replicas in Vienna and only one in Portland. Requiring two replicas in each site provides not only data center-level fault tolerance, but also node-level fault tolerance within each data center for resources created using these options.
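If you want to check a placement like this without reading the table by eye, you could count the diskful replicas per site. The following is a rough sketch that assumes the table layout shown in this post and a hard-coded node-to-site map. The function names replicas_per_site and sample_res0 are hypothetical, and the sample data is copied from the res0 output above so that the sketch runs without a cluster.

```shell
# sample_res0 reproduces the res0 rows shown in this post.
sample_res0() {
  cat <<'EOF'
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
┊ res0         ┊ linstor-sat-0 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:22 ┊
┊ res0         ┊ linstor-sat-1 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
┊ res0         ┊ linstor-sat-2 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
┊ res0         ┊ linstor-sat-3 ┊ 7000 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-05-30 18:36:15 ┊
┊ res0         ┊ linstor-sat-4 ┊ 7000 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:36:21 ┊
EOF
}

# replicas_per_site is a hypothetical helper. It reads a resource list
# table on stdin and prints "<site> <diskful replica count>" per site.
replicas_per_site() {
  # Turn the table's column separator into "|" so awk can split on it.
  sed 's/┊/|/g' | awk -F'|' '
    BEGIN {
      # Assumed node-to-site map, copied from the example cluster.
      site["linstor-sat-0"] = "portland"; site["linstor-sat-1"] = "portland"
      site["linstor-sat-2"] = "vienna";   site["linstor-sat-3"] = "vienna"
      site["linstor-sat-4"] = "vienna"
    }
    NF >= 8 {
      node = $3; state = $7
      gsub(/ /, "", node); gsub(/ /, "", state)
      # Count UpToDate rows only, which skips the header and TieBreaker rows.
      if (state == "UpToDate" && (node in site)) count[site[node]]++
    }
    END { for (s in count) print s, count[s] }
  ' | sort
}

sample_res0 | replicas_per_site
# prints:
# portland 2
# vienna 2
```

On a live cluster you would pipe the output of linstor resource list --resource res0 into replicas_per_site instead of the sample, provided your client prints the same table layout.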

Another example, albeit one that is redundant with the --replicas-on-same LINSTOR option, is creating two diskful replicas while specifying that two replicas should be placed in each differently named data center:

linstor rg spawn --x-replicas-on-different site 2 --place-count 2 rg0 res1 200M

This results in two diskful replicas being placed onto nodes within the same data center, and the diskless quorum TieBreaker resource landing on any other node in the cluster:

linstor resource list --resource res1
╭─────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════╡
┊ res1         ┊ linstor-sat-0 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:45:39 ┊
┊ res1         ┊ linstor-sat-1 ┊ 7001 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 18:45:39 ┊
┊ res1         ┊ linstor-sat-2 ┊ 7001 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-05-30 18:45:38 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────╯

Again, this is redundant with the --replicas-on-same option, but the example demonstrates what happens when the --x-replicas-on-different value is equal to the total number of diskful replicas specified by --place-count.

A less straightforward example is what happens when you specify a --place-count that is not divisible by the value set for --x-replicas-on-different. For example:

linstor rg spawn --x-replicas-on-different site 2 --place-count 3 rg0 res2 200M

Rather than refusing to create the resource, LINSTOR will respect the --x-replicas-on-different option up to the point where you’d have a remainder replica assignment, at which point it will fill out the “next different site” with the remaining replicas. After running the example command above, the following replica assignments are made:

linstor resource list --resource res2
╭───────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════╡
┊ res2         ┊ linstor-sat-0 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2024-05-30 19:40:26 ┊
┊ res2         ┊ linstor-sat-1 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2024-05-30 19:40:26 ┊
┊ res2         ┊ linstor-sat-3 ┊ 7002 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2024-05-30 19:40:26 ┊
╰───────────────────────────────────────────────────────────────────────────────────────╯

Another example with an outcome that might not be obvious is what happens when --place-count is less than the value of --x-replicas-on-different:

linstor rg spawn --x-replicas-on-different site 3 --place-count 2 rg0 res3 200M
linstor resource list --resource res3
╭─────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node          ┊ Port ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════╡
┊ res3         ┊ linstor-sat-0 ┊ 7003 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 19:52:46 ┊
┊ res3         ┊ linstor-sat-1 ┊ 7003 ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2024-05-30 19:52:46 ┊
┊ res3         ┊ linstor-sat-2 ┊ 7003 ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2024-05-30 19:52:45 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────╯

Perhaps the result above is not that surprising considering the previous example, because again we see that LINSTOR will fill out a site with replicas until it reaches the value set using --x-replicas-on-different.
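The arithmetic behind the examples so far can be sketched with integer division. This is only my simplified reading of the results shown in this post, not LINSTOR’s actual auto-placer logic, and plan_sites is a hypothetical helper name. It splits the place count into sites that receive a full X replicas, plus a remainder that lands in one more site.

```shell
# plan_sites is a hypothetical helper, not a LINSTOR command.
# Usage: plan_sites <replicas-per-site X> <place-count>
plan_sites() {
  x="$1"; count="$2"
  # Guard against dividing by zero.
  [ "$x" -gt 0 ] || { echo "X must be a positive integer" >&2; return 1; }
  echo "full_sites=$((count / x)) remainder=$((count % x))"
}

plan_sites 2 4   # res0: prints full_sites=2 remainder=0
plan_sites 2 2   # res1: prints full_sites=1 remainder=0
plan_sites 2 3   # res2: prints full_sites=1 remainder=1
plan_sites 3 2   # res3: prints full_sites=0 remainder=2
```

For res2, one site gets its full two replicas and the leftover replica goes to the next site. For res3, no site can be filled, so both replicas count as a remainder and land together in one site.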

Finally, what happens when LINSTOR is asked for something impossible? Spoiler alert, it doesn’t create a black hole consuming the entire solar system or summon Godzilla from the depths of the Pacific to wreak havoc on cities specified in your AuxProps. Sorry, Hollywood. Instead, it politely refuses.

For example, the site with the most nodes in the example cluster is Vienna, with three nodes. When you specify --x-replicas-on-different site 4 and --place-count 4, LINSTOR displays output similar to the following:

linstor rg spawn --x-replicas-on-different site 4 --place-count 4 rg0 res4 200M
ERROR:
Description:
    Not enough available nodes
Details:
    Not enough nodes fulfilling the following auto-place criteria:
     * has a deployed storage pool named [drbdpool]
     * the storage pools have to have at least '204800' free space
     * the current access context has enough privileges to use the node and the storage pool
     * the node is online

    Auto-place configuration details:
      Replica count: 4
      Don't place with resource
          res4
      Storage pool name:
          drbdpool
      X replicas on nodes with different property:
          Aux/site: 4

    Resource group: rg0

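The above error lists the criteria that nodes must fulfill for automatic placement, along with the auto-place configuration that was requested. In this case, placement fails because no site has the four nodes that Aux/site: 4 requires.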
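You could catch a request like this before sending it to LINSTOR by comparing it against the number of nodes in each site. The following pre-flight check is a simplified model based only on the behavior shown in this post. It ignores storage pool capacity, node state, and every other auto-place criterion, and the real auto-placer might behave differently in cases this post does not cover. The function name can_place is hypothetical.

```shell
# can_place is a hypothetical pre-flight check, not part of LINSTOR.
# Usage: can_place <replicas-per-site X> <place-count> <nodes-in-site>...
# Returns 0 if this simplified model says the request could be satisfied.
can_place() {
  x="$1"; count="$2"; shift 2
  [ "$x" -gt 0 ] || return 2
  full=$((count / x)); rem=$((count % x))
  # Sites needed: one per full group, plus one for any remainder.
  need=$full
  if [ "$rem" -gt 0 ]; then need=$((full + 1)); fi
  [ "$#" -ge "$need" ] || return 1
  i=0
  # Walk the sites from largest to smallest.
  for size in $(printf '%s\n' "$@" | sort -rn); do
    i=$((i + 1))
    if [ "$i" -le "$full" ]; then
      # This site must hold a full group of X replicas.
      [ "$size" -ge "$x" ] || return 1
    elif [ "$i" -eq "$need" ]; then
      # This site only has to hold the remainder.
      [ "$size" -ge "$rem" ] || return 1
    fi
  done
  return 0
}

# Example cluster: Portland has 2 nodes, Vienna has 3.
if can_place 4 4 2 3; then echo "feasible"; else echo "not feasible"; fi
# prints: not feasible
```

With the same site sizes, can_place 2 4 2 3 and can_place 2 3 2 3 both succeed, which matches the res0 and res2 examples earlier in this post.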

Closing Thoughts

This blog post has hopefully shed some light on LINSTOR’s --x-replicas-on-different option, and maybe even pointed out some of LINSTOR’s lesser-known automatic placement options. The --x-replicas-on-different option essentially tells LINSTOR’s auto-placer to choose the same site, where “site” is the arbitrary auxiliary property key used in my example, X times before considering others. You might consider combining the --x-replicas-on-different setting with other auto-placer settings like --replicas-on-different, for example, to place two replicas in both Portland and Vienna but also ensure the replicas in each data center land on different racks.

The example in this blog, using sites, is the scenario our customer base had in mind for --x-replicas-on-different, but perhaps there are other uses we’ve yet to realize. If there is an example scenario I missed, or a use case you think could be interesting for these options, reach out and let us know!

Matt Kereczman

Matt Kereczman is a Solutions Architect at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone in LINBIT's technical team, and plays an important role in making LINBIT and LINBIT's customer's solutions great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt's hobbies.
