In this article, I will explore the LINSTOR® --x-replicas-on-different feature and my efforts to document the feature for the LINSTOR User Guide. This was an incredible journey. An alternative title to this article might have been “Documentation Adventures in LINSTOR-land”.
The --x-replicas-on-different feature, introduced with LINSTOR server version 1.28.0 and LINSTOR client version 1.23.0, expands on the automatic resource placement options available to LINSTOR users. By using it, you can ensure that storage resource redundancy meets your or your users’ high availability (HA) and disaster recovery (DR) requirements, for example, across multiple data centers. My colleague, Solutions Architect Matt Kereczman, wrote about this LINSTOR feature in another article, “Distributing Volume Replicas for High Availability & Disaster Recovery Using LINSTOR”. In that article, Kereczman provides conceptual background to the --x-replicas-on-different feature and a practical demonstration of how you can use the feature, while showing a few different scenarios and resource placement outcomes.
Taking a general approach to documenting the LINSTOR feature
While working on the user guide documentation for the --x-replicas-on-different feature, my head began to spin slightly as I thought about how to take a more general approach to documenting the feature than Kereczman’s more practically focused article.
By my count, if you have an auxiliary property with two different values, perhaps representing two different data centers, then there are four positive integer variables at play that directly relate to how LINSTOR places resources when you use the --x-replicas-on-different argument to create resources. These variables are:
- The value that you specify with the --x-replicas-on-different <Aux/prop> <positive-integer> argument¹.
- The value that you specify with the placement count --place-count argument.
- The number of nodes having value1 for the applicable auxiliary property, for example, the number of portland nodes in Kereczman’s article.
- The number of nodes having value2 for the applicable auxiliary property, for example, the number of vienna nodes in Kereczman’s article.
Implied by the last two variables is a further value: the total number of LINSTOR satellite nodes on which you have set the auxiliary property applicable to your --x-replicas-on-different argument.
📝 NOTE: To keep things simple, I am only thinking about two variables related to a specified auxiliary property. It is possible of course to have more than two variables, for example, if you use an auxiliary property to label nodes in different data centers and you have more than two data centers.
Furthermore, there are three possible relationships between any two of these values: greater than, less than, or equal to. This is the point at which I began to feel some tingling in my head, which for me is a typical precursor to it spinning.
Kereczman’s article shows the resource placement outcome of five scenarios. The tingling sensation I was feeling was due to thinking that there must be more than five possible scenarios, but I was unsure how many.
If you are interested in math, I will make a digression here to discuss how I calculated just how many possible scenarios there are for the relationships between the four variables. If you are only interested in the results and want to skip the mathematical digression, you can continue reading from the “Testing automatic resource placement by using LINSTOR” section.
Calculating the number of combinations
To slow my head’s spinning, I reminded myself that some of the potential combinations of elements and relationships would be nonsensical. For example, although it might happen by mistake, a systems administrator would not knowingly specify an --x-replicas-on-different value greater than both the number of nodes having value1 and the number of nodes having value2 for the applicable auxiliary property.
My head was still spinning enough for me to guess that there was still a large number of possible scenarios, and I did not yet know how to calculate how many. Here, I will abstract the variables to letter names: A, B, C, and D:
- A: The positive integer value specified with --x-replicas-on-different
- B: The positive integer value specified with --place-count
- C: The number of nodes having one value for a given auxiliary property, for example, dc1, for an auxiliary property, site
- D: The number of nodes having a different value for the same auxiliary property as C, for example, dc2
Next, I started to investigate the number of possible scenarios in a general way.
I dabble in some shell scripting, but I am hardly a programmer. However, it occurred to me that this amazingly sophisticated calculator that I am writing this article with might also help me make some calculations. To that end, I thought I might express the set of possible combinations of relationships between the four variables as a Bash array variable.
combos=(AB AC AD \
BC BD \
CD \
)
This set might appear limited, but because of mathematical symmetry, AB is the same as BA. That is, the three relationships under consideration for AB, expressed as A=B, A>B, and A<B, have equivalent expressions for BA: B=A, B<A, and B>A. For this reason, BA, CA, DA, CB, and others do not need to appear as elements in the array variable.
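Because AB and BA carry the same information, the elements of combosx are exactly the unordered pairs of the four letters, and there are six of them. As a small illustration (the variable name pair_templates is hypothetical), Python’s itertools.combinations produces the same six elements in the same order:

```python
from itertools import combinations

# The four abstract variables: A (--x-replicas-on-different value),
# B (--place-count), C and D (node counts for two auxiliary property values)
letters = ["A", "B", "C", "D"]

# combinations() yields each unordered pair once, so BA, CA, and so on never
# appear; "x" is a placeholder for the relationship operator.
pair_templates = [f"{a}x{b}" for a, b in combinations(letters, 2)]

print(pair_templates)
# ['AxB', 'AxC', 'AxD', 'BxC', 'BxD', 'CxD']
```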
I was not sure how, or if, I would use it, but it seemed vaguely productive to enter something else into my shell, so I defined another array variable representing the three possible relationships between two elements.
rels=('<' '=' '>')
Here, I realized that in my declaration of the combos array variable, there was no room to express the relationship between the sub-elements (A, B, C, and D) if I needed that later. So, just in case, I declared another array variable.
combosx=('AxB' 'AxC' 'AxD' \
'BxC' 'BxD' \
'CxD' \
)
At this point, I tried to express my calculation problem in words:
Given an array variable combosx, with six elements ('AxB' 'AxC' 'AxD' 'BxC' 'BxD' 'CxD'), what is the total number of unique replacement combinations for combosx, where x can be replaced by one of three elements within another array variable, rels=('<' '=' '>')?
Here, there was a fork in the road for me. I was fairly confident that I had expressed my calculation problem clearly in words. Should I next turn to an AI chat engine for a solution or continue using my brain to try to solve this without help?
The temptation was too great and so I opened a web browser and asked some AI chat engines. Three different AI chat engines gave me an answer of 729. The formula each used to get the answer was this:
$$\text{Total Combinations} = (\text{Number of choices for } x)^{(\text{Number of elements in combosx})} = 3^6 = 729$$
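The formula can be cross-checked by brute force. This minimal Python sketch, which is not the AI-generated script, uses itertools.product from the standard library to build every way of choosing one operator for each of the six pairs and then counts the results:

```python
from itertools import combinations, product

pairs = list(combinations("ABCD", 2))  # ('A', 'B'), ('A', 'C'), ... six pairs
operators = ["<", "=", ">"]

# product(..., repeat=6) chooses one operator for each of the six pairs,
# which is the same as replacing every x in combosx with an element of rels.
relationship_sets = [
    tuple(f"{a}{op}{b}" for (a, b), op in zip(pairs, ops))
    for ops in product(operators, repeat=len(pairs))
]

print(len(relationship_sets))  # 729, that is, 3 ** 6
```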
Because I am wary of AI chat engine answers, and am not trained in combinatorics, I wanted to try to confirm the answer myself. I wondered if I could do this by using the Bash shell.
📝 NOTE: Here again, a reader familiar with combinatorics, or more interested in learning about LINSTOR than about the adventures of a non-programmer, non-mathematician, might skip ahead to a later section. I promise that I do eventually return to the original problem and topic of this article: LINSTOR automatic resource placement behavior when using the --x-replicas-on-different option.
What if I could somehow display the array variable combosx while systematically replacing each x with one of the three elements from my variable rels, generate an MD5 checksum for the result of each iteration, collect those checksums, and stop the replacement iteration when a new checksum matched a previously generated checksum?
I tried to turn this into an AI chat engine prompt:
Given an array variable combosx, with six elements ('AxB' 'AxC' 'AxD' 'BxC' 'BxD' 'CxD'), and another array variable, rels=('<' '=' '>'), can you write a Bash script that will systematically replace the x in each element of combosx with one of the elements of the rels array variable? With each replacement iteration create an array with six elements. Some examples of these arrays would be ('A<B' 'A<C' 'A<D' 'B<C' 'B<D' 'C<D'), ('A=B' 'A<C' 'A<D' 'B<C' 'B<D' 'C<D'), ('A<B' 'A=C' 'A<D' 'B<C' 'B<D' 'C<D'), and so on.
This resulted in a script that did not produce the intended result, so I added this additional prompt to the chat:
In Bash, how can I systematically generate a sequence of strings: ('A<B' 'A<C' 'A<D' 'B<C' 'B<D' 'C<D'), ('A=B' 'A<C' 'A<D' 'B<C' 'B<D' 'C<D'), ('A<B' 'A=C' 'A<D' 'B<C' 'B<D' 'C<D'), and so on, cycling through possible operators <, =, and >?
The AI chat engine happily obliged my request. I had low expectations when I ran the AI-generated Bash script. However, to my surprise, this was the command and its output:
./ai-script.bash | wc -l
729
Examining the script though, I noticed these lines:
num_combinations=${#combinations[@]}
[...]
for ((i = 0; i < $((${#operators[@]} ** num_combinations)); i++)); do
Here, the array variable combinations used in the AI-generated Bash script was equivalent to my combosx variable; the AI chat engine had taken some liberties with variable names when creating the script. In this case, $((${#operators[@]} ** num_combinations)) is the number of operators, three, raised to the power of the number of elements in the combinations array variable, six. Three raised to the sixth power is 729. Perhaps, then, the script was only fulfilling the AI chat engine’s own formula for the number of combinations.
For further verification, I made some modifications to the script, including adding checksum generation, sending output to files, and adding a way to iterate the original for loop more than 729 times.
Here is the modified script:
#!/bin/bash
# Define the base letters
letters=('A' 'B' 'C' 'D')
# Define the operators
operators=('<' '=' '>')
# Function to generate the combinations (A<B, A<C, etc.)
generate_combinations() {
local -n result_array=$1 # Name reference: writes update the caller's array
local index=0 i j # Keep the loop counters out of the global scope
for ((i = 0; i < ${#letters[@]} - 1; i++)); do
for ((j = i + 1; j < ${#letters[@]}; j++)); do
result_array[$index]="${letters[i]}x${letters[j]}"
((index++))
done
done
}
# Initialize the combinations array
declare -a combinations
generate_combinations combinations
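# Optional first argument: how many positions to fill (default: all six).
# A value above 6 adds empty positions, so output lines start to repeat.
if [ -z "$1" ]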
then
num_combinations=${#combinations[@]}
else
num_combinations=$1
fi
# Loop through all possible sequences of operators
for ((i = 0; i < $((${#operators[@]} ** num_combinations)); i++)); do
# Create a new array for the current sequence
declare -a current_array
# Calculate the operator sequence for this iteration
temp=$i
for ((j = 0; j < num_combinations; j++)); do
operator_index=$((temp % ${#operators[@]})) # Find the correct index
operator="${operators[$operator_index]}" # Get the operator
temp=$((temp / ${#operators[@]})) # Continue the counter
# Replace 'x' in the corresponding combination with the operator
current_array[$j]=$(echo "${combinations[$j]}" | sed "s/x/$operator/g")
done
# Print the current array
echo "${current_array[@]}" |sed -e 's/[[:space:]]*$//g'|\
tee -a out.combos."$num_combinations" |\
md5sum >> out.md5sum."$num_combinations"
echo -n '.'
done
echo Done
After changing the script, I entered this command to run the script for $3^6$ (729) iterations and again for $3^7$ (2187) iterations.
./rep-combos.bash && ./rep-combos.bash 7
I then entered some verification commands against the files that running the script created.
for file in out.md5sum.*; do
  wc -l "$file"
done
729 out.md5sum.6
2187 out.md5sum.7
for file in out.md5sum.*; do
  sort "$file" | uniq -d | wc -l
done
0
729
Output from the second for loop command shows that there are no duplicate lines in the first file, out.md5sum.6, which has 729 lines; all lines in this file are unique. The next file, out.md5sum.7, has 2187 lines, and 729 distinct lines in it are duplicated. Based on the outputs from these commands, I was satisfied that 729 was indeed the number of unique possible replacement combinations for the set of six elements, given a choice of three possible replacement elements.
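For comparison, the same checksum experiment can be sketched in Python with the standard hashlib and collections modules. This sketch assumes the counting scheme of the modified Bash script: the loop index is read as a base-3 number whose lowest digit picks the operator for the first element, and any position past the sixth contributes an empty element, which is why lines repeat when the script runs with the argument 7. The function names render_line and checksum_stats are hypothetical, and the third value returned corresponds to the sort | uniq -d | wc -l count.

```python
import hashlib
from collections import Counter
from itertools import combinations

templates = [f"{a}x{b}" for a, b in combinations("ABCD", 2)]  # six elements
operators = ["<", "=", ">"]

def render_line(i: int, positions: int) -> str:
    """Mimic the line the modified Bash script prints for loop index i."""
    parts = []
    for j in range(positions):
        op = operators[i % 3]  # lowest base-3 digit picks the operator
        i //= 3
        # Positions past the sixth have no template, like ${combinations[6]} in Bash
        template = templates[j] if j < len(templates) else ""
        parts.append(template.replace("x", op))
    return " ".join(parts).rstrip()  # drop the trailing space, like the sed call

def checksum_stats(positions: int) -> tuple:
    """Return (lines, unique checksums, checksums seen more than once)."""
    counts = Counter(
        hashlib.md5((render_line(i, positions) + "\n").encode()).hexdigest()
        for i in range(3 ** positions)
    )
    duplicated = sum(1 for n in counts.values() if n > 1)
    return sum(counts.values()), len(counts), duplicated

print(checksum_stats(6))  # (729, 729, 0)
print(checksum_stats(7))  # (2187, 729, 729)
```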
Trying to return to the original problem
A hopefully interesting digression, but to return to my task at hand, documenting the --x-replicas-on-different feature for the LINSTOR User Guide, the thought of distilling 729 possible combinations seemed daunting. I wondered whether that was even a valid description of my task. It was at this moment that I also wondered whether some of the 729 possible relationship combinations were logically inconsistent. Not entirely at random, I picked the 23rd one that the script generated:
A=B A=C A>D B<C B<D C<D
Here, A=B and A=C imply B=C, but that contradicts B<C. Therefore, this relationship combination can be discarded. I wondered how many more of these combinations could be discarded because they are logically inconsistent. Nearly simultaneously, I wondered whether my original assumption, that there are three discrete relationship operators, was too optimistic, and whether I should have considered the less-than-or-equal-to and greater-than-or-equal-to operators. The thought of $5^6$ (15,625) combinations made my head spin a bit.
I quickly ran through a slight modification, adding a less-than-or-equal-to operator, to that last relationship set:
A=B A=C A>D B<=C B<D C<D
Again, if I say A=3, it follows that B=3 and C=3, and this time B<=C is valid. The relationship A>D is valid if, for example, D=2. However, B<D would then require D to be greater than 3, which contradicts A>D, so not even an allowance for another relationship operator, less-than-or-equal-to, can rescue the relationship combination set from logical inconsistency.
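This kind of reasoning can be made mechanical with a brute-force check: a relationship set is logically consistent exactly when some assignment of numbers satisfies every relationship in it. With four variables, trying the values 1 through 4 for each one is enough, because only the relative order of the values matters. This is a minimal sketch that assumes relationships are written as in the script output, such as B<=C; the function name is_consistent is hypothetical.

```python
import operator
import re
from itertools import product

# Relationship symbols mapped to comparison functions
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}
# One relationship, such as A=B or B<=C
PATTERN = re.compile(r"([A-D])(<=|>=|<|=|>)([A-D])")

def is_consistent(relationship_set: str) -> bool:
    """Return True if some values for A, B, C, and D satisfy every relationship."""
    parsed = []
    for rel in relationship_set.split():
        match = PATTERN.fullmatch(rel)
        if match is None:
            raise ValueError(f"cannot parse {rel!r}")
        parsed.append(match.groups())
    # Four variables never need more than four distinct levels, so 1..4 suffices.
    for values in product(range(1, 5), repeat=4):
        env = dict(zip("ABCD", values))
        if all(OPS[op](env[left], env[right]) for left, op, right in parsed):
            return True
    return False

print(is_consistent("A=B A=C A>D B<C B<D C<D"))   # False
print(is_consistent("A=B A=C A>D B<=C B<D C<D"))  # False
print(is_consistent("A<B A<C A<D B<C B<D C<D"))   # True
```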
After this, I set aside these thoughts and decided that I needed to move away from the conceptual realm and toward the practical.
Reducing the number of relationship combinations
Here, I thought I might reduce the number of combinations by starting with a fixed relationship between two of the variables, to learn whether I could use that approach to build up to more general observations for documenting the feature. So, taking inspiration from Kereczman’s article, I decided that, having defined my calculation problem, I might need to limit my initial investigation to possible scenarios with fixed values for the number of nodes having value1 and value2 for the applicable auxiliary property. That is, there would be two value1 nodes and three value2 nodes.
In Kereczman’s article, C=2 (portland nodes) and D=3 (vienna nodes), that is, C<D. Given this fixed relationship, C<D, I could quickly learn the number of relationship sets that contain it by entering this command:
grep 'C<D' out.combos.6 | wc -l
The command returns 243, an unsurprising reduction from $3^6$ to $3^5$, as a developer colleague who helped review this article later pointed out to me.
If C<D, then I know that if, for example, a set includes A=C, I can discard the set if it also includes A=D or A>D, because such a set is logically inconsistent. This can be shown by assuming values of C=2 and D=3: when A=C, the relationship A>D becomes the false relationship 2>3, and A=D becomes the false relationship 2=3.
By entering grep 'C<D' out.combos.6 | grep 'A=C' | wc -l, I learned that there are 81 ($3^4$) sets that contain both the C<D and the A=C relationships.
I reduced the number of relationship combination sets then by entering the following command:
grep 'C<D' out.combos.6 | grep -vE 'A=C (A=D|A>D)' | wc -l
The command output returns 189, rather than the 243 I started with.
I could do a similar reduction operation for the sets where A=B and either B=D or B>D appear. I combined these reduction operations by entering the following command:
grep 'C<D' out.combos.6 | grep -vE 'A=C (A=D|A>D)' |\
grep -vE 'A=B.*(B=D|B>D)' | wc -l
This command output shows 147.
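The same three filters can be expressed without grep. This sketch rebuilds the 729 sets in memory rather than reading out.combos.6, represents each set as a dictionary keyed by pair, and applies the filters in the same order; the variable names are illustrative.

```python
from itertools import combinations, product

PAIRS = [a + b for a, b in combinations("ABCD", 2)]  # "AB", "AC", ..., "CD"

# All 729 sets, each as a dict such as {"AB": "<", "AC": "=", ...}
all_sets = [dict(zip(PAIRS, ops)) for ops in product("<=>", repeat=len(PAIRS))]

# grep 'C<D' out.combos.6
with_c_lt_d = [s for s in all_sets if s["CD"] == "<"]
# ... | grep 'A=C'
a_eq_c = [s for s in with_c_lt_d if s["AC"] == "="]
# ... | grep -vE 'A=C (A=D|A>D)'
step2 = [s for s in with_c_lt_d
         if not (s["AC"] == "=" and s["AD"] in ("=", ">"))]
# ... | grep -vE 'A=B.*(B=D|B>D)'
step3 = [s for s in step2
         if not (s["AB"] == "=" and s["BD"] in ("=", ">"))]

print(len(with_c_lt_d), len(a_eq_c), len(step2), len(step3))  # 243 81 189 147
```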
I suspected there was probably some mathematical way to work out the number of consistent sets. My first reduction operation left me with 189 sets. Taking a next step: 243 – 189 = 54, and 54 / 2 = 27, which is $3^3$. Mathematics!
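The numbers in these reductions follow from a short counting argument. Here, |S| denotes the number of sets in a group, R_1 and R_2 are the sets removed by the two grep -v filters, and A ≥ D is shorthand for “A=D or A>D”. Fixing C<D leaves five free pairs, and fixing A=C as well leaves four. Each removal rule then restricts one more pair to two of its three options while three pairs stay free, which is where 54 = 2 · 3^3, and therefore the 27 = 3^3, come from. Inclusion-exclusion accounts for the 12 sets that both filters remove, which reproduces the 147 that grep reported:

```latex
\begin{aligned}
|S_{C<D}| &= 3^5 = 243 \\
|S_{C<D,\,A=C}| &= 3^4 = 81 \\
|R_1| = |S_{C<D,\,A=C,\,A \ge D}| &= 2 \cdot 3^3 = 54 \\
|S_{C<D}| - |R_1| &= 243 - 54 = 189 \\
|R_2| = |S_{C<D,\,A=B,\,B \ge D}| &= 2 \cdot 3^3 = 54 \\
|R_1 \cap R_2| &= 2 \cdot 2 \cdot 3 = 12 \\
|S_{C<D}| - |R_1 \cup R_2| &= 243 - (54 + 54 - 12) = 147
\end{aligned}
```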
Creating a program to find logically consistent relationship sets
However, I could not resist the temptation to ask an AI chat engine for help, by using the following prompt:
If there are 729 possible ways to replace the x in the general set of relationships between 4 variables, A, B, C, and D, where the relationship set is AxB AxC AxD BxC BxD CxD, and x can be replaced by one of three relationship operators, less than <, equal to =, or greater than >, is there a way to determine the number of logically consistent sets? For example, the set A=B A=C A>D B<C B<D C<D is logically inconsistent because if A=B and A=C, then B=C and so B cannot be less than C. Can you create a Python program that could determine the number of logically consistent and inconsistent relationship sets?
After many back-and-forths between bad programs and revised prompts, I was given a Python program that seemed to produce some interesting and relevant results. The program marked logically consistent sets with a + symbol and inconsistent sets with a - symbol. I was even able to modify the program to show minimum values for A, B, C, and D that would satisfy the relationships shown in logically consistent sets. By doing this, I thought that I could later plug these numbers into LINSTOR commands to observe LINSTOR resource placement behavior.
I was amazed, and relieved, assuming that this program was true to my intentions. Here is a sample of the program output:
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C=D Min Values: A=1, B=2, C=3, D=3
+ A<B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=2, C=4, D=3
- A<B, A<C, A<D, B<C, B=D, C<D
- A<B, A<C, A<D, B<C, B=D, C=D
- A<B, A<C, A<D, B<C, B=D, C>D
[...]
- A>B, A>C, A>D, B>C, B=D, C>D
+ A>B, A>C, A>D, B>C, B>D, C<D Min Values: A=4, B=3, C=1, D=2
+ A>B, A>C, A>D, B>C, B>D, C=D Min Values: A=3, B=2, C=1, D=1
+ A>B, A>C, A>D, B>C, B>D, C>D Min Values: A=4, B=3, C=2, D=1
Total number of logically consistent sets: 49
Total number of logically inconsistent sets: 680
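An independent way to find the consistent sets is to turn the problem around: instead of testing each of the 729 candidates, assign every combination of the values 1 through 4 to A, B, C, and D, and record which relationship set each assignment produces. Every consistent set arises this way, and no inconsistent one can. This sketch is a cross-check, not the AI-generated program, and its function names are hypothetical. Run as written, it reports 75 consistent sets and 654 inconsistent ones; 75 is the number of weak orderings of four items (the ordered Bell, or Fubini, number). Because that differs from the 49 reported above, it might be worth comparing the two lists before relying on either count.

```python
from itertools import combinations, product

PAIRS = list(combinations("ABCD", 2))

def symbol(x: int, y: int) -> str:
    """Return the relationship operator that holds between x and y."""
    return "<" if x < y else "=" if x == y else ">"

def relationship_set(values: dict) -> str:
    """Describe one assignment of numbers in the program's output format."""
    return ", ".join(f"{a}{symbol(values[a], values[b])}{b}" for a, b in PAIRS)

# Every consistent set is produced by some assignment of the values 1..4,
# and no assignment can produce an inconsistent set.
consistent = {
    relationship_set(dict(zip("ABCD", vals)))
    for vals in product(range(1, 5), repeat=4)
}

total = 3 ** len(PAIRS)
print(len(consistent), total - len(consistent))  # 75 654
```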
By plugging in some numbers for a small sample of these sets, I seemed to confirm the program’s results. Here, I had the suspicion that the number of logically consistent sets, 49, might be larger than it needed to be for documenting the --x-replicas-on-different command option. My thought was that the inequality relationships between C and D only need to be represented once, in a unidirectional way, whenever there are two logically consistent sets that otherwise have the same relationships between variables. That is to say, given the logically consistent set A<B, A<C, A<D, B<C, B<D, C<D, if there is also a logically consistent set, A<B, A<C, A<D, B<C, B<D, C>D, then for the purpose of documenting the feature, I only need to consider one of these sets.
Before giving a hypothetical example to further support this observation, I first needed to determine whether there was such a mirrored set pair, A<B, A<C, A<D, B<C, B<D, C<D and A<B, A<C, A<D, B<C, B<D, C>D, so I entered the following command:
./eval-relationship-set-consistency-for-3-rels-4-vars.py |\
grep 'A<B, A<C, A<D, B<C, B<D' |\
grep -v 'C=D'
I was relieved to see that the command output showed a mirrored set pair:
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=2, C=4, D=3
Because the values of the auxiliary property that someone uses with the --x-replicas-on-different option are arbitrary and user-defined, it does not matter whether the number of nodes having one value for the auxiliary property is greater than or less than the number of nodes having a different value for the same auxiliary property.
In the moment, I set this thinking aside. However, I later determined that there were eight such mirrored set pairs:
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=1, C=2, D=3
+ A=B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=1, C=3, D=2
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=2, C=4, D=3
+ A<B, A<C, A<D, B>C, B>D, C<D Min Values: A=1, B=4, C=2, D=3
+ A<B, A<C, A<D, B>C, B>D, C>D Min Values: A=1, B=4, C=3, D=2
+ A>B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=1, C=3, D=4
+ A>B, A<C, A<D, B<C, B<D, C>D Min Values: A=2, B=1, C=4, D=3
+ A=B, A>C, A>D, B>C, B>D, C<D Min Values: A=3, B=3, C=1, D=2
+ A=B, A>C, A>D, B>C, B>D, C>D Min Values: A=3, B=3, C=2, D=1
+ A<B, A>C, A>D, B>C, B>D, C<D Min Values: A=3, B=4, C=1, D=2
+ A<B, A>C, A>D, B>C, B>D, C>D Min Values: A=3, B=4, C=2, D=1
+ A>B, A>C, A>D, B<C, B<D, C<D Min Values: A=4, B=1, C=2, D=3
+ A>B, A>C, A>D, B<C, B<D, C>D Min Values: A=4, B=1, C=3, D=2
+ A>B, A>C, A>D, B>C, B>D, C<D Min Values: A=4, B=3, C=1, D=2
+ A>B, A>C, A>D, B>C, B>D, C>D Min Values: A=4, B=3, C=2, D=1
Again, for documentation purposes, it is arbitrary which set of each mirrored pair one chooses for making an observation about LINSTOR resource placement, because C and D represent the numbers of nodes having two values of a given user-defined auxiliary property.
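Finding the mirrored pairs can also be scripted. This sketch regenerates the consistent sets by assigning the values 1 through 4 to A, B, C, and D, flips only the C/D relationship from < to >, and keeps a pair when the flipped set is also consistent. It counts eight pairs, matching the list of mirrored sets. The function and variable names are hypothetical.

```python
from itertools import combinations, product

PAIRS = list(combinations("ABCD", 2))  # the last pair is ("C", "D")

def symbol(x: int, y: int) -> str:
    """Return the relationship operator that holds between x and y."""
    return "<" if x < y else "=" if x == y else ">"

def relationship_set(values: dict) -> tuple:
    """Return a relationship set as a tuple such as ('A<B', ..., 'C<D')."""
    return tuple(f"{a}{symbol(values[a], values[b])}{b}" for a, b in PAIRS)

consistent = {relationship_set(dict(zip("ABCD", vals)))
              for vals in product(range(1, 5), repeat=4)}

mirrored_pairs = []
for rels in sorted(consistent):
    if rels[-1] == "C<D":
        mirror = rels[:-1] + ("C>D",)  # flip only the C/D relationship
        if mirror in consistent:
            mirrored_pairs.append((rels, mirror))

for left, right in mirrored_pairs:
    print(", ".join(left), "<->", right[-1])
print(len(mirrored_pairs))  # 8
```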
Testing automatic resource placement by using LINSTOR
To test LINSTOR automatic resource placement behavior, I used a LINSTOR cluster having seven satellite nodes and one controller node. I needed such a large cluster to test some of the consistent set cases where C+D=7, such as the set shown earlier:
A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
The output of a linstor node list --show-aux-props command for the 8-node environment for this example case is as follows:
╭─────────────────────────────────────────────────────────────────╮
┊ Node ┊ NodeType ┊ Addresses ┊ AuxProps ┊ State ┊
╞═════════════════════════════════════════════════════════════════╡
┊ linstor-ctrl-0 ┊ CONTROLLER ┊ [...] ┊ ┊ Online ┊
┊ linstor-sat-0 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc1 ┊ Online ┊
┊ linstor-sat-1 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc1 ┊ Online ┊
┊ linstor-sat-2 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc1 ┊ Online ┊
┊ linstor-sat-3 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc2 ┊ Online ┊
┊ linstor-sat-4 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc2 ┊ Online ┊
┊ linstor-sat-5 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc2 ┊ Online ┊
┊ linstor-sat-6 ┊ SATELLITE ┊ [...] ┊ Aux/site=dc2 ┊ Online ┊
╰─────────────────────────────────────────────────────────────────╯
Each LINSTOR satellite node had a 4 GiB storage pool, named lvm-thin, backed by a thin-provisioned LVM logical volume. I had some adventures related to the storage pool size that I will discuss later. At this point, I will just say that for purposes of testing LINSTOR automatic resource placement with empty resources, it is critical that the storage pool size is exactly the same on each node. However, when using LINSTOR in real-world cases, where data is written to and replicated between resources, this is not an issue. I will demonstrate this in a later section of this article.
Next, I created a simple LINSTOR resource group, named rg0, that would place LINSTOR resources created from the resource group in the lvm-thin storage pool. I did this by entering the following command:
linstor resource-group create rg0 --storage-pool lvm-thin
Examining some details about this resource group by using a linstor resource-group list -r rg0 command showed the following:
╭─────────────────────────────────────────────────────────────────╮
┊ ResourceGroup ┊ SelectFilter ┊ VlmNrs ┊ Description ┊
╞═════════════════════════════════════════════════════════════════╡
┊ rg0 ┊ PlaceCount: 2 ┊ ┊ ┊
┊ ┊ StoragePool(s): lvm-thin ┊ ┊ ┊
╰─────────────────────────────────────────────────────────────────╯
While this output shows that a placement count of 2 is already defined by default, LINSTOR allows you to override this number for resources that you create from the resource group, as I will show later.
To determine how LINSTOR would place resources for each of the 49 logically consistent sets, I created a Bash script. This Bash script takes as input a file named consistent-sets-sorted, created from the output of:
./eval-relationship-set-consistency-for-3-rels-4-vars.py |\
grep '^+' |\
sort > consistent-sets-sorted
This command uses the Python program mentioned earlier to list all the possible relationship sets, filters the output to show only logically consistent sets, then sorts and saves the output to a file. A sample of the file contents is as follows:
+ A=B, A=C, A=D, B=C, B=D, C=D Min Values: A=1, B=1, C=1, D=1
+ A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=1, B=1, C=1, D=2
+ A=B, A<C, A<D, B<C, B<D, C=D Min Values: A=1, B=1, C=2, D=2
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=1, C=2, D=3
[...]
+ A>B, A>C, A>D, B>C, B<D, C<D Min Values: A=4, B=2, C=1, D=3
+ A>B, A>C, A>D, B<C, B>D, C>D Min Values: A=4, B=2, C=3, D=1
+ A>B, A>C, A>D, B>C, B>D, C<D Min Values: A=4, B=3, C=1, D=2
+ A>B, A>C, A>D, B>C, B>D, C>D Min Values: A=4, B=3, C=2, D=1
An astute reader might notice that a placement count value of 1, for the variable B, likely does not make much sense for high availability (HA) or disaster recovery (DR) use cases. For example, the minimum values for the first consistent set in this filtered output are A=1, B=1, C=1, and D=1. It turned out that these values work when plugged into LINSTOR commands. However, to make the LINSTOR resource creation commands more practical, I edited some of the “minimum values” that the Python program generated, by using the following logic:
- If a set has a variable value equal to 4, leave the set as it is.
- Otherwise, change the highest variable value in the set to 4, with the limitation that C+D<=7 (because the cluster only has seven satellite nodes), and change other variable values accordingly.
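The second rule leaves some room for interpretation. One reading that reproduces the four edited lines in the sample of my edited file is to add the same offset to every value, which preserves all six relationships, and to choose the largest offset that keeps every value at or below 4 and C+D at or below 7. This sketch assumes that reading and the "Min Values:" line format of the consistent sets file; the function name bump_min_values is hypothetical.

```python
import re

MAX_VALUE = 4    # target for the highest value in a set
SATELLITES = 7   # C + D cannot exceed the number of satellite nodes

def bump_min_values(line: str) -> str:
    """Shift A, B, C, and D by one common offset (one reading of the rule)."""
    head, _, tail = line.partition("Min Values: ")
    values = {k: int(v) for k, v in re.findall(r"([A-D])=(\d+)", tail)}
    if max(values.values()) >= MAX_VALUE:
        return line  # rule 1: a set that already has a 4 stays as it is
    offset = min(MAX_VALUE - max(values.values()),
                 (SATELLITES - values["C"] - values["D"]) // 2)
    offset = max(offset, 0)
    new_tail = ", ".join(f"{k}={values[k] + offset}" for k in "ABCD")
    return f"{head}Min Values: {new_tail}"

print(bump_min_values("+ A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=1, B=1, C=1, D=2"))
# + A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=3, B=3, C=3, D=4
```

Shifting every value by the same amount is a deliberate choice here: it guarantees that the edited values still satisfy the relationships in the set.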
Here is a sample of my edited consistent sets file, for the first four sets:
+ A=B, A=C, A=D, B=C, B=D, C=D Min Values: A=3, B=3, C=3, D=3
+ A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=3, B=3, C=3, D=4
+ A=B, A<C, A<D, B<C, B<D, C=D Min Values: A=2, B=2, C=3, D=3
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=2, C=3, D=4
[...]
Running the experiment
After running some iterations of my Bash script to place LINSTOR resources in the cluster, given the constraints of each logically consistent set, it turned out that LINSTOR was able to successfully place at least one diskful resource for each set. I collected some interesting observations from the results and will discuss them in the following subsections.
Nodes with an unset auxiliary property
For cases where C+D<7, there will be one or more nodes that do not have the auxiliary property, site, set on them. These nodes behave as if they had a site value different from dc1 or dc2 set on them. During one experiment for the consistent set A<B, A<C, A<D, B<C, B>D, C>D Min Values: A=1, B=3, C=4, D=2, a linstor node list --show-aux-props --pastable command showed this:
+-----------------------------------------------------------------+
| Node | NodeType | Addresses | AuxProps | State |
|=================================================================|
| linstor-sat-0 | SATELLITE | [...] | Aux/site=dc1 | Online |
| linstor-sat-1 | SATELLITE | [...] | Aux/site=dc1 | Online |
| linstor-sat-2 | SATELLITE | [...] | Aux/site=dc1 | Online |
| linstor-sat-3 | SATELLITE | [...] | Aux/site=dc1 | Online |
| linstor-sat-4 | SATELLITE | [...] | Aux/site=dc2 | Online |
| linstor-sat-5 | SATELLITE | [...] | Aux/site=dc2 | Online |
| linstor-sat-6 | SATELLITE | [...] | | Online |
+-----------------------------------------------------------------+
The resource placement was as follows:
+-------------------------------------------------------------------------------------+
| ResourceName | Node | Layers | Usage | Conns | State | CreatedOn |
|=====================================================================================|
| myres | linstor-sat-0 | DRBD,STORAGE | Unused | Ok | UpToDate | [...] |
| myres | linstor-sat-4 | DRBD,STORAGE | Unused | Ok | UpToDate | [...] |
| myres | linstor-sat-6 | DRBD,STORAGE | Unused | Ok | UpToDate | [...] |
+-------------------------------------------------------------------------------------+
What would have been an impossible request for LINSTOR to fulfill if all the cluster nodes had the site auxiliary property set to either dc1 or dc2 was successful, because the linstor-sat-6 node, by not having the auxiliary property set, counted as belonging to a third, different site. Here, LINSTOR could satisfy both the placement count of 3 and the placement of 1 resource in each node group having a different site auxiliary property value.
Without having run this experiment, you might not expect this behavior, and it is now documented in the LINSTOR User Guide. Because the behavior is unexpected, it might also be unwanted, something that the LINSTOR development team now knows about and could change in a future LINSTOR version.
The number of sensible cases is more manageable than originally assumed
Knowing about the LINSTOR placement behavior when there are nodes without the auxiliary property set, it is reasonable to expect that a system administrator would want to set the auxiliary property on all nodes within a given LINSTOR cluster when using the --x-replicas-on-different feature to constrain resource placement. In the 7-satellite-node cluster where I ran the placement experiments for the 49 logically consistent sets, it turns out that only 10 consistent sets are sensible. These sets are the ones for which the auxiliary property site is set on all seven satellite nodes, that is, the ones where C+D=7. After filtering my edited consistent sets file (the one where I increased some values, as discussed earlier) by using grep -E '^\+.*(C=4, D=3|C=3, D=4)', these sets are:
+ A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=3, B=3, C=3, D=4
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=2, C=3, D=4
+ A=B, A<C, A<D, B<C, B<D, C>D Min Values: A=2, B=2, C=4, D=3
+ A<B, A<C, A<D, B=C, B<D, C<D Min Values: A=2, B=3, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=2, C=4, D=3
+ A<B, A<C, A<D, B=C, B>D, C>D Min Values: A=2, B=4, C=4, D=3
+ A>B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=1, C=3, D=4
+ A>B, A<C, A<D, B<C, B<D, C>D Min Values: A=2, B=1, C=4, D=3
+ A=B, A=C, A>D, B=C, B>D, C>D Min Values: A=4, B=4, C=4, D=3
Are there any mirrored pairs among these sets? Yes, there are three mirrored set pairs:
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=2, C=3, D=4
+ A=B, A<C, A<D, B<C, B<D, C>D Min Values: A=2, B=2, C=4, D=3
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C>D Min Values: A=1, B=2, C=4, D=3
+ A>B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=1, C=3, D=4
+ A>B, A<C, A<D, B<C, B<D, C>D Min Values: A=2, B=1, C=4, D=3
Removing one set from each mirrored pair reduces the number of sensible cases to seven:
+ A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=3, B=3, C=3, D=4
+ A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=2, C=3, D=4
+ A<B, A<C, A<D, B=C, B<D, C<D Min Values: A=2, B=3, C=3, D=4
+ A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+ A<B, A<C, A<D, B=C, B>D, C>D Min Values: A=2, B=4, C=4, D=3
+ A>B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=1, C=3, D=4
+ A=B, A=C, A>D, B=C, B>D, C>D Min Values: A=4, B=4, C=4, D=3
The upcoming sections show and discuss the resource placement results for each of these seven cases.
📝 NOTE: I took some liberties and combined the output from a linstor node list --show-aux-props --pastable command and a linstor resource list --resources myres --pastable command when showing the resource placement results table for each case.
Resource placement results that hint at a feature usage guideline
A=B, A=C, A<D, B=C, B<D, C<D Min Values: A=3, B=3, C=3, D=4
+----------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-2 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
+----------------------------------------------------------------------------------------+
Here, LINSTOR fulfills the --place-count 3 constraint by placing three resources on dc1 nodes. If the --x-replicas-on-different placement value had been less than 3, LINSTOR would have placed at least one resource on a dc2 node. Because all the resources are placed at just one site, using --x-replicas-on-different was likely of no use here. A rule for getting meaningful results from this feature might be: Ensure that the number you specify with the --x-replicas-on-different option is less than the resource --place-count value, but if it is not, ensure that it is greater than the number of nodes in the “different” group that has the fewest nodes. The variable value, A=3, in this set violates this rule because A is not less than B (3), and A is not greater than C (3).
Resource placement results showing an unnecessary use case
A=B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=2, C=3, D=4
+------------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|==========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-2 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | TieBreaker |
+------------------------------------------------------------------------------------------+
For this set, LINSTOR places two diskful resources and, because of the LINSTOR automatic DRBD® quorum feature, places a third, diskless resource on a dc1 node. The value 2 for the variable A also violates the rule just described for using the --x-replicas-on-different feature. Something sensible happens with this placement, that is, local HA and a local tie-breaking diskless resource replica. However, using the --x-replicas-on-different feature was unnecessary. These resource placement results could have been achieved in a more direct and understandable way by using a resource group constrained to the dc1 nodes, for example, by using a --replicas-on-same site=dc1 LINSTOR command option and specifying a placement count of 2.
Resource placement results for the most common use case
A<B, A<C, A<D, B=C, B<D, C<D Min Values: A=2, B=3, C=3, D=4
+----------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-3 | Aux/site=dc2 | DRBD,STORAGE | Unused | Ok | UpToDate |
+----------------------------------------------------------------------------------------+
Here, again, is the most common use case for the --x-replicas-on-different feature. For this set, LINSTOR places three diskful resources: two on a “local” site for HA, and one resource “off-site” for DR.
Resource placement results for off-site DR but not local HA
A<B, A<C, A<D, B<C, B<D, C<D Min Values: A=1, B=2, C=3, D=4
+------------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|==========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | TieBreaker |
| myres | linstor-sat-3 | Aux/site=dc2 | DRBD,STORAGE | Unused | Ok | UpToDate |
+------------------------------------------------------------------------------------------+
The placement shown in this set is interesting because LINSTOR will happily place resources that satisfy both the --x-replicas-on-different and the --place-count constraints. Similar to the earlier case where the placement count was also 2, LINSTOR places a third, diskless tiebreaker resource for DRBD quorum purposes. However, while this resource placement might satisfy DR requirements, it does not satisfy local HA requirements. Still, this type of resource placement might be useful in cases where local HA is not a requirement but off-site DR is. Another possible use case is when the “different” auxiliary property key and values represent different racks in a single data center, rather than different physical sites. In that case, LINSTOR would place the resource replicas in different local failure domains, that is, in different racks, which might be fine, depending on your requirements for the data.
When I asked about this scenario, my colleague, Matt Kereczman, mentioned that someone can accomplish this same placement (--x-replicas-on-different site 1 --place-count 2) by using the --replicas-on-different option instead, with fewer command arguments. A command option snippet for this might be:
--replicas-on-different site --place-count 2
Resource placement results for local and off-site HA and DR
A<B, A<C, A<D, B=C, B>D, C>D Min Values: A=2, B=4, C=4, D=3
+------------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|==========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-2 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | TieBreaker |
| myres | linstor-sat-4 | Aux/site=dc2 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-5 | Aux/site=dc2 | DRBD,STORAGE | Unused | Ok | UpToDate |
+------------------------------------------------------------------------------------------+
The resource placement results for this set are robust. LINSTOR places two diskful resources on nodes at each site and automatically places a tiebreaker resource on a fifth node for DRBD quorum purposes. These placements satisfy an off-site DR requirement and a local HA requirement for both sites. If disaster strikes, the off-site location already has HA in place. This protects against node-level service disruption at the DR site, which might be especially useful if the disaster affecting the original local site is prolonged.
Resource placement results for an unproductive use case
A>B, A<C, A<D, B<C, B<D, C<D Min Values: A=2, B=1, C=3, D=4
+------------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|==========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
+------------------------------------------------------------------------------------------+
While this set results in a valid resource placement, placing a single resource satisfies neither local HA nor off-site DR requirements. For this reason, this is not a productive use of the --x-replicas-on-different LINSTOR feature.
Resource placement results highlighting an interesting feature implementation
A=B, A=C, A>D, B=C, B>D, C>D Min Values: A=4, B=4, C=4, D=3
+------------------------------------------------------------------------------------------+
| ResourceName | Node | AuxProps | Layers | Usage | Conns | State |
|==========================================================================================|
| myres | linstor-sat-0 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-1 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-2 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-3 | Aux/site=dc1 | DRBD,STORAGE | Unused | Ok | UpToDate |
| myres | linstor-sat-4 | Aux/site=dc2 | DRBD,STORAGE | Unused | Ok | TieBreaker |
+------------------------------------------------------------------------------------------+
This last set is interesting because it highlights an aspect of using the --x-replicas-on-different feature that might not be immediately apparent: LINSTOR treats the positive integer value that you provide with the command argument as an “up to this number” amount. LINSTOR will do its best to place that number of resources on nodes with different values for the given auxiliary property. However, it will stop placing diskful resources after satisfying the --place-count value. Sometimes, as the placement results for this set show, this means that LINSTOR will place an uneven number of diskful resources on nodes in each site.
The placement results for this set also highlight another behavior related to the --x-replicas-on-different feature: LINSTOR will place diskful resource replicas in one site until it either fulfills the positive integer value that you specify with the option or runs out of nodes. When it runs out of nodes, it will place resource replicas on the next group of nodes that have a different value for the auxiliary property. In this example, LINSTOR fulfilled the --place-count of 4 by placing four resources on dc1 nodes, while also satisfying the --x-replicas-on-different site 4 constraint. That is, LINSTOR placed four replicas on nodes belonging to one site, dc1, and “up to four” replicas on nodes belonging to a different site, dc2. In this case, the “up to” qualifier is doing a lot of work in the statement, because no diskful resources were placed on dc2 nodes. A diskless tiebreaker resource was placed on the linstor-sat-4 (dc2) node for DRBD quorum purposes.
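The fill order described above can be modeled in a few lines of shell, as a way to check the “up to” behavior against the tables in this article. This is a simplified model of the described behavior, not the LINSTOR autoplacer algorithm. It assumes that sites are filled in the order given (dc1 first, as in these results), puts at most A diskful replicas on each site and never more than that site's node count, stops once B diskful replicas are placed, and ignores diskless tiebreakers. The function name model_placement is hypothetical.

```bash
#!/bin/sh
# Simplified model (NOT the LINSTOR autoplacer): fill sites in order, placing
# at most A diskful replicas per site, never more than the site's node count,
# and stopping once B diskful replicas are placed.
# Usage: model_placement A B NODES_SITE1 [NODES_SITE2 ...]
# Prints the number of diskful replicas per site, space-separated.
model_placement() {
  local a="$1" b="$2" remaining n nodes out=""
  shift 2
  remaining="$b"
  for nodes in "$@"; do
    n="$a"
    if [ "$nodes" -lt "$n" ]; then n="$nodes"; fi          # site runs out of nodes
    if [ "$remaining" -lt "$n" ]; then n="$remaining"; fi  # place count reached
    remaining=$((remaining - n))
    out="$out $n"
  done
  echo "${out# }"
}

model_placement 4 4 4 3   # last case above: prints "4 0"
model_placement 2 3 3 4   # most common use case: prints "2 1"
```

Running the model for the other five cases (3 3 3 4, 2 2 3 4, 1 2 3 4, 2 4 4 3, and 2 1 3 4) gives 3 0, 2 0, 1 1, 2 2, and 1 0, which match the diskful replica counts in the tables above. If the per-site limits add up to less than B, the model simply places fewer than B replicas; it does not try to predict what LINSTOR does in that situation.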
Running the experiment for multiple resources
These placement experiments were interesting, but there was at least one other angle to explore. The experiments all involved creating one LINSTOR resource and observing how LINSTOR automatically places the resource across nodes in the cluster. However, people often use LINSTOR to deploy many storage resources in their clusters. With some suggestions from LINBIT® colleagues on how to set up a new experiment, I created two Bash scripts for more experimenting. Each script uses a for loop to create 10 resources across the 7-satellite node cluster. The difference between the scripts is that one script simply creates the 10 resources, while the other script creates a resource and writes data to it before moving on to create the next resource.
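The scripts themselves are not reproduced here, but a loop of that shape might look like the following minimal sketch. It covers only the variant that creates resources without writing data, and it is a dry run by default. It assumes a resource group, hypothetically named xrg, that is already configured with the --place-count and --x-replicas-on-different options for the case under test. The resource names, the 1G size, and the function name are also assumptions.

```bash
#!/bin/sh
# Sketch of a "create N resources" experiment loop (dry run by default).
# Assumes a resource group "xrg" (hypothetical name) already set up with the
# case's --place-count and --x-replicas-on-different options.

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# $1 = resource group, $2 = number of resources to create (default 10)
create_resources() {
  local rg="$1" count="${2:-10}" i=1
  while [ "$i" -le "$count" ]; do
    # Each iteration spawns one new resource from the same resource group
    # and leaves node selection to the LINSTOR autoplacer.
    run linstor resource-group spawn-resources "$rg" "myres$i" 1G
    i=$((i + 1))
  done
}

create_resources xrg
```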
One concern is that, in each iteration, LINSTOR might place the new resource on the same nodes as before. For example, for the “most common use case” shown earlier, A<B, A<C, A<D, B=C, B<D, C<D Min Values: A=2, B=3, C=3, D=4, would LINSTOR always place resources on satellite nodes 0, 1, and 3? If so, the nodes at site dc1 would run out of storage space before nodes at site dc2.
It turned out that, besides automatically placing resources that conform to HA- and DR-related constraints, LINSTOR also places resources in a balanced way across nodes.
After I ran the two scripts, here is the output showing the distribution of resource placements on each node for the A<B, A<C, A<D, B=C, B<D, C<D Min Values: A=2, B=3, C=3, D=4 case.
Resources placed without writing data:
linstor-sat-0 placements: 5
linstor-sat-1 placements: 5
linstor-sat-2 placements: 4
linstor-sat-3 placements: 4
linstor-sat-4 placements: 4
linstor-sat-5 placements: 4
linstor-sat-6 placements: 4
Resources placed with writing data:
linstor-sat-0 placements: 4
linstor-sat-1 placements: 5
linstor-sat-2 placements: 5
linstor-sat-3 placements: 4
linstor-sat-4 placements: 4
linstor-sat-5 placements: 4
linstor-sat-6 placements: 4
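Counts like these can be produced by tallying the Node column of the resource list output. In each run above, the per-node counts add up to 30, that is, 10 resources with three replicas each. Here is a minimal sketch of such a tally, assuming pastable table output in which the node name is the second column, as in the tables above. The function name is hypothetical, and diskless tiebreaker rows are counted like any other placement.

```bash
#!/bin/sh
# Tally placements per node from pastable "linstor resource list" output on
# stdin, for example: linstor resource list --pastable | count_placements
# Assumes the node name is the second column of each "|"-delimited row.
count_placements() {
  awk -F'|' '
    /^[|]/ {
      node = $3
      gsub(/[ \t]+/, "", node)
      # Skip the header row and the "|====|" separator row.
      if (node != "" && node != "Node") count[node]++
    }
    END { for (n in count) printf "%s placements: %d\n", n, count[n] }
  ' | sort
}
```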
A lesson learned about LINSTOR storage pools
As mentioned earlier, to get consistent results from LINSTOR automatic resource placement features for these kinds of tests (placing empty resources), having the same storage pool or pools, with the same capacity, on every node is critical. I learned this the hard way when exploring the --x-replicas-on-different LINSTOR feature. I originally started with a LINSTOR cluster of four satellite-node VMs. When I realized that I would need more nodes, I added them, and on each new node I added a LINSTOR storage pool backed by an LVM thin-provisioned logical volume that I created by using the --size 4G option with an lvcreate command.
When I ran the multiple resources placement experiment on this cluster, I got the following results for placements without writing data:
linstor-sat-0 placements: 4
linstor-sat-1 placements: 3
linstor-sat-2 placements: 3
linstor-sat-3 placements: 0
linstor-sat-4 placements: 7
linstor-sat-5 placements: 7
linstor-sat-6 placements: 6
These automatic resource placements were unbalanced, favoring dc2 nodes, and sadly, the linstor-sat-3 node received no resource placements at all. I had a vague clue about the reason, based on this truncated output of a linstor storage-pool list -s lvm-thin command:
╭────────────────────────────────────────────────────────────────────────────────────────────
┊ StoragePool ┊ Node ┊ Driver ┊ PoolName ┊ FreeCapacity ┊ TotalCapacity ┊
╞════════════════════════════════════════════════════════════════════════════════════════════
┊ lvm-thin ┊ linstor-sat-0 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4.00 GiB ┊
┊ lvm-thin ┊ linstor-sat-1 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4.00 GiB ┊
┊ lvm-thin ┊ linstor-sat-2 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4.00 GiB ┊
┊ lvm-thin ┊ linstor-sat-3 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4.00 GiB ┊
┊ lvm-thin ┊ linstor-sat-4 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4 GiB ┊
┊ lvm-thin ┊ linstor-sat-5 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4 GiB ┊
┊ lvm-thin ┊ linstor-sat-6 ┊ LVM_THIN ┊ drbdpool/thinpool ┊ 4.00 GiB ┊ 4 GiB ┊
╰────────────────────────────────────────────────────────────────────────────────────────────
I asked LINSTOR lead developer Gabor Hernadi about these unbalanced resource placements, and it did not take long for him to determine the issue. Hernadi asked me to run a linstor --machine-readable storage-pool list -s lvm-thin command. The output, from before any resources were placed, confirmed Hernadi’s suspicion that the storage pool sizes were slightly different on some nodes. Here is an excerpt of that output:
[...]
{
"storage_pool_name": "lvm-thin",
"node_name": "linstor-sat-2",
[...]
"free_capacity": 4190208,
"total_capacity": 4190208,
[...]
{
"storage_pool_name": "lvm-thin",
"node_name": "linstor-sat-3",
[...]
"free_capacity": 4190208,
"total_capacity": 4190208,
[...]
{
"storage_pool_name": "lvm-thin",
"node_name": "linstor-sat-4",
[...]
"free_capacity": 4194304,
"total_capacity": 4194304,
[...]
Satellite nodes linstor-sat-0 through linstor-sat-3 showed a free capacity of 4190208 kibibytes, while satellite nodes linstor-sat-4 through linstor-sat-6 showed a slightly greater free capacity of 4194304 kibibytes. Referring back to the output of the linstor storage-pool list command shown earlier, 4 GiB means exactly 4 GiB, while 4.00 GiB means approximately 4 GiB, rounded to two decimal places.
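The difference is easy to miss because both values round to 4.00 GiB at two decimal places. One way to see it is to convert the machine-readable KiB values yourself, as in this small sketch. The function name is hypothetical, and awk is used only for the floating-point division.

```bash
#!/bin/sh
# Convert a capacity in KiB (as in the machine-readable output) to GiB,
# printed with the given number of decimal places (default 6).
kib_to_gib() {
  awk -v kib="$1" -v places="${2:-6}" \
    'BEGIN { fmt = "%." places "f\n"; printf fmt, kib / 1048576 }'
}

kib_to_gib 4190208      # prints 3.996094
kib_to_gib 4194304      # prints 4.000000
kib_to_gib 4190208 2    # prints 4.00
echo "difference: $(( (4194304 - 4190208) / 1024 )) MiB"   # prints "difference: 4 MiB"
```

The two pools differ by only 4 MiB, well under the 0.01 GiB resolution of the rounded display.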
Looking back, I determined that the Ansible playbook play that created the LVM logical volume on the first four LINSTOR satellite nodes in my cluster specified its size as 50%VG, that is, half of the volume group. This resulted in the storage pools on the first four satellite nodes having a slightly smaller total capacity than the pools on the last three nodes, where I had used a --size 4G option when creating the LVM logical volume that backed the LINSTOR storage pool.
LINSTOR automatically places resources in a balanced way
A simple lesson here is that LINSTOR will favor placing resources on nodes with more available storage capacity. This is a benefit when creating multiple resources, because LINSTOR will automatically place new resources in a balanced way.
Another lesson is that, without any data, LINSTOR resources consume very little storage capacity. Even after I placed 10 resources without data in the mismatched storage pool cluster mentioned earlier, the storage they consumed (presumably DRBD metadata for the resources) was not enough to change which nodes LINSTOR chose for resource placement.
Examining LINSTOR controller service log messages during the resource placements showed only a slight increase in the LINSTOR auto-placement selector score for satellite node linstor-sat-3 between the first resource placement and the tenth:
Autoplacer.Selector: Score: 0.999023, Storage pool 'lvm-thin' on node 'linstor-sat-3'
[...]
Autoplacer.Selector: Score: 0.999123, Storage pool 'lvm-thin' on node 'linstor-sat-3'
This increase in score was not enough for linstor-sat-3 to catch up with the other nodes, which had higher scores and so received the resource placements. Hernadi suggested that, given enough resource placements, the DRBD metadata accumulating on the other nodes would eventually reduce their free capacity enough for the linstor-sat-3 node to reach a high enough auto-placement score to receive a resource placement.
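The first score logged for linstor-sat-3 happens to be consistent with a simple free-space ratio: 4190208 KiB (its free capacity) divided by 4194304 KiB (the free capacity of the larger pools) is 0.9990234375, which rounds to 0.999023. The sketch below shows that arithmetic. Treat the formula as an assumption inferred from these two numbers, not as a description of how the LINSTOR autoplacer computes its scores. Under this model, the score for linstor-sat-3 rises as the larger pools lose free capacity, which fits the slight increase seen in the log. The function name is hypothetical.

```bash
#!/bin/sh
# Simplified model (an assumption, not a documented LINSTOR formula):
# score = a node's free capacity / the largest free capacity among candidates.
normalized_score() {
  awk -v free="$1" -v max="$2" 'BEGIN { printf "%.6f\n", free / max }'
}

# linstor-sat-3 (4190208 KiB free) against the 4194304 KiB pools.
normalized_score 4190208 4194304   # prints 0.999023, matching the first log line
# If the other pools shrank to the same free capacity, the ratio would reach 1.
normalized_score 4190208 4190208   # prints 1.000000
```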
Conclusion
It took a while for me to think through the --x-replicas-on-different LINSTOR feature, run resource placement experiments, process the results, get some invaluable perspectives from LINBIT colleagues Matt Kereczman and LINSTOR lead developer Gabor Hernadi, and then write about it all in a (hopefully) meaningful and interesting way. Still, I suspect that there might be considerations that I have not thought about yet. If you have taken the time to read this article, thank you. I would welcome any feedback you might have, or suggestions for more exploration of the feature. You can find me in the LINBIT Community Forum or through the LINBIT® team’s contact us page. Also, for brevity, I have not included all the programs, scripts, or outputs from my experiments. If you are interested, reach out and I can provide these.
- The number value tells LINSTOR that you want up to that number of resource replicas placed on different LINSTOR satellite node groups, where groups are defined by different values for the auxiliary property that you specify. In Kereczman’s article, the auxiliary property is site and the different values given to this property are either portland or vienna, depending on the node (and presumably the imagined physical location of the node).