Introduction and considerations
In a single sentence, striping can be described as a technique used in storage systems to spread data across multiple physical disks by dividing it into small chunks that are written to different drives in a round-robin fashion.
Striping volumes across several physical devices in a LINSTOR® storage pool improves performance by distributing I/O operations across multiple disks. This reduces bottlenecks and increases throughput when compared to linear allocation, which is the default allocation strategy for LVM and therefore for LVM-backed LINSTOR storage pools.
Another benefit of striping data across multiple devices is avoiding “hot spots” on individual drives within a storage pool. In linearly allocated storage pools, one physical device is fully allocated before blocks from another device are used, so devices wear unevenly. This can eventually cause some drives to fail much sooner than others.
Sounds like striping data is all upsides, right? Not completely. When you stripe a single volume across multiple physical disks, you are also expanding the fault domain of services using that volume to include multiple disks. If any one of those disks fails, the volume becomes inaccessible, and so do your services. Or rather, they would, if you weren’t also using DRBD® to replicate the volume to a peer. When DRBD detects an I/O error on the storage system below it, it transparently begins passing I/O operations over the network to a peer with a healthy backing device. This means you won’t have any immediate downtime, but you will still need to replace the faulty drive and perform a full resync of the DRBD volume to recover.
Growing striped volumes once you’ve completely filled the physical devices you started with is also a consideration. Striping requires you to add physical devices in multiples of the stripe count, as sketched below. By contrast, extending a linearly allocated logical volume by adding a single physical device is far simpler and requires fewer free drive bays in your server chassis.
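To make the expansion requirement concrete, here is a minimal sketch using plain LVM commands with hypothetical volume group and device names (not part of the example cluster used later in this post). Extending a three-stripe logical volume means adding three new physical volumes and telling lvextend to continue striping across them:
# Hypothetical example: grow a 3-stripe LV by adding 3 new devices to the VG.
vgextend vg0 /dev/sde /dev/sdf /dev/sdg
# Extend the LV by 100GiB, keeping 3 stripes of 64KiB on the newly added devices.
lvextend --stripes 3 --stripesize 64K --size +100G vg0/data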
You will also need to consider your stripe sizes. Which applications will be reading and writing to your striped volumes, and the size of those applications’ I/O operations, should influence the chosen stripe size. An application that frequently performs large sequential I/O operations, such as media streaming or working with virtual machine disk images, would benefit from a larger stripe size (256K or 512K), while smaller random I/O operations, such as those a database or a messaging queue application might perform, would benefit from smaller stripe sizes (32K or 64K).
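As a point of reference, these stripe settings map directly onto the lvcreate options that LINSTOR passes through later in this post. The volume group and logical volume names below are purely illustrative:
# Hypothetical large-sequential-I/O volume: 3 stripes, 256KiB stripe size.
lvcreate --stripes 3 --stripesize 256K --size 100G --name vm_images vg0
# Hypothetical small-random-I/O volume: 3 stripes, 64KiB stripe size.
lvcreate --stripes 3 --stripesize 64K --size 100G --name db_data vg0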
Importantly, DRBD updates its metadata on disk in 4K chunks. This means that when you’re striping, you will probably benefit from using a separate external physical device and LINSTOR storage pool to store DRBD’s metadata. That is, unless your application’s I/O operation sizes align with DRBD’s.
How to configure volume striping using LINSTOR
Now that you’ve been bombarded with things to think about, you might also want to know how it’s implemented. Assuming you have a LINSTOR cluster up and running, and each LINSTOR satellite node has four additional disks that LINSTOR will use to create striped and replicated storage volumes from, you can use the following example, adjusted for specifics in your cluster, to configure striped storage.
đź“ť NOTE: This blog only discusses configurations compatible with the LVM storage provider for LINSTOR. You can configure the ZFS storage provider similarly, but specifics such as LINSTOR property names and the names of options passed to ZFS will differ.
Adding physical devices to LINSTOR storage pools
Before LINSTOR can manage them, you need to add the physical devices to LINSTOR storage pools. One storage pool, and physical device, will be dedicated to DRBD’s metadata, while the others will be used to create the storage volumes that will store user data.
Create a storage pool named storage using three physical devices:
linstor physical-storage create-device-pool \
--storage-pool storage --pool-name storage lvm linbit-0 /dev/vdb /dev/vdc /dev/vdd
linstor physical-storage create-device-pool \
--storage-pool storage --pool-name storage lvm linbit-1 /dev/vdb /dev/vdc /dev/vdd
linstor physical-storage create-device-pool \
--storage-pool storage --pool-name storage lvm linbit-2 /dev/vdb /dev/vdc /dev/vdd
Create a storage pool named meta using the remaining physical device:
linstor physical-storage create-device-pool \
--storage-pool meta --pool-name meta lvm linbit-0 /dev/vde
linstor physical-storage create-device-pool \
--storage-pool meta --pool-name meta lvm linbit-1 /dev/vde
linstor physical-storage create-device-pool \
--storage-pool meta --pool-name meta lvm linbit-2 /dev/vde
đź’ˇ TIP: DRBD’s metadata size requirement is roughly 32MiB per 1TiB of usable volume size, per peer. For that reason, for your external DRBD metadata storage devices, you should use smaller capacity physical devices that have performance characteristics (latency and throughput) equal to or better than the devices that you use for data storage.
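As a rough worked example based on that estimate, a 1TiB DRBD volume that keeps metadata for two peers would need approximately 2 × 32MiB = 64MiB of external metadata, so even a small, fast device can comfortably hold the metadata for many volumes.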
Configuring striping on the LINSTOR storage pool
You can now configure the storage pools with striping options. For striping, the relevant lvcreate options that LINSTOR will need are --stripesize or -I, and --stripes or -i.
To configure LINSTOR to create the logical volumes (LVs) used as DRBD’s backing storage with three stripes that are 64KiB in size, use the following commands:
linstor storage-pool set-property \
linbit-0 storage StorDriver/LvcreateOptions "-i3 -I64"
linstor storage-pool set-property \
linbit-1 storage StorDriver/LvcreateOptions "-i3 -I64"
linstor storage-pool set-property \
linbit-2 storage StorDriver/LvcreateOptions "-i3 -I64"
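To confirm that the property was applied, the linstor client’s list-properties subcommand should show it on each storage pool, for example:
linstor storage-pool list-properties linbit-0 storage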
âť— IMPORTANT: If you intend to use a separate storage pool for DRBD’s metadata, you must apply the StorDriver/LvcreateOptions property on the storage pool intended to store the data volumes. Otherwise, it might make more sense to apply the StorDriver/LvcreateOptions property on the LINSTOR resource group, allowing for multiple LINSTOR resource groups with different stripe settings. However, if you apply StorDriver/LvcreateOptions to a resource group and also configure the resource group to use a separate LINSTOR storage pool for DRBD metadata, the StorDriver/LvcreateOptions property will be applied to both the data volumes and the metadata volumes when LINSTOR resources are created. This is likely suboptimal.
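For the alternative placement that the note describes, where you are not using a separate metadata storage pool, the same property could instead be set on a resource group, for example on the rg0 resource group created in the next section:
linstor resource-group set-property rg0 StorDriver/LvcreateOptions "-i3 -I64"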
Configuring the metadata pool and the resource group
Finally, to tie everything together, you will need to create a resource group that tells LINSTOR which storage pool to use for data, which storage pool to use for metadata, and the replica count for LINSTOR resources created from the resource group.
Create the resource group, specifying the storage pool to use for DRBD’s data volumes, and the number of replicas LINSTOR should create within the cluster when a resource is spawned from the resource group:
linstor resource-group create --storage-pool storage --place-count 2 rg0
Configure LINSTOR to use the meta storage pool for DRBD metadata when creating resources from the rg0 resource group:
linstor resource-group set-property rg0 StorPoolNameDrbdMeta meta
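To double-check the resource group’s configuration, listing its properties should show the StorPoolNameDrbdMeta property, and listing resource groups should show its storage pool and place count:
linstor resource-group list-properties rg0
linstor resource-group list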
Verifying striping of LINSTOR resources
Spawning resources from the rg0 resource group will result in LINSTOR creating striped LVM logical volumes within the storage LVM volume group that will be used as DRBD’s backing device. LINSTOR will create a single logical volume within the meta LVM volume group that will be used to persist DRBD’s metadata.
You can apply this relatively complex block device configuration to new LINSTOR resources by using a single command:
linstor resource-group spawn-resources rg0 res0 10G
There are many ways to verify this works as expected. For example, you can use the lsblk utility to list information about specified block devices. After creating a LINSTOR resource from the rg0 resource group as in the example above, the lsblk command will show the structure of the underlying block device configuration:
lsblk /dev/vd{b,c,d,e}
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vdb 252:16 0 4G 0 disk
└─storage-res0_00000 253:1 0 10G 0 lvm
└─drbd1000 147:1000 0 10G 0 disk
vdc 252:32 0 4G 0 disk
└─storage-res0_00000 253:1 0 10G 0 lvm
└─drbd1000 147:1000 0 10G 0 disk
vdd 252:48 0 4G 0 disk
└─storage-res0_00000 253:1 0 10G 0 lvm
└─drbd1000 147:1000 0 10G 0 disk
vde 252:64 0 4G 0 disk
└─meta-res0.meta_00000 253:2 0 4M 0 lvm
└─drbd1000 147:1000 0 10G 0 disk
Output should show that the DRBD device LINSTOR created, in this case /dev/drbd1000, is backed by logical volumes from two volume groups: a single striped logical volume in the storage volume group, which lsblk lists under each of the three physical devices it spans, and a single logical volume in the meta volume group.
đź“ť NOTE: Running an lsblk command from a Diskless node in a LINSTOR cluster will show the drbd1000 device as a “standalone” device.
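You can also confirm the stripe layout at the LVM level by querying the logical volume’s segment attributes with the lvs utility. Following the naming from the example above, a command along these lines should report the striped segment type, three stripes, and the 64KiB stripe size:
lvs -o lv_name,segtype,stripes,stripe_size storage/res0_00000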
Another way that you can verify striping is by monitoring write activity with a utility such as iostat. This will also show that all four physical devices are being written to while using the LINSTOR-created DRBD device. For example, while running a simulated write workload on the /dev/drbd1000 device created in the example above, iostat shows that the three physical devices backing the storage volume group are receiving writes at an almost exactly equal rate, while the device backing the meta volume group is being updated as needed by DRBD.
iostat -p vdb,vdc,vdd,vde 1 1
Linux 5.15.0-119-generic (linbit-0) 10/25/2024 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.32 0.00 0.45 0.02 0.01 99.21
Device tps kB_read/s kB_wrtn/s kB_dscd/s kB_read kB_wrtn kB_dscd
vdb 3.73 6.21 14.09 5.78 31239189 70931506 29081604
vdc 3.73 6.18 14.09 5.78 31125966 70924186 29081600
vdd 3.73 6.19 14.09 5.78 31136922 70919466 29081600
vde 1.92 0.05 4.17 0.01 226908 21000917 32768
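This post doesn’t prescribe a particular tool for generating the simulated write workload. Assuming fio is installed, and that the DRBD device holds no data you care about, a sketch of a comparable sequential write workload might look like this:
# Hypothetical workload: 60 seconds of direct, sequential 64KiB writes to the DRBD device.
fio --name=seqwrite --filename=/dev/drbd1000 --rw=write --bs=64k \
    --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based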
Conclusion
Striping volumes across multiple devices in a LINSTOR storage pool can provide significant performance benefits, particularly for workloads with heavy I/O demands. By distributing read and write operations across several physical disks, striping helps avoid storage hotspots and reduces latency. However, it’s essential to consider both the complexities and the potential pitfalls that striping can introduce, for example, increasing the fault domain across multiple disks. By using DRBD replication you can mitigate this risk and provide valuable resilience within your clusters. Furthermore, carefully selecting stripe sizes and planning for storage expansion will ensure optimal performance and scalability.
In conclusion, configuring LINSTOR for striped storage requires thoughtful planning but offers a powerful tool to maximize storage efficiency and throughput. Balancing this approach with DRBD replication and an understanding of your specific workloads can unlock significant advantages in a high-performance storage environment.