LINBIT SDS, powered by LINSTOR® and DRBD®, is the SDS solution from LINBIT® for managing Linux block storage in Kubernetes. If you’ve used LINSTOR, you know how many knobs can be turned when configuring it. If you’ve followed along with one of our quick start blogs or a README in one of LINBIT’s GitHub repositories, you’ve probably set up a LINSTOR cluster without much consideration for optimizing performance. Most of our blog posts and quick starts are geared towards introducing the reader to a project or feature, as opposed to throwing the reader into the deep end. This post, however, will focus on performance tuning and get you at least waist deep in the world of storage performance for Kubernetes with LINSTOR.
Standard Deployments and Their Expectations
Before jumping into what you can tune in LINSTOR, I should define what a “standard issue” LINSTOR deployment in Kubernetes could look like. One of the most straightforward ways to deploy LINSTOR into Kubernetes is by simply giving the LINSTOR Operator the name of an empty block device (`/dev/vdb` in this example) and letting LINSTOR set it up as a LINSTOR storage pool for you. This is done by defining `storagePools` in the `LinstorSatelliteConfiguration`:
```yaml
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: storage-satellites
spec:
  storagePools:
    - name: lvm-thick
      lvmPool:
        volumeGroup: drbdpool
      source:
        hostDevices:
          - /dev/vdb
```
If the above settings were in a file named `linstor-satellites.yaml`, then you’d configure LINSTOR’s storage pools in Kubernetes by entering:
```bash
$ kubectl apply -f linstor-satellites.yaml
```
Those settings would result in LINSTOR creating an LVM volume group named `drbdpool` on a block device named `/dev/vdb` attached to your worker nodes, which would then be added to LINSTOR as a storage pool named `lvm-thick`.
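You can verify that the storage pool was created by running the LINSTOR client through the controller pod. The sketch below assumes an Operator v2 deployment in the `piraeus-datastore` namespace with a controller Deployment named `linstor-controller`; adjust the names to match your cluster.

```bash
# Sketch: list LINSTOR storage pools through the controller pod.
# The namespace and deployment name are assumptions; adjust for your deployment.
kubectl -n piraeus-datastore exec deploy/linstor-controller -- \
  linstor storage-pool list
```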
You could then define a LINSTOR `StorageClass` in Kubernetes that references this storage pool with a definition such as:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thick-r2"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thick"
reclaimPolicy: Retain
allowVolumeExpansion: true
```
With these configurations applied, your Kubernetes users will be able to request persistent volumes (PVs) from the `linstor-csi-lvm-thick-r2` StorageClass. When they do, each PV provisioned by LINSTOR from this StorageClass will result in an LVM logical volume of the requested size within the `drbdpool` volume group. This LVM logical volume will be used as backing storage for a DRBD volume that is replicating to a single peer node in the cluster.
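To show what consuming this StorageClass looks like for a Kubernetes user, here is a minimal PersistentVolumeClaim sketch. The claim name and requested size are arbitrary examples.

```yaml
# Example PVC requesting a volume from the linstor-csi-lvm-thick-r2 StorageClass.
# The claim name and requested size are arbitrary examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: linstor-csi-lvm-thick-r2
```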
For many users, this standard deployment could be satisfactory, and there’s something to be said for keeping things simple. Having a PV replicated between two peers in the cluster ensures availability and resilience, and because LINSTOR provisions block storage replicated by DRBD, the overhead on system resources is limited and performance should be decent out-of-the-box.
That said, there’s always room for improvement. The following section will cover what I’ve found to offer the best performance when using LINSTOR to provide hyperconverged storage in Kubernetes.
Best Practices for Performance Tuned Deployments
From the lowest layer (hardware) to the top (file system options), your choices will affect performance. The following subsections cover some best practices for optimizing performance in your deployments.
Physical or Cloud Storage Selection
There’s not much you can tune here, but I feel like I have to mention the underlying storage.
When you’re purchasing storage for physical deployments or selecting your storage options for a cloud deployment, you’ll never be able to read or write faster than the underlying physical medium you choose. You should have a good understanding of your application’s requirements in terms of IOPS and throughput and choose the appropriate storage option within your budget. No software setting will bend space-time and make your hardware work faster than it was designed to, so it’s important to know you’re building on a solid foundation.
Cloud storage tiers are easier to move between, but the biggest leaps in storage performance usually involve moving to a more expensive cloud instance type. For that reason, it’s important to understand what your upgrade path looks like, on both the cost and operations sides.
If you find yourself needing more than your current storage is capable of, it’s certainly not impossible to move volumes between nodes or tiers of storage once you’ve outgrown them; LINSTOR makes it pretty easy to do so.
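As a rough sketch of what such a move can look like with the LINSTOR client (run, for example, through the controller pod as shown earlier), you can add a replica of a resource on a node with faster storage and then remove the replica on the old node once it has synchronized. The node, resource, and storage pool names below are placeholders.

```bash
# Sketch: move a replica of a LINSTOR resource to faster storage.
# Node, resource, and storage pool names are placeholders.
linstor resource create node-c my-resource --storage-pool nvme-pool  # add a replica on faster storage
linstor resource list                                                # wait until the new replica is UpToDate
linstor resource delete node-a my-resource                           # then remove the old replica
```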
Choosing your Storage Pool Provider
Once you have your physical storage attached to your cluster nodes, you’ll need to add it to a LINSTOR storage pool. Which storage pool provider you choose will have an impact on features and performance. Some options only make sense for very specific sets of hardware (like Exos and OpenFlex), so we’ll only be looking at the two hardware agnostic storage providers for LINSTOR: LVM and ZFS.
LVM Compared To LVM Thin
You can set up LVM in LINSTOR as either a thick or a thin LVM storage pool, meaning the volumes LINSTOR creates will either be fully allocated at the requested size upon provisioning (thick) or grow as they are used (thin).
Thick LVM will perform better than thin LVM under I/O-sensitive workloads because of its pre-allocation of blocks. However, thick LVM performance suffers badly when there is a snapshot of the volume attached to it, so much so that LINSTOR does not support thick LVM snapshots. Thin LVM allocates blocks as they’re needed, which involves additional I/O, and that additional I/O adds up under an application that makes frequent small writes.
If your deployment requires LINSTOR snapshot capabilities, for example, for disaster recovery purposes, you will have to use an LVM thin-provisioned storage pool. However, you should be aware that this might come with a performance cost.
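If you do go the thin-provisioned route for snapshot support, note that taking snapshots from Kubernetes also requires the CSI external-snapshotter CRDs and snapshot controller in the cluster, along with a `VolumeSnapshotClass` that references the LINSTOR CSI driver. A minimal sketch, assuming those components are installed:

```yaml
# Sketch: a VolumeSnapshotClass for the LINSTOR CSI driver.
# Requires the external-snapshotter CRDs and snapshot controller in the cluster.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: linstor-csi-snapshots
driver: linstor.csi.linbit.com
deletionPolicy: Delete
```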
ZFS Compared To ZFS Thin
ZFS, or more technically zvols created from a ZFS storage pool, can be used to back LINSTOR volumes as well. Under the hood, “thick” and “thin” provisioned zvols really only differ in that the space requested is either reserved for them, or not. This means that you’re really choosing the ability to overprovision your host’s storage when you choose the thin ZFS provider for your storage pool in LINSTOR; the choice has little effect on performance. Furthermore, LINSTOR supports snapshots of volumes provisioned from both thin and thick provisioned ZFS backed storage pools.
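To make the thick-versus-thin distinction concrete at the ZFS level, creating a zvol by hand differs only in the sparse flag; the pool and volume names below are placeholders:

```bash
# "Thick" zvol: the requested 10G is reserved in the pool up front.
zfs create -V 10G tank/thick-vol

# "Thin" (sparse) zvol: no reservation, space is consumed only as data is written.
zfs create -s -V 10G tank/thin-vol
```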
ZFS support in Linux distributions is not yet as common as LVM support. This is something to consider when designing your cluster, but that’s a topic for another blog.
Actual Numbers
Theories aside, I ran a quick test using fio on some AWS instances with general purpose EBS volumes (gp3) backing each of the storage providers discussed above. EBS gp3 volumes deliver a baseline of 3000 IOPS. Each LINSTOR volume tested was replicating synchronously across the same three availability zones in the us-west-2 region. The results are listed below.
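The exact fio command used for this test isn’t reproduced here, but a small random-write benchmark of this kind generally looks something like the following sketch; all parameter values and the target directory are assumptions, not the exact ones used to produce the numbers below:

```bash
# Sketch of a small random-write fio benchmark against a mounted LINSTOR volume.
# Parameter values and the target directory are assumptions.
fio --name=randwrite-test \
    --directory=/mnt/linstor-vol \
    --rw=randwrite --bs=4k --direct=1 \
    --ioengine=libaio --iodepth=16 --numjobs=1 \
    --size=1G --runtime=60 --time_based --group_reporting
```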
|      | Thick ZFS | Thin ZFS | Thick LVM | Thin LVM |
|------|-----------|----------|-----------|----------|
| IOPS | 2098      | 1984     | 3093      | 1650     |
This was a single simple test to benchmark small writes to a single volume, but it does support our theory. Thick LVM performed the best in this test, much better than its thin counterpart, while thin and thick ZFS performed similarly to one another.
If you are only considering performance and are fine not having features like snapshots and snapshot shipping, you can select thick LVM for your storage pool provider, follow the most standard deployment steps, and call it a day. However, with a little tuning you can have your cake and eat it too.
📝 NOTE: Direct I/O support was merged into the ZFS codebase on September 14th, 2024. Early testers have reported that writes are three times faster than with the previous codebase. This may invalidate the argument that LVM is faster than ZFS once the new version becomes more broadly available.
Tuning Settings and Topologies for Storage Performance
There are plenty of knobs to turn in LINSTOR to maximize the performance of your Kubernetes storage while also supporting features like snapshots, cloning, and overprovisioning. The following sections will focus on different areas for tuning, using the thin LVM storage provider in LINSTOR since it was the lowest performer in our test.
Physical Storage Topology
DRBD keeps track of dirty blocks in its own metadata, which, by default, is stored at the end of the block device used for its backing storage. That means there are times when writes to a DRBD volume will cause multiple writes to the same underlying storage. If your underlying storage has a fixed amount of bandwidth, which it does, DRBD will be using some of what could otherwise be used by your application. Alternatively, it’s possible to configure LINSTOR so that its DRBD volumes use a separate block device for metadata, or “external metadata” in DRBD terminology, in order to give your application dedicated access to your storage’s bandwidth.
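As a rough rule of thumb for sizing the metadata device: DRBD’s quick-sync bitmap tracks one bit per 4 KiB block per peer, which works out to roughly 32 KiB of metadata per GiB of data for each peer. A 1 TiB volume with two peers therefore needs on the order of 1024 GiB × 32 KiB × 2 ≈ 64 MiB of bitmap space, plus a small fixed overhead for DRBD’s activity log and superblock, so the metadata device can be far smaller than the data device.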
To use external metadata in LINSTOR, you’ll need a separate block device attached to each of your hosts for DRBD’s metadata in addition to the block device being used by LINSTOR to provision persistent volumes. Then, when you’re configuring the LINSTOR satellites in your Kubernetes cluster, you’ll tell LINSTOR to set up LVM on this volume, and add it to LINSTOR as a storage pool. In the example deployment below, assume that `/dev/nvme2n1` is a larger NVMe that will be our storage pool for provisioning persistent volumes, while `/dev/nvme1n1` is a smaller NVMe that will be used as a storage pool for DRBD’s metadata. Following the deployment example at the top of this post, populate the `linstor-satellites.yaml` configuration file with the following options:
```yaml
apiVersion: piraeus.io/v1
kind: LinstorSatelliteConfiguration
metadata:
  name: storage-satellites
spec:
  storagePools:
    - name: ext-meta-pool
      lvmThinPool:
        volumeGroup: meta-vg
        thinPool: metapool
      source:
        hostDevices:
          - /dev/nvme1n1
    - name: lvm-thin
      lvmThinPool:
        volumeGroup: data-vg
        thinPool: thinpool
      source:
        hostDevices:
          - /dev/nvme2n1
```
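As before, apply the updated configuration so the operator sets up the satellites with both storage pools:

```bash
$ kubectl apply -f linstor-satellites.yaml
```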
Then, in your `StorageClass` definition for Kubernetes, set the StorageClass parameter, `property.linstor.csi.linbit.com/StorPoolNameDrbdMeta`, to the name of the external metadata pool, `ext-meta-pool`.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  property.linstor.csi.linbit.com/StorPoolNameDrbdMeta: "ext-meta-pool"
reclaimPolicy: Retain
allowVolumeExpansion: true
```
When volumes are requested from this `StorageClass`, LINSTOR will create the data volume and the DRBD metadata volume in separate storage pools backed by separate physical devices, thereby dedicating the data device’s bandwidth to your application.
Collocate Persistent Volumes with Pods
By default, there is no guarantee that a pod will be scheduled on a worker node that has a physical replica of the persistent volume it’s using to store its data. That means that if a pod were scheduled on a worker node without local storage configured in LINSTOR, the pod would be reaching over the network to perform I/O operations, which means additional latency. This “diskless attachment” (DRBD-specific terminology) is sometimes desired, or even required, but for latency-sensitive applications like databases, you’ll want to keep latencies as low as possible.
LINSTOR for Kubernetes is topology aware, so it’s only a matter of setting the correct options to enforce a “local access only” policy on a specific StorageClass.
The following `StorageClass` definition will tell LINSTOR to wait for a pod to be scheduled before provisioning the necessary persistent volume, and provision one physical replica on the node the pod was scheduled on. I’ve added the `volumeBindingMode: WaitForFirstConsumer` option, and the `allowRemoteVolumeAccess: "false"` parameter to the previous example:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2"
provisioner: linstor.csi.linbit.com
parameters:
  allowRemoteVolumeAccess: "false"
  autoPlace: "2"
  storagePool: "lvm-thin"
  property.linstor.csi.linbit.com/StorPoolNameDrbdMeta: "ext-meta-pool"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```
This will ensure the lowest latency access to the persistent volumes created from this StorageClass.
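To see the binding behavior in practice, here is a minimal sketch of a PVC and a pod that consumes it. The PVC will remain Pending until the pod is scheduled, at which point LINSTOR provisions a replica on that pod’s node. All names, the image, and the size are arbitrary examples.

```yaml
# Example: a PVC bound lazily (WaitForFirstConsumer) and a pod that consumes it.
# Names, image, and size are arbitrary examples.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-db-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: linstor-csi-lvm-thin-r2
---
apiVersion: v1
kind: Pod
metadata:
  name: example-db
spec:
  containers:
    - name: db
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-db-data
```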
Tuning DRBD
Lastly, we can tune DRBD settings using parameters on our StorageClasses. All the typical DRBD tunings can happen in the `StorageClass` definitions, and there are many, but we’ll only focus on three:
```yaml
parameters:
  [...]
  DrbdOptions/Disk/disk-flushes: "no"
  DrbdOptions/Disk/md-flushes: "no"
  DrbdOptions/Net/max-buffers: "10000"
```
If your physical storage is attached using battery-backed write caches, or if you’re running in the cloud where you can usually assume this is true, you can disable some of the safety features in DRBD that aren’t needed. Also, configuring `max-buffers` to 10000 will allow DRBD more buffer space, which has a positive effect on resync times should anything interrupt the replication network, or should a host reboot and require a background resync when it returns.
To tune any of DRBD’s settings, only the `StorageClass` definition from our previous example needs modification, specifically by adding the `DrbdOptions` parameters shown above to its list of parameters.
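Putting the pieces from this post together, the tuned `StorageClass` would look something like the following; this is simply the previous definition with the three DRBD options added:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2"
provisioner: linstor.csi.linbit.com
parameters:
  allowRemoteVolumeAccess: "false"
  autoPlace: "2"
  storagePool: "lvm-thin"
  property.linstor.csi.linbit.com/StorPoolNameDrbdMeta: "ext-meta-pool"
  DrbdOptions/Disk/disk-flushes: "no"
  DrbdOptions/Disk/md-flushes: "no"
  DrbdOptions/Net/max-buffers: "10000"
reclaimPolicy: Retain
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
```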
Conclusion
I’ll wrap this blog post up by summarizing the topics covered throughout. Ensuring your physical storage is capable of satisfying your applications’ demands is the bedrock of your Kubernetes clusters’ storage performance. If you can, separate the data storage pool from your metadata storage pool using LINSTOR to maximize the write throughput available to your application. To keep latency to a minimum, configure LINSTOR’s StorageClass options to wait for pod scheduling (`volumeBindingMode: WaitForFirstConsumer`) and disallow remote attachment (`allowRemoteVolumeAccess: "false"`). Turning off some of DRBD’s safety nets when it’s safe to do so and giving DRBD some extra buffer space can help with both write performance and resync speeds.
Following these guidelines, or at least knowing these knobs exist, should help you in your quest for achieving the best performance for your LINSTOR persistent storage in Kubernetes. For more information on anything above, see the LINSTOR and DRBD documentation, join the LINBIT Community Forums, or reach out to us directly to schedule a call! LINBIT is here to help.