LINBIT® is a company with deep roots in Linux High Availability (HA). Because of this, LINBIT has some opinions on what HA is, and how it can be achieved.
Kubernetes’ approach to HA generally involves sprawling many replicas of an application across many cluster nodes, making the failure of a single node or application instance less impactful. This approach is great for stateless applications, or applications that can tolerate the performance of shared storage, such as front-end web apps or APIs.
In contrast, I/O demanding stateful applications and monolithic applications like certain databases or ERP systems often do not “sprawl” well, or at all. As a result, these applications are “on their own” in terms of achieving high availability in Kubernetes.
LINSTOR®’s High Availability Controller aims to provide high availability to pods in Kubernetes that cannot achieve this on their own.
StatefulSets, Deployments, and ReplicaSets in Kubernetes will eventually reschedule their pods from failed nodes, respecting their defined replica counts. However, the time, and sometimes the user intervention, that takes is not what LINBIT typically considers highly available behavior. The taint-based eviction timeout in recent Kubernetes versions defaults to 5 minutes, meaning a single node failure can consume nearly the entire downtime budget that “five nines” availability allows for a year (roughly 5.26 minutes).
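That 5 minute default comes from the tolerations Kubernetes automatically injects into every pod for the not-ready and unreachable node taints. They look roughly like this:

tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300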
Prior to Kubernetes v1.18, I would set the --pod-eviction-timeout on the kube-controller-manager for more aggressive pod eviction, but that was “forever ago”, and that flag is no longer supported. Also, StatefulSets are “stickier” than Deployments or ReplicaSets: because a StatefulSet guarantees at most one pod per identity, its pods require additional attention from their storage provider to confirm their volumes are detached before they can be rescheduled.
LINSTOR’s HA Controller aims to improve pod eviction behavior for workloads backed by LINSTOR volumes. It does this by inspecting the quorum status of the DRBD® devices that LINSTOR provisions. If the replication network breaks and the node hosting the active replica loses quorum, LINSTOR’s HA Controller fails the pod over by deleting it, so that Kubernetes can reschedule it onto a worker that can still access a quorate replica of the volume.
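If you want to see the quorum state that the HA Controller reacts to, you can inspect it yourself. The commands below are a sketch; they assume you have access to the LINSTOR controller and to a worker node hosting a DRBD resource:

# From the LINSTOR controller, list resources and their current states:
linstor resource list

# On a worker node, print the current DRBD state, including quorum, and exit:
drbdsetup events2 --now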
Deployment of LINSTOR’s HA Controller for Stateful Workloads
Detailed steps for deployment of LINSTOR’s HA Controller can be found in the LINSTOR User’s Guide. The quick version of those steps is as follows:
- Deploy the HA Controller using Helm (if you have not yet added the linstor chart repository, see the note after this list):
helm install linstor-ha-controller linstor/linstor-ha-controller
- Add the following parameters to your LINSTOR StorageClasses:
parameters:
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
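The helm install command above assumes the linstor chart repository is already configured. If it is not, you can add it first; the URL below assumes LINBIT’s public chart location:

helm repo add linstor https://charts.linstor.io
helm repo update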
If you need to exclude a pod from LINSTOR’s HA Controller for any reason, marking it with the following annotation will cause the controller to ignore the marked pod:

kubectl annotate pod <podname> drbd.linbit.com/ignore-fail-over=""
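To opt a pod back in later, remove the annotation using kubectl’s trailing-dash syntax:

kubectl annotate pod <podname> drbd.linbit.com/ignore-fail-over-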
Example Using LINSTOR’s HA Controller
For example, here is a two-replica LINSTOR StorageClass with the settings needed for LINSTOR’s HA Controller:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
  property.linstor.csi.linbit.com/DrbdOptions/auto-quorum: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-no-data-accessible: suspend-io
  property.linstor.csi.linbit.com/DrbdOptions/Resource/on-suspended-primary-outdated: force-secondary
  property.linstor.csi.linbit.com/DrbdOptions/Net/rr-conflict: retry-connect
reclaimPolicy: Delete
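Assuming the StorageClass above is saved as linstor-csi-lvm-thin-r2.yaml (the filename is illustrative), apply and verify it:

kubectl apply -f linstor-csi-lvm-thin-r2.yaml
kubectl get storageclass linstor-csi-lvm-thin-r2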
Using the following StatefulSet definition, create a workload backed by the linstor-csi-lvm-thin-r2 StorageClass:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: web
  serviceName: web-svc
  replicas: 1
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: httpd:latest
        ports:
        - containerPort: 80
          hostPort: 2080
          name: http
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: linstor-csi-lvm-thin-r2
      resources:
        requests:
          storage: 1Gi
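Assuming the manifest above is saved as webapp-statefulset.yaml (again, an illustrative filename), create the workload and confirm the pod and its PersistentVolumeClaim come up:

kubectl apply -f webapp-statefulset.yaml
kubectl get pods -l app=web -o wide
kubectl get pvc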
You can then “fail” the worker node running the pod, using echo b > /proc/sysrq-trigger to immediately reset it, and, thanks to the LINSTOR HA Controller, the StatefulSet managed pod should be rescheduled long before Kubernetes’ pod eviction would have kicked in.
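A sketch of that test, assuming the single pod is named webapp-0 (pod and node names will vary in your cluster):

# Find which node is currently running the pod:
kubectl get pod webapp-0 -o wide

# On that node, as root and with sysrq enabled, trigger an immediate reset:
echo b > /proc/sysrq-trigger

# From another terminal, watch the replacement pod get scheduled:
kubectl get pods -l app=web -o wide -w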
Concluding Thoughts
LINSTOR’s HA Controller for Kubernetes can help lower the recovery time of stateful workloads in Kubernetes, thereby increasing availability and helping to maintain SLAs.
The software described here (LINSTOR, the LINSTOR Operator for Kubernetes, and the LINSTOR HA Controller) consists of components of LINBIT SDS for Kubernetes. LINBIT SDS for Kubernetes bundles access to prebuilt container images with 24×7 enterprise-class support from the creators and maintainers of LINSTOR. The same components are also freely available from the open source Piraeus project, a CNCF sandbox project.