Kubernetes High Availability for Stateful Workloads

LINBIT is a company with deep roots in Linux High Availability (HA). Because of this, LINBIT has some opinions on what HA is and how it can be achieved. Kubernetes’ approach to HA generally involves sprawling many replicas of an application across many cluster nodes, which makes the failure of a single node or application instance less impactful. This approach is great for stateless applications, or applications that can tolerate the performance of shared storage. In contrast, IO-demanding stateful applications often do not “sprawl” well, or sometimes at all. As a result, these applications are “on their own” in terms of achieving high availability. LINSTOR’s High Availability Controller aims to provide high availability to pods in Kubernetes that cannot achieve it on their own.

StatefulSets, Deployments, and ReplicaSets in Kubernetes will eventually reschedule pods from failed nodes, respecting your defined replica counts. The time and user intervention it takes to do that, however, isn’t what LINBIT typically considers highly available. Pod eviction behavior differs between StatefulSets and Deployments, and between versions of Kubernetes, and honestly it’s sometimes buggy. As of this writing, Kubernetes v1.20.2 is the latest version, and applying taint tolerations to pods is the recommended way to control pod eviction. However, there are open issues on Kubernetes’ GitHub (since v1.18) reporting that the NoExecute taint is not always applied to dead nodes. That bug leaves pods stranded on dead nodes indefinitely. Prior to Kubernetes v1.18, I would set the --pod-eviction-timeout flag on the kube-controller-manager for more aggressive pod eviction, but that’s no longer supported. My point is, Kubernetes’ approach to HA for singleton workloads isn’t exactly straightforward.
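
To make the taint-toleration approach concrete, here is a minimal sketch of a pod that tolerates the not-ready and unreachable NoExecute taints for only 30 seconds instead of Kubernetes’ default of 300. The pod name and image are placeholders, and the values are illustrative rather than a recommendation:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: web
    image: httpd:latest
  # Evict this pod 30 seconds after its node is tainted as not-ready or
  # unreachable, rather than waiting out the default 300 second toleration.
  tolerations:
  - key: "node.kubernetes.io/not-ready"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30
  - key: "node.kubernetes.io/unreachable"
    operator: "Exists"
    effect: "NoExecute"
    tolerationSeconds: 30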

Demonstration of LINSTOR’s HA Controller for Stateful Workloads

LINSTOR’s HA Controller aims to improve pod eviction behavior for workloads backed by LINSTOR volumes. It does this by inspecting the quorum status of the DRBD devices that LINSTOR provisions. If the replication network breaks, the active replica of the volume loses quorum, and LINSTOR’s HA Controller will move the pod to another worker that can access a replica of the volume. Here is a short video (~5min) that shows the LINBIT HA Controller in action:

As I mention in the video, the requirements for using LINSTOR’s High Availability Controller for Kubernetes are that your volume has at least two DRBD replicas, your Kubernetes cluster has at least three worker nodes, and you’ve labeled your pods with linstor.csi.linbit.com/on-storage-lost: remove. If you meet those requirements, you can confidently move stateful workloads off of a failed node much sooner.
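
For reference, that label belongs in the pod’s (or pod template’s) metadata. A minimal, hypothetical example is below (the pod name and image are placeholders); the StatefulSet later in this post shows the same label in context:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    # Tells LINSTOR's HA Controller to remove this pod from a node whose
    # backing storage is lost, so it can be rescheduled elsewhere.
    linstor.csi.linbit.com/on-storage-lost: remove
spec:
  containers:
  - name: web
    image: httpd:latest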

In the video, I show a 5-node Kubernetes cluster (1 master, 4 workers), with the following LINSTOR StorageClasses defined:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r1"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "1"
  storagePool: "lvm-thin"
reclaimPolicy: Delete
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r2"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "2"
  storagePool: "lvm-thin"
reclaimPolicy: Delete
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: "linstor-csi-lvm-thin-r3"
provisioner: linstor.csi.linbit.com
parameters:
  autoPlace: "3"
  storagePool: "lvm-thin"
reclaimPolicy: Delete

Then, I use the following StatefulSet definition to create a workload backed by the linstor-csi-lvm-thin-r2 StorageClass, with pods labeled for LINSTOR’s HA Controller:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: web
  serviceName: web-svc
  replicas: 1
  template:
    metadata:
      labels:
        app: web
        linstor.csi.linbit.com/on-storage-lost: remove
    spec:
      containers:
      - name: web
        image: httpd:latest
        ports:
        - containerPort: 80
          hostPort: 2080
          name: http
        volumeMounts:
        - name: www
          mountPath: /usr/local/apache2/htdocs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: linstor-csi-lvm-thin-r2
      resources:
        requests:
          storage: 1Gi

I then “failed” the worker node by using the kernel’s sysrq-trigger to halt it, and the StatefulSet-managed pod was safely evicted long before Kubernetes’ default pod eviction would have kicked in.

So, please check out the video above, try out the controller in your clusters, and drop any comments or questions you have on social media or in our Slack community.

Disclaimer: The described software (LINSTOR, K8S-Operator, HA-Controller) is part of LINBIT SDS for Kubernetes. LINBIT SDS for Kubernetes bundles access to pre-built container images with 24×7 enterprise-class support from the software’s creators. Alternatively, the components are also available from their upstream sources in the Piraeus DataStore project.


Matt Kereczman

Matt Kereczman is a Solutions Architect at LINBIT with a long history of Linux System Administration and Linux System Engineering. Matt is a cornerstone of LINBIT’s technical team, and plays an important role in making LINBIT’s and LINBIT’s customers’ solutions great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open Source Software and Hardware are at the core of most of Matt’s hobbies.
