Change the cluster distribution without downtime

We recently upgraded one of our virtualization clusters (more RAM) and, in the course of this, moved the virtualization hosts from Ubuntu Lucid to RHEL 6.3, all without any service interruption.

That was not that complicated, really; as our core product DRBD works on (nearly) every Linux distribution, we simply

  1. live-migrated all VMs to one of the nodes;
  2. reinstalled the root filesystem on the other node with RHEL 6.3[1. Copying some things, like the Pacemaker node-UUID, helps a bit, too; but that’s not strictly necessary.] and configured GRUB to boot into that one;
  3. installed matching DRBD modules;
  4. waited a few seconds for the resync to complete (which was really that fast, because we didn’t touch the existing logical volumes, and so the changed data were only a few GiB);
  5. and then let Pacemaker take control over the cluster again, allowing us to migrate the VMs to the newly installed node. Without any service interruption.
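
As a rough illustration of step 4, the check "has the resync finished?" could be scripted by reading `/proc/drbd`. The sketch below is not what we actually ran. It assumes the DRBD 8.x status line format (`cs:` for connection state, `ds:` for local/peer disk state), and the helper name `resync_finished` is made up for this example.

```python
import re

# Matches per-device status lines in the DRBD 8.x /proc/drbd format, e.g.
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
# (format assumed from DRBD 8.3/8.4; other versions may differ)
STATUS_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def resync_finished(proc_drbd_text):
    """Return True if every matched device is Connected and UpToDate on both nodes.

    Lines that do not match the pattern (version header, counters,
    unconfigured minors) are ignored. If no device line matches at all,
    the function returns False rather than claiming success.
    """
    devices = [m for m in map(STATUS_LINE.match, proc_drbd_text.splitlines()) if m]
    if not devices:
        return False
    return all(
        m["cs"] == "Connected" and m["ds"] == "UpToDate/UpToDate"
        for m in devices
    )
```

On a node, one would feed it the file contents, for example `resync_finished(open("/proc/drbd").read())`, and only hand control back to Pacemaker once it returns `True`.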

The key to this is that DRBD and Pacemaker are available in compatible versions on most current distributions. That is not a big problem, because we make such packages available to our customers in our repositories.

Upgrading DRBD from 8.3 to 8.4 at the same time is only a small, secondary change; after all, its network code can talk to different versions by design.
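
To make the version compatibility a bit more concrete: each DRBD build reports a range of supported wire-protocol versions in the first line of `/proc/drbd` (the `proto:` field), and two nodes can talk as long as those ranges overlap. The following is a simplified model of that idea, not DRBD's actual handshake code. The function names are invented for this sketch, and the header strings in the usage note are illustrative, as the exact numbers depend on the build.

```python
import re


def proto_range(version_line):
    """Extract (min, max) protocol versions from a /proc/drbd header line.

    Accepts either a range ("proto:86-96") or a single value ("proto:86").
    """
    m = re.search(r"proto:(\d+)(?:-(\d+))?", version_line)
    if m is None:
        raise ValueError("no proto field found")
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else low
    return low, high


def agreed_protocol(local, peer):
    """Return the highest protocol version both ranges contain, or None."""
    low = max(local[0], peer[0])
    high = min(local[1], peer[1])
    return high if low <= high else None
```

For example, with headers like `version: 8.3.13 (api:88/proto:86-96)` on one node and `version: 8.4.2 (api:1/proto:86-101)` on the other, the ranges overlap and the model picks the highest common version, so the two nodes can replicate while the cluster is in its mixed state.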

