Implementing Disaster Recovery for Proxmox VE with a Stretched LINSTOR Cluster

In this blog post, you will delve into the necessary steps and settings for creating a stretched LINSTOR® cluster that provides disaster recovery (DR) capabilities for Proxmox VE virtual machines (VMs). This setup involves two sites, each with a three-node Proxmox VE cluster, using the same LINSTOR storage cluster to manage DRBD® replicated volumes across both sites. This architecture and process ensure that your VMs have high availability (HA) within sites and DR between sites. The approach in this blog post aims to minimize the recovery point objective (RPO), a metric often used when evaluating DR strategies for critical systems. By using this approach, you might reduce your RPO from minutes or hours to seconds.

📝 NOTE: This blog describes an architecture that has been tested in a lab environment. As far as LINBIT® knows, this architecture has not been tested outside of our lab, and deployment conditions elsewhere can differ. If you’re interested in testing this type of configuration in your lab, reach out to LINBIT and we can provide evaluation licenses for DRBD Proxy. We’d appreciate any and all “real-world” feedback.

If you’re not familiar with how LINSTOR integrates into Proxmox VE, the Proxmox VE chapter of the LINSTOR User Guide provides a good introduction to the topic.

Prerequisites

Before you begin, you will need the following:

  • Two physical sites, each with a 3-node Proxmox VE cluster.
  • A LINSTOR controller managing LINSTOR satellites (each Proxmox VE node) across both sites.
  • Network connectivity between both sites with sufficient bandwidth to support DRBD (Distributed Replicated Block Device) replication.
  • LINSTOR and DRBD tools installed on all nodes.
  • DRBD Proxy installed on all nodes, along with a valid DRBD Proxy license (a sketch of the package installation follows this list).
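
As a rough sketch, installing the needed software on a Proxmox VE node might look like the following, assuming LINBIT’s package repositories are already configured on the node (DRBD Proxy is distributed through LINBIT’s customer repositories):

# DRBD kernel module and utilities, LINSTOR satellite and client,
# and the LINSTOR Proxmox VE plugin
apt install -y drbd-dkms drbd-utils linstor-satellite linstor-client linstor-proxmox

# DRBD Proxy, installed on all nodes together with your license file
apt install -y drbd-proxy

# Only on the node that will run the LINSTOR controller
apt install -y linstor-controller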

Configuring LINSTOR for Disaster Recovery

Step 1: Configuring LINSTOR Satellites

In this setup, one LINSTOR controller manages satellites across both sites. To differentiate nodes in different data centers (DCs), you assign auxiliary properties to each LINSTOR satellite node. You can use auxiliary properties as criteria for LINSTOR automatic placement rules, as you will do later in this guide.

linstor node set-property --aux proxmox-0a dc a
linstor node set-property --aux proxmox-0b dc b

Repeat this for every node in the cluster, labeling each node with the key-value pair that corresponds to the data center it belongs to.
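
For example, assuming six hypothetical node names, proxmox-0a through proxmox-2b, a short shell loop labels every node in one pass:

for node in proxmox-0a proxmox-1a proxmox-2a; do
    linstor node set-property --aux "$node" dc a
done
for node in proxmox-0b proxmox-1b proxmox-2b; do
    linstor node set-property --aux "$node" dc b
done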

Step 2: Creating Resource Groups for Each Data Center

Next, create resource groups in LINSTOR for each DC. These groups will be used to ensure that LINSTOR creates three physical replicas of each virtual disk, but only within each site.

linstor resource-group create --storage-pool pve-thinpool --place-count 3 thinpool-dc-a
linstor resource-group create --storage-pool pve-thinpool --place-count 3 thinpool-dc-b

Here, thinpool-dc-a and thinpool-dc-b represent resource groups for Site A and Site B. Adjust the storage pool name (pve-thinpool) to match the storage pool(s) configured in your setup.

Step 3: Configuring Replica Placement Within Each Data Center

To ensure that “automatically-placed” replicas stay within their DC, modify the resource groups to reference the auxiliary properties that you set earlier:

linstor resource-group modify thinpool-dc-a --replicas-on-same dc
linstor resource-group modify thinpool-dc-b --replicas-on-same dc
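
You can verify the resource groups and their placement constraints by listing them:

linstor resource-group list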

Step 4: Disabling Auto-Tiebreakers

Auto-tiebreakers in LINSTOR can interfere with the replica placement rules created above. Disable this feature for the resource groups:

linstor resource-group set-property thinpool-dc-a DrbdOptions/auto-add-quorum-tiebreaker false
linstor resource-group set-property thinpool-dc-b DrbdOptions/auto-add-quorum-tiebreaker false

Step 5: Disabling Automatic Resource Balancing

LINSTOR will attempt to maintain the placement count of resources created from a resource group. Because you will manually assign additional replicas to DR nodes in the cluster, disable the BalanceResourcesEnabled feature on the LINSTOR controller:

linstor controller set-property BalanceResourcesEnabled no
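
You can confirm the change by listing the controller’s properties:

linstor controller list-properties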

Step 6: Configuring DRBD Proxy and Protocols

For each node, set the Site property according to the site the node belongs to. This allows LINSTOR to enable DRBD Proxy for resource connections when resource replicas are assigned to different sites:

linstor node set-property proxmox-0a Site a
linstor node set-property proxmox-0b Site b
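
As with the earlier dc labeling, a loop over the same hypothetical node names applies the Site property to every node:

for node in proxmox-0a proxmox-1a proxmox-2a; do
    linstor node set-property "$node" Site a
done
for node in proxmox-0b proxmox-1b proxmox-2b; do
    linstor node set-property "$node" Site b
done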

After setting the Site property on all nodes, enable DRBD Proxy auto-configuration:

linstor controller set-property DrbdProxy/AutoEnable true

Step 7: Configuring DRBD Options for Disaster Recovery

For any resource that will be configured for DR, ensure that allow-two-primaries is disabled. DRBD requires this when using asynchronous replication, and asynchronous replication is required when replicating over long distances with DRBD Proxy.

linstor resource-definition drbd-options --allow-two-primaries no <resource>

💡 TIP: The name of the LINSTOR resource for a specific VM’s virtual disk can be found by inspecting the “Hardware” configuration for the VM in the Proxmox VE UI. The resource name will begin with pm- and be followed by a unique string of eight alpha-numeric characters.
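
For example, to disable the setting on a hypothetical resource named pm-fadde855 (the same name used in a configuration example later in this post):

linstor resource-definition drbd-options --allow-two-primaries no pm-fadde855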

❗ IMPORTANT: The allow-two-primaries setting must be enabled for live migrating VMs between HA peers. Therefore, enabling DR between sites while maintaining live migration within a site is currently not possible.

Step 8: Manually Assigning Replicas Across Sites

Finally, to create a DR replica of a virtual machine disk image, manually assign a replica to a node in the DR site:

linstor resource create <dr-node> <resource-name> --storage-pool pve-thinpool

This will start a background resynchronization of the virtual machine disk image associated with the LINSTOR resource. Once the replica is fully synchronized, you can manually move the virtual machine to the other site.

💡 TIP: You can check the progress of synchronization by using linstor resource list or drbdadm status.
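
A minimal sketch, again assuming the hypothetical resource pm-fadde855 and a DR node named proxmox-0b:

# Create the DR replica, then watch the resynchronization progress
linstor resource create proxmox-0b pm-fadde855 --storage-pool pve-thinpool
watch linstor resource list -r pm-fadde855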

Configuring Proxmox VE for DR

There are Proxmox VE-specific configurations and steps required to use a stretched LINSTOR cluster for DR purposes.

Step 1: Configuring Proxmox Cluster Storage for Each Site

Each Proxmox VE cluster should be configured to use its corresponding LINSTOR resource group (e.g., thinpool-dc-a for Site A, and thinpool-dc-b for Site B). Therefore, each cluster will have a slightly different /etc/pve/storage.cfg configuration, ensuring that automatically placed virtual disk replicas are created on the correct storage nodes.

Site A configuration:

drbd: drbdstorage
    content images,rootdir
    controller 192.168.222.130
    resourcegroup thinpool-dc-a

Site B configuration:

drbd: drbdstorage
    content images,rootdir
    controller 192.168.222.130
    resourcegroup thinpool-dc-b

📝 NOTE: The controller IP address which belongs to the active LINSTOR controller in the stretched cluster must be accessible from each site. For HA and DR purposes, the LINSTOR controller should store its database (/var/lib/linstor/) on a DRBD device which is replicated between sites. If an IP address cannot easily be moved between sites, the controller IP address will need to be updated in the storage configuration after a LINSTOR controller migration.
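
One way to run a highly available controller, sketched here based on LINBIT’s LINSTOR documentation, is to use the drbd-reactor promoter plugin to mount a DRBD-backed /var/lib/linstor and start the controller service on whichever node currently holds the DRBD resource. This assumes a DRBD resource named linstor_db and a matching systemd mount unit:

# /etc/drbd-reactor.d/linstor_db.toml
[[promoter]]
[promoter.resources.linstor_db]
start = ["var-lib-linstor.mount", "linstor-controller.service"]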

Step 2: Configuring Unique VM ID Ranges

To prevent VM ID conflicts between sites, assign each site a unique, non-overlapping VM ID range in its Proxmox VE configuration. For example, use IDs 100-199 in Site A and 200-299 in Site B.
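
On recent Proxmox VE versions, you can enforce the range for automatically assigned IDs with the next-id option in each cluster’s /etc/pve/datacenter.cfg. A hypothetical Site A configuration (the upper boundary is exclusive):

next-id: lower=100,upper=200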

Step 3: Syncing VM Configuration Files to DR Site

For the DR Proxmox VE cluster to “discover” a VM from another site, you must synchronize VM configuration files from the Proxmox VE hypervisor currently running the VM to a node at the DR site by using rsync:

rsync -avz /etc/pve/qemu-server/*.conf <destination-server>:/etc/pve/qemu-server/

💡 TIP: Cluster-specific modifications will likely need to be made to the VM configuration files. Therefore, do not use continuous or bidirectional synchronization methods for these files.

Step 4: Adjusting Network and Storage Configurations

If network bridge names differ between the two clusters, update the bridge=<nic> setting in the VM configurations accordingly. The following is an example of what this line will look like:

net0: virtio=BC:24:11:78:7C:7C,bridge=vmbr0,firewall=1

Similarly, if LINSTOR’s storage plugin was named differently during configuration, update the scsi* devices accordingly. The following is an example of what this line will look like:

scsi0: drbdstorage:pm-fadde855_100,iothread=1,size=1G

💡 TIP: There might be other differences between clusters in your environment that require you to modify configurations. After you identify any differences, you could write a simple shell script to replace the differing strings in the migrated VM configurations consistently.
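
A minimal sketch of such a script, assuming Site A uses bridge vmbr0 while the DR site uses vmbr1, and that the two clusters named their LINSTOR storage drbdstorage-a and drbdstorage-b (all names hypothetical):

#!/bin/bash
# Adjust migrated VM configuration files for the DR site's bridge
# and storage names
for conf in /etc/pve/qemu-server/*.conf; do
    sed -i -e 's/bridge=vmbr0/bridge=vmbr1/' \
           -e 's/drbdstorage-a:/drbdstorage-b:/' "$conf"
done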

Migrating a VM Between Sites

By following the processes and guidelines outlined above, you should now be able to move a VM between sites when needed. At a high level, a migration involves the following steps:

1. Shut down the VM at its current site. With allow-two-primaries disabled, the VM cannot run at both sites at once.
2. Verify that the DR replica is fully synchronized (UpToDate), for example by using linstor resource list or drbdadm status.
3. Synchronize the VM’s configuration file to a node at the DR site and adjust it for any network or storage naming differences, as described in the previous sections.
4. Start the VM on the DR site’s cluster.
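
A minimal sketch of these steps, assuming a VM with ID 100 running on the hypothetical node proxmox-0a in Site A, with its DR replica already synchronized to proxmox-0b in Site B:

# On the Site A node currently running the VM
qm stop 100
rsync -avz /etc/pve/qemu-server/100.conf proxmox-0b:/etc/pve/qemu-server/

# On the Site B node, after adjusting the configuration file for local
# bridge and storage names
qm start 100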

Conclusion

By following these steps, you can configure a stretched LINSTOR cluster across two sites, providing robust DR capabilities for your HA Proxmox VE environment. This setup ensures that your virtual machines remain available and recoverable, even in case of a site failure.

While this architecture has been tested in a lab environment, real-world implementations can require additional considerations not yet mentioned in this blog. As with any complex IT infrastructure, thorough testing and validation in your own environment is crucial, and LINBIT is eager to help. If you have a lab environment of your own and want to help test the solution described above, reach out to LINBIT for an evaluation license for DRBD Proxy.

For help outside of LINBIT’s formal evaluation support, head over to the LINBIT forums where LINSTOR’s Proxmox VE integration has become a hot topic of discussion.

Matt Kereczman

Matt Kereczman is a Solutions Architect at LINBIT with a long history of Linux system administration and Linux system engineering. Matt is a cornerstone of LINBIT's technical team, and plays an important role in making LINBIT's and LINBIT's customers' solutions great. Matt was President of the GNU/Linux Club at Northampton Area Community College prior to graduating with Honors from Pennsylvania College of Technology with a BS in Information Security. Open source software and hardware are at the core of most of Matt's hobbies.
