Reader Bootstrapping
Amazon Web Services (AWS) is designed from the ground up with availability in mind. The geographical locations of their Availability Zones (AZs) are carefully selected to mitigate the risk of natural disasters that could impact their services. Even the most unexpected natural disaster is unlikely to affect all of a region’s AZs.
However, you can’t simply deploy an application on an instance in Amazon’s EC2, or any other public cloud offering, and consider it Highly Available (HA). To help users get further in their quest for HA, cloud providers offer load balancers, floating IPs, DNS failover, database services, and many other products that provide HA features for common workloads and applications. Still, there are always niche or legacy applications that aren’t well served by those products. For these types of applications, you’ll need an application-agnostic approach to HA, and storage can often be the hardest part to sort out. This is where DRBD’s block-level replication can help fill the gap.
DRBD is an open source block-level replication driver for the Linux (and Windows) kernel, created and maintained by LINBIT. Its replication happens outside of the application and beneath the filesystem, at the block level. This makes DRBD a very handy tool for enabling multi-AZ and even multi-region replication for High Availability (HA) and Disaster Recovery (DR) on storage for applications that might not have a native or robust solution available.
DRBD’s low-level implementation as a kernel driver means it typically performs better, in terms of write latency and throughput, than higher-level shared filesystems or object storage, which require some form of lock coordination to prevent data corruption. However, operating as a block device means that DRBD can only be "Primary" (accessible) on a single instance at a time, just like an EBS or iSCSI volume can only be attached to a single instance at a time. To offset this drawback, DRBD 9’s quorum and auto-promote features make managing access, and even node-based fencing, easy.
DRBD can replicate writes in synchronous or asynchronous mode between the instances comprising the DRBD cluster. Each instance in the cluster attaches its own Elastic Block Store (EBS) device to back the DRBD device, which results in a "shared-nothing" storage cluster between instances. The low-latency, high-throughput networks between Amazon’s AZs in any given region are suitable for synchronous replication without a huge impact on performance. Synchronously replicating between AZs using DRBD results in EBS volumes that are always identical, so if there is ever an outage in the "primary" AZ, the application can be started in a different AZ with access to the exact same data.
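For context, here is roughly what the auto-promote, quorum, and replication protocol settings look like in a DRBD 9 resource configuration. This is a hand-written illustration rather than the exact contents of the configuration template used later in this post:

resource r0 {
  options {
    auto-promote yes;       # promote to Primary automatically when the device is opened
    quorum       majority;  # require a majority of nodes before allowing writes
    on-no-quorum io-error;  # return I/O errors rather than risk divergent data
  }
  net {
    protocol C;             # synchronous replication; protocol A would be asynchronous
  }
  # device, disk, and per-node "on" sections omitted here
}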
This blog post will walk you through standing up a generic three-instance DRBD cluster in Amazon’s AWS using modern, well-known tools that most cloud engineers and DevOps team members should be familiar with. For the uninitiated, don’t fear: these tools are straightforward to install and start using, and this blog post could even serve as your introduction to them.
Prerequisites
The software and accounts used in this blog:
- An AWS account is obviously needed
- AWS Command Line Utility configured with the AWS account
- Terraform will be used to create the AWS Infrastructure from code
- Ansible will be used to configure the AWS Infrastructure from code
TLDR: If you already have all these and are ready to rock-and-roll, these 6 commands are all you need:
$ git clone https://github.com/kermat/aws-multiaz-drbd-example
$ cd aws-multiaz-drbd-example
$ terraform init
$ terraform apply
$ for n in $(grep -o 172\.16\.10[0-9]*\.[0-9]* hosts.ini); do ansible -a uptime $n; done
$ ansible-playbook drbd.yaml
Otherwise, a few quick notes on the tools and accounts mentioned above before we dive in.
AWS IAM Account
It’s always a good practice to use IAM accounts with limited access to AWS instead of your root AWS account. For this blog post, you could get away with adding an IAM user with “access key only” access (no passwords). The limited IAM account should be added to a new (or existing) group that has only the AmazonEC2FullAccess role applied. You must also add an SSH key pair that you will use to access and configure the infrastructure created. Create your keypair in the us-west-2 region and name it “aws-keypair” (or be prepared to change that in my example code that follows).
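If you don’t already have a key pair in us-west-2, one way to create and save it, once the AWS CLI (covered next) is configured, is:

$ aws ec2 create-key-pair --key-name aws-keypair --region us-west-2 \
    --query 'KeyMaterial' --output text > ~/.ssh/aws-keypair.pem
$ chmod 400 ~/.ssh/aws-keypair.pem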
AWS Command Line
After you install the AWS command line utility for your operating system, you should be able to run the `aws configure` sub command from a terminal to set up your profile using the access key and secret access key from your AWS IAM account. You will be asked to set the default region for your account. Use us-west-2 since the SSH key-pair generated earlier is specific to this region, and the Ubuntu Amazon Machine Image (AMI) used in my example code is also specific to that region (or be prepared to set the AMI accordingly).
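Once configured, a quick sanity check that the CLI is talking to AWS as the intended IAM user, and in the intended region, looks like this:

$ aws configure list
$ aws sts get-caller-identity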
Terraform
Terraform is an open-source infrastructure as code (IaC) tool. It allows users to create their cloud infrastructure through plugins known as providers. Users define their infrastructure in declarative configuration files that specify the infrastructure’s final state.
Terraform’s AWS provider will use your default AWS CLI profile by default. It’s possible to use multiple accounts for the AWS CLI and therefore Terraform, but for this blog post, we’ll assume you’ve set up the IAM created above as your default.
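If you do keep multiple CLI profiles, the AWS provider can be pointed at a specific one instead of the default. For example (the profile name here is hypothetical):

provider "aws" {
  region  = var.region
  profile = "drbd-demo"   # a named profile from ~/.aws/credentials; omit to use the default
}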
Ansible
Ansible is one of the most popular configuration management tools available today. Ansible configurations are referred to as playbooks and are written in YAML. The machine doing the configuration is referred to as the “control node”; in this blog, that is your workstation. The nodes, or “groups” of nodes, that Ansible will manage are listed in an inventory file. The inventory file in this blog post will be written for us by Terraform. Ansible needs only SSH access to the nodes it manages, making it an “agentless” configuration tool that doesn’t need any special software installed during the bootstrapping of our AWS instances.
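To make that concrete, the rendered hosts.ini ends up looking roughly like the sketch below. The group names and addresses here are illustrative placeholders; the actual layout comes from the hosts.tpl template and ansible.cfg in the repository:

[bastion]
34.220.0.10

[nodes]
172.16.101.11
172.16.102.12
172.16.103.13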
Review and Deploy the IaC
Now, we can dive into deployment and configuration details of our DRBD cluster. I’ve put all the Terraform and Ansible code in a Github repository that you can clone locally.
$ git clone https://github.com/kermat/aws-multiaz-drbd-example
$ cd aws-multiaz-drbd-example
You should now have a directory full of configurations that look similar to this:
$ tree .
.
├── ansible.cfg     # Ansible options to override any global options in /etc/ansible/
├── hosts.tpl       # template for Terraform to build Ansible’s inventory
├── init.sh         # placeholder bootstrapping script
├── main.tf         # Terraform IaC configuration here
├── outputs.tf      # Terraform outputs configured here (IPs)
├── drbd.yaml       # Ansible playbook to configure the infrastructure
├── README.md       # README
├── templates       # default directory name for Ansible config templates
│   └── r0.j2       # jinja2 template for our DRBD configuration
└── variables.tf    # Terraform variables for IaC configuration
I won’t drill down into all of these files, but I will explain the most important ones below.
Terraform variables.tf
This could be the only file that needs editing before you can deploy the example infrastructure.
Specifically, the `keyname` variable is specific to how you’ve set up your AWS account. The key needs to exist in the region you’re going to deploy into (`us-west-2` being the default region). If you don’t have a key in the us-west-2 region, creating one named `aws-keypair` will allow you to deploy without making any changes to any variables.
If you change the region you want to deploy into, you’ll also need to update the AMIs for the Ubuntu images used by the bastion host and cluster nodes. AMI IDs are specific to each region.
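If you do switch regions, you can look up a current Ubuntu 20.04 AMI ID for the new region with the AWS CLI. Something along these lines should work (099720109477 is Canonical’s account ID):

$ aws ec2 describe-images --region us-east-1 --owners 099720109477 \
    --filters 'Name=name,Values=ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*' \
    --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' --output text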
The EC2 instance type can be changed for both the bastion host and the DRBD nodes; however, if you’re just poking around, keeping the default `t2.micro` will be very inexpensive (if not free via AWS’s free-tier pricing). For serious performance testing, you’d want to use something more substantial.
I opted to create a VPC and subnets rather than deploy into the default. This makes the IaC much longer, and a little more complex, but it isolates this example deployment from anything else you might be running in the default VPC (and serves as a good example). Unless it overlaps with something you’re already doing, leave the subnet variables as is. If you update the private subnets, be sure to also edit the `drbd_replication_network` subnet filter in the Ansible inventory template `hosts.tpl`.
Terraform main.tf
Most of the magic exists in the main.tf, so I’ll explain that here.
Terraform configurations are declarative, meaning you state what the final product should look like, and Terraform will handle the rest. While the order in which you list objects in your IaC configuration doesn’t matter, it’s a common practice to start with the following sections that tell Terraform the minimum version of Terraform that’s needed to use the IaC configuration, and what “provider” Terraform will need to install and use to configure the objects defined in the code.
terraform {
  required_version = ">= 0.12"
}

provider "aws" {
  region = var.region
}
Terraform, via the provider plugins, will do its best job to handle the dependencies between different objects. For example, you could define a subnet before the VPC that it will exist in, and Terraform will know to create the VPC and then the subnet within the VPC. That doesn’t mean you shouldn’t do future you, or your teammates, a favor by using some organization of objects within your IaC. I configured the VPC specific objects needed for networking next.
# VPC and Networking
resource "aws_vpc" "drbd-cluster" {
  cidr_block = var.vpc_cidr
  tags       = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pub_sub1" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pub_sub1_cidr_block
  availability_zone       = "${var.region}a"
  map_public_ip_on_launch = true
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pub_sub2" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pub_sub2_cidr_block
  availability_zone       = "${var.region}b"
  map_public_ip_on_launch = true
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pub_sub3" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pub_sub3_cidr_block
  availability_zone       = "${var.region}c"
  map_public_ip_on_launch = true
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pri_sub1" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pri_sub1_cidr_block
  availability_zone       = "${var.region}a"
  map_public_ip_on_launch = false
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pri_sub2" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pri_sub2_cidr_block
  availability_zone       = "${var.region}b"
  map_public_ip_on_launch = false
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_subnet" "pri_sub3" {
  vpc_id                  = aws_vpc.drbd-cluster.id
  cidr_block              = var.pri_sub3_cidr_block
  availability_zone       = "${var.region}c"
  map_public_ip_on_launch = false
  tags                    = { Name = "drbd-cluster" }
}

resource "aws_route_table" "pub_sub1_rt" {
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table" "pub_sub2_rt" {
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table" "pub_sub3_rt" {
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.igw.id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table_association" "internet_for_pub_sub1" {
  route_table_id = aws_route_table.pub_sub1_rt.id
  subnet_id      = aws_subnet.pub_sub1.id
}

resource "aws_route_table_association" "internet_for_pub_sub2" {
  route_table_id = aws_route_table.pub_sub2_rt.id
  subnet_id      = aws_subnet.pub_sub2.id
}

resource "aws_route_table_association" "internet_for_pub_sub3" {
  route_table_id = aws_route_table.pub_sub3_rt.id
  subnet_id      = aws_subnet.pub_sub3.id
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.drbd-cluster.id
  tags   = { Name = "drbd-cluster" }
}

# One EIP and NAT gateway per public subnet (natgw1 and natgw2 mirror natgw3)
resource "aws_eip" "eip_natgw1" {
  count = "1"
}

resource "aws_nat_gateway" "natgw1" {
  count         = "1"
  allocation_id = aws_eip.eip_natgw1[count.index].id
  subnet_id     = aws_subnet.pub_sub1.id
}

resource "aws_eip" "eip_natgw2" {
  count = "1"
}

resource "aws_nat_gateway" "natgw2" {
  count         = "1"
  allocation_id = aws_eip.eip_natgw2[count.index].id
  subnet_id     = aws_subnet.pub_sub2.id
}

resource "aws_eip" "eip_natgw3" {
  count = "1"
}

resource "aws_nat_gateway" "natgw3" {
  count         = "1"
  allocation_id = aws_eip.eip_natgw3[count.index].id
  subnet_id     = aws_subnet.pub_sub3.id
}

resource "aws_route_table" "pri_sub1_rt" {
  count  = "1"
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.natgw1[count.index].id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table_association" "pri_sub1_to_natgw1" {
  count          = "1"
  route_table_id = aws_route_table.pri_sub1_rt[count.index].id
  subnet_id      = aws_subnet.pri_sub1.id
}

resource "aws_route_table" "pri_sub2_rt" {
  count  = "1"
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.natgw2[count.index].id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table_association" "pri_sub2_to_natgw2" {
  count          = "1"
  route_table_id = aws_route_table.pri_sub2_rt[count.index].id
  subnet_id      = aws_subnet.pri_sub2.id
}

resource "aws_route_table" "pri_sub3_rt" {
  count  = "1"
  vpc_id = aws_vpc.drbd-cluster.id
  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.natgw3[count.index].id
  }
  tags = { Name = "drbd-cluster" }
}

resource "aws_route_table_association" "pri_sub3_to_natgw3" {
  count          = "1"
  route_table_id = aws_route_table.pri_sub3_rt[count.index].id
  subnet_id      = aws_subnet.pri_sub3.id
}
That looks like a lot, but really it’s just a VPC, a public and a private subnet for each availability zone, a NAT gateway for each private subnet, an internet gateway, and the route tables and associations that plumb the network together for our non-default VPC.
Next, I set up the DRBD nodes. I opted to use an Auto Scaling group (ASG) in AWS to control the DRBD nodes. This ensures they’re created from the same launch configuration, which is also defined in the IaC, and ASGs spread instances evenly across a region’s AZs by default. Additionally, with the correct processes enabled, the ASG can replace unhealthy instances for you. I’ve disabled those processes in this example and am only using the ASG to set the number of nodes and their even distribution across AZs.
Important settings to note within the DRBD specific objects below:
- In the launch configuration:
- There is an additional block device besides the root device. This is the device that will back DRBD and will be replicated between the nodes in each AZ.
- There is a dummy (for now) user_data script that is run on the first launch of each instance. This is often used to install additional packages or make configurations not included in the base image for an instance. We’re using Ansible instead.
- In the ASG configuration:
- The list of suspended_processes could be deleted in order to allow the ASG to automatically replace unhealthy nodes.
- The min and max size control the DRBD node instance count.
- In the security group configuration:
- We allow only SSH from the bastion host’s security group.
- We open port 7000 for DRBD replication between instances within the DRBD security group.
# DRBD Instances
resource "aws_launch_configuration" "drbd-cluster" {
  image_id        = var.ami
  instance_type   = var.ec2_type
  key_name        = var.keyname
  security_groups = [aws_security_group.drbd-instance.id]
  user_data       = "${file("init.sh")}"

  root_block_device {
    volume_type = "gp2"
    volume_size = 20
    encrypted   = true
  }

  ebs_block_device {
    device_name = "/dev/sdf"
    volume_type = "gp3"
    volume_size = 100
    encrypted   = true
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "drbd-cluster" {
  launch_configuration = aws_launch_configuration.drbd-cluster.name
  vpc_zone_identifier  = ["${aws_subnet.pri_sub1.id}", "${aws_subnet.pri_sub2.id}", "${aws_subnet.pri_sub3.id}"]
  health_check_type    = "EC2"
  suspended_processes  = [
    "Launch", "Terminate", "AZRebalance", "HealthCheck", "ReplaceUnhealthy",
    "AddToLoadBalancer", "AlarmNotification", "InstanceRefresh",
    "ScheduledActions", "RemoveFromLoadBalancerLowPriority"
  ]
  min_size = 3
  max_size = 3

  tag {
    key                 = "Name"
    value               = "drbd-cluster-asg"
    propagate_at_launch = true
  }
}

resource "aws_security_group" "drbd-instance" {
  name   = "drbd-cluster-node-sg"
  vpc_id = aws_vpc.drbd-cluster.id

  ingress {
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }

  ingress {
    from_port = 7000
    to_port   = 7000
    protocol  = "tcp"
    self      = true
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = { Name = "drbd-cluster" }
}
The bastion host is just a simple EC2 instance manually placed into the “pub_sub3” subnet. It has its own security group that accepts SSH from anywhere, but this could be restricted to a specific range of IPs for better security.
# Bastion Host
resource "aws_security_group" "bastion" {
  name   = "drbd-cluster-bastion-sg"
  vpc_id = aws_vpc.drbd-cluster.id

  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }

  tags = { Name = "drbd-cluster" }
}

resource "aws_instance" "bastion" {
  ami                    = var.bastion_ami
  instance_type          = var.bastion_ec2_type
  key_name               = var.keyname
  subnet_id              = aws_subnet.pub_sub3.id
  vpc_security_group_ids = [aws_security_group.bastion.id]
  tags                   = { Name = "bastion" }
}
Finally, there is a data source that collects the IP addresses of the instances in the DRBD ASG. These are needed to populate the Ansible inventory with the correct addresses for each host to be configured: Terraform replaces the variables in the hosts.tpl file with their actual values and writes the result to hosts.ini for Ansible to use.
# Data sources for outputs and local ansible inventory file
data "aws_instances" "nodes" {
  depends_on = [aws_autoscaling_group.drbd-cluster]
  instance_tags = {
    Name = "drbd-cluster-asg"
  }
}

# local ansible inventory file
resource "local_file" "host-ini" {
  content = templatefile("hosts.tpl", {
    nodes   = data.aws_instances.nodes.private_ips
    bastion = aws_instance.bastion.public_ip
  })
  filename = "hosts.ini"
}
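For reference, a minimal hosts.tpl that consumes those two template variables could look like the sketch below, using Terraform’s template directives. The repository’s actual template does more (for example, applying the drbd_replication_network filter mentioned earlier), so treat this as illustrative only:

[bastion]
${bastion}

[nodes]
%{ for ip in nodes ~}
${ip}
%{ endfor ~}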
Terraform outputs.tf
These are the values that are output after the Terraform configuration is applied. Specifically, we output the public IP of the bastion host, as well as the private IPs of the DRBD nodes. These values can be used in your SSH configurations to set up SSH proxying through the bastion.
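For example, with the bastion’s public IP from the outputs, an entry like the following in your ~/.ssh/config lets you SSH straight to the private node IPs through the bastion. The addresses and key path here are placeholders; ubuntu is the default user on Ubuntu AMIs:

Host drbd-bastion
    HostName 34.220.0.10            # bastion public IP from the Terraform outputs
    User ubuntu
    IdentityFile ~/.ssh/aws-keypair.pem

Host 172.16.*
    User ubuntu
    IdentityFile ~/.ssh/aws-keypair.pem
    ProxyJump drbd-bastion          # tunnel through the bastion to reach the private IPs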
Ansible playbook drbd.yaml
The Ansible playbook, drbd.yaml, is where all the configuration of the infrastructure is defined. Ansible playbooks are more procedural than the declarative Terraform configurations, so tasks are ordered in a specific and meaningful way.
First, we set up the “nodes” host group (the DRBD nodes) with LINBIT’s PPA for Ubuntu and install the DRBD-related packages. The drbd-utils package for Ubuntu Focal will blacklist DRBD from multipathd, but for the blacklist to take effect, multipathd needs to be restarted.
---
- hosts: nodes
  become: yes
  tasks:

    - name: add LINBIT PPA for DRBD packages
      apt_repository:
        repo: ppa:linbit/linbit-drbd9-stack
        state: present

    - name: install packages
      apt:
        name:
          - thin-provisioning-tools
          - drbd-dkms
          - drbd-utils
        update_cache: yes
        state: latest

    - name: restart multipathd after adding drbd blacklist via drbd-utils
      systemd: name=multipathd state=restarted
      when: ansible_distribution_release == "focal"
Next, we write the DRBD configuration using the jinja2 template found in `templates/r0.j2`. This template module will replace all the template variables like hostnames and IP addresses when it places the configuration on the hosts.
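To give you an idea of what the template produces, a simplified r0.j2 could look like the sketch below. The inventory facts used here for hostnames and addresses are illustrative; the template shipped in the repository is what the playbook actually uses (and it selects addresses from the drbd_replication_network subnet):

resource r0 {
  device    /dev/drbd0;
  disk      /dev/drbdpool/r0;
  meta-disk internal;

{% for host in groups['nodes'] %}
  on {{ hostvars[host]['ansible_hostname'] }} {
    node-id {{ loop.index0 }};
    address {{ hostvars[host]['ansible_default_ipv4']['address'] }}:7000;
  }
{% endfor %}

  connection-mesh {
    hosts {% for host in groups['nodes'] %}{{ hostvars[host]['ansible_hostname'] }} {% endfor %};
  }
}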
Also, before we start configuring the DRBD device or the LVM volumes backing it, we check whether a DRBD device already exists. If it does, we can skip the preparatory tasks that follow by using the `when: not drbd0_stat.stat.exists` conditional. This adds a bit of idempotency to the playbook, so if you rerun it after those tasks have already been applied, they will be skipped.
    - name: configure DRBD device
      template: src=r0.j2 dest=/etc/drbd.d/r0.res
      register: drbd0_config

    - name: check for existing DRBD device
      stat:
        path: "/dev/drbd0"
        follow: true
      register: drbd0_stat
Next, the playbook sets up LVM on each node’s EBS volume so DRBD can use a thin LV as its backing device.
    - name: create drbdpool LVM VG
      lvg:
        vg: drbdpool
        pvs: "{{ drbd_backing_disk }}"
      when: not drbd0_stat.stat.exists

    - name: create thinpool LVM thinpool
      lvol:
        vg: drbdpool
        thinpool: thinpool
        size: 99%VG
      when: not drbd0_stat.stat.exists

    - name: create r0 LVM thin LV
      lvol:
        lv: r0
        thinpool: thinpool
        vg: drbdpool
        size: 90g
      when: not drbd0_stat.stat.exists
Finally, the playbook will initialize and start the DRBD device.
    - name: drbdadm create-md
      shell: drbdadm create-md r0 --force >> /root/linbit-drbd-ansible.log
      when: not drbd0_stat.stat.exists

    - name: drbdadm up
      shell: drbdadm up r0 >> /root/linbit-drbd-ansible.log
      when: not drbd0_stat.stat.exists

    - name: drbdadm adjust existing nodes
      shell: drbdadm adjust r0 >> /root/linbit-drbd-ansible.log
      when: drbd0_stat.stat.exists
One thing to point out is the “drbdadm adjust existing nodes” task. The “when” condition on this task is effectively saying, “If DRBD was already running here, only adjust the DRBD device to update any setting changes in the DRBD config”. An “adjust” on a DRBD resource will apply any changes made in the configuration file that aren’t already in the running configuration. This allows you to add or change tunings in the r0.j2 template, and re-run the playbook to apply them across all nodes. Also, if you enable the ASG processes to replace unhealthy instances, the IP addresses and hostnames of instances will change on replacement, and therefore need adjusting on the existing peers.
After bringing DRBD up, we wait for DRBD to become connected, check if we have “Inconsistent” disk states on all nodes (normal for a fresh cluster with no data), and if we do – skip the initial sync.
    - name: wait for DRBD to become fully Connected
      run_once: true
      shell: "drbdadm cstate r0 | grep -v 'Connected'"
      register: connected
      until: connected.rc != 0
      retries: 5
      delay: 10
      failed_when: "connected.rc !=0 and connected.rc !=1"

    - name: check for Inconsistent/Inconsistent[/...] data
      run_once: true
      shell: "drbdadm dstate r0 | grep -xe '\\(Inconsistent[/]*\\)*'"
      failed_when: "dsinconsistent.rc !=0 and dsinconsistent.rc !=1"
      register: dsinconsistent
      ignore_errors: true

    - name: skip DRBD initial sync if all data is inconsistent
      run_once: true
      shell: drbdadm new-current-uuid r0 --clear-bitmap >> /root/linbit-ans-drbd.log
      when: dsinconsistent.rc == 0
Deploying the infrastructure and configurations
Now that you have an idea of what’s going on, you can easily create the infrastructure in AWS using the following command:
$ terraform init
$ terraform plan
$ terraform apply
The init subcommand installs the providers listed in the IaC configuration. The plan subcommand tells you exactly what Terraform will do when the configuration is eventually applied. Finally, the apply subcommand creates your infrastructure.
Once your infrastructure has been deployed, you can configure it with Ansible. However, we need to trust the new infrastructure’s SSH keys first. Here is a quick bash command to assist:
$ for n in $(grep -o 172\.16\.10[0-9]*\.[0-9]* hosts.ini); do ansible -a hostname $n; done
With the new nodes trusted, you can run the playbook:
$ ansible-playbook drbd.yaml
Once complete, you can check the status of your DRBD devices by shelling into the cluster instances through the bastion host or using Ansible.
$ ansible -a "drbdadm status" nodes
At this point, you have an unused DRBD replicated block device in Amazon’s AWS. What you do next is up to you. I often refer to DRBD as the Linux “Swiss Army knife” of replication, so I’m sure you can come up with some relevant use cases.
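As one simple next step, assuming auto-promote is enabled (the DRBD 9 default), you could put a filesystem on the device and mount it on whichever single node needs it:

# on one of the DRBD nodes (opening the device promotes it to Primary automatically)
$ sudo mkfs.ext4 /dev/drbd0
$ sudo mkdir -p /mnt/drbd0
$ sudo mount /dev/drbd0 /mnt/drbd0
$ sudo drbdadm status r0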
Additional Comments and Closing Thoughts
The examples in this blog require you to use a recent Ubuntu AMI; however, LINBIT hosts package repositories for all major distributions, including Ubuntu, for its commercial support customers to use. In this example, the PPA was used, which is where LINBIT often pushes release candidates (RCs) for community testing that would not be suitable for production settings (so be careful picking your package updates from our PPA, please!).
Also, if you’re interested in replicating writes out-of-region, the speed of light will quickly become your enemy, and you’ll have to use asynchronous replication to maintain performance. LINBIT has a closed source solution, DRBD Proxy, which enables long-distance real-time replication between regions without impacting performance.
Contact LINBIT for more information, comments, or questions about any of your DRBD clustering needs. You can also join our Slack community to ask questions or provide feedback there: https://linbit.com/join-the-linbit-open-source-community/