In distributed data storage environments, ensuring data consistency and availability across multiple nodes is crucial. LINSTOR®, the open source software-defined storage (SDS) software created by LINBIT®, is a robust tool for managing data replication and maintaining data integrity both locally for high availability (HA), and across distances for disaster recovery (DR), backup, and other reasons.
This article will give an overview of how you can use LINSTOR to replicate data not only within a cluster in a local data center, but also across distances, for example, a LINSTOR cluster in another data center, or to an AWS S3 or S3-compatible storage bucket.
Terminology
The terms replicating and replication are sometimes used loosely in this article. In the strict sense, data replication is a continuous operation. That is, whenever data is written, it is also redundantly copied somewhere else. This is not the case with snapshot shipping, which is typically either a manual operation or a scheduled operation done at specified intervals, independently of real-time data write operations.
Sometimes in this article, for example, in the introduction, the terms replicating and replication are used to describe a general operation that results in an identical data set in a system other than the original.
When you might need to use snapshot shipping
LINSTOR is an SDS technology with snapshot shipping integrated into its code. LINSTOR also enables real-time replication by using DRBD®. Managing DRBD is arguably LINSTOR’s core function in most users’ environments. However, real-time replication might not always be the data replication method that you want to use. When you replicate data in real time, your network connection needs to have very low latency and high bandwidth. Otherwise, data replication over the network can become a bottleneck and impact your storage or application performance. In scenarios such as the following, you will likely be dealing with data replication that is not done in real time:
- When replicating your data to another data center in disaster recovery scenarios
- When replicating between two LINSTOR clusters in a particular time interval
- When storing your data on S3-compatible storage solutions for backup scenarios
These scenarios typically have low bandwidth and high latency networks, making them a good fit for using LINSTOR’s snapshot shipping feature. The next sections examine each of these scenarios.
Using snapshot shipping for disaster recovery
As part of a typical disaster recovery plan, your data centers need to be located in geographically dispersed areas to reduce their chances of being affected by localized disasters. However, this makes communication between data centers difficult. In such cases, network traffic is usually served with high latency and low bandwidth, due to higher costs for faster networking infrastructure. Over long distances, real-time replication is either not possible or else turns into an expensive solution.
If the resulting recovery point objective (RPO) and recovery time objective (RTO) values meet your expectations, instead of replicating the data in real time, you can ship data between two different LINSTOR clusters at specified time intervals, or to another LINSTOR node that nominally belongs to the same LINSTOR cluster but is physically located in a different data center. This second option, stretching a LINSTOR cluster to include nodes in different data centers, is discussed in an appropriately titled LINBIT blog article, The Stretched Cluster Disaster Recovery Strategy.
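To judge whether interval-based shipping meets an RPO, a rough estimate can help. The following is a minimal sketch, not a LINSTOR feature: it assumes that the worst-case data loss is about one shipping interval plus the time needed to transfer one snapshot delta, and it ignores protocol overhead and compression. The function name and the example numbers are hypothetical.

```shell
#!/bin/sh
# Rough worst-case RPO estimate for interval-based snapshot shipping.
# Assumption: worst case = shipping interval + time to transfer one delta.

# worst_case_rpo_seconds INTERVAL_S DELTA_MB LINK_MBIT_S
worst_case_rpo_seconds() {
    interval_s=$1 delta_mb=$2 link_mbit=$3
    # Transfer time: megabytes * 8 bits, divided by megabits per second, rounded up.
    transfer_s=$(( (delta_mb * 8 + link_mbit - 1) / link_mbit ))
    echo $(( interval_s + transfer_s ))
}

# Hourly shipping, 4500 MB changed per hour, 100 Mbit/s link between sites.
worst_case_rpo_seconds 3600 4500 100   # prints 3960
```

In this example, an hourly schedule gives a worst case of about 66 minutes rather than 60, because the delta itself takes six minutes to cross the link.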
Replicating data between clusters
LINSTOR does real-time data replication between its nodes by using DRBD. However, besides disaster recovery, there might be situations where you want two different LINSTOR clusters to transfer data to each other by means other than DRBD. This might be the case, for example, if you are upgrading system hardware, or moving your operations, and you want to minimize your downtime. You might also want to transfer data by using snapshots and snapshot shipping to keep a point-in-time reference for your data outside of your operating cluster. In such cases, LINSTOR needs to replicate storage resources by means other than in real time. Here, you can use snapshot shipping to transfer data from one LINSTOR cluster to another LINSTOR cluster.
Backup scenarios
Replicating data is not the same as backing up data. Replication is a continuous operation and always on. It moves in step with changes to your data. Because of this, replication is an essential component of HA. Backing up, however, is a discrete operation. Making regular data backups is a way of creating copies of your data that you can restore from, should you need to. As such, backup is a critical component in your DR plan.
Backing up data can also be important for reasons other than DR. For example, you might imagine a situation where some event, perhaps user error, mangles a data set and you need to recover data from a backup taken at a known good point in time.
To take a backup, you need to freeze the data at a specific point in time and transfer that backup to some other remote and secure storage. Commonly, this remote storage will be object storage, due to its API structure and ease of use. LINSTOR can transfer backups of its resources directly into AWS S3 or S3-compatible object storage, and you can configure these backup transfers to happen automatically at specified time intervals (hourly, daily, weekly, monthly, and so on).
Because snapshot shipping can help you in all of these scenarios, LINSTOR developers have integrated snapshot shipping into the code. With LINSTOR, you can operate from a single command line or GUI whether you are recovering from disaster scenarios, configuring cluster replication, or managing your backup needs. This is an improvement on having to rely on different pieced-together solutions.
Using LINSTOR for snapshot shipping
To use LINSTOR to ship storage resource snapshots, you first need to create a “remote” definition in LINSTOR. The remote definition can point to either AWS S3 or S3-compatible storage, or to another LINSTOR cluster.
Creating a remote LINSTOR target for snapshot shipping
To create an S3 target named myRemote:
# linstor remote create s3 myRemote s3.us-west-2.amazonaws.com my-bucket us-west-2 access_key secret_key
Or to create a remote LINSTOR cluster named myRemote:
# linstor remote create linstor myRemote 192.168.0.15
Shipping a LINSTOR resource snapshot
Next, after creating your remote shipping target in LINSTOR, all you have to do is start the shipping process by entering the following command:
# linstor backup create myRemote <linstor-resource-name>
For more details about shipping snapshots in LINSTOR, refer to the LINSTOR User Guide.
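Shipping at regular intervals, as mentioned in the backup scenario, can be set up with a LINSTOR schedule. The following is a minimal sketch based on my understanding of the LINSTOR client's schedule create and backup schedule enable subcommands, so check the exact options against the LINSTOR User Guide for your version. The schedule name, resource name, and cron expressions are placeholders, and the run helper is a hypothetical dry-run wrapper that only prints commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: scheduled snapshot shipping to an existing remote.
# DRY_RUN defaults to 1, so commands are only printed, not executed.
# Note: the dry-run output does not show the quoting around cron expressions.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# schedule_shipping REMOTE SCHEDULE RESOURCE
schedule_shipping() {
    remote=$1 schedule=$2 resource=$3
    # Full backup every Sunday at 02:00, incremental backup every hour.
    run linstor schedule create --incremental-cron "0 * * * *" "$schedule" "0 2 * * 0"
    # Attach the schedule to one resource definition and one remote.
    run linstor backup schedule enable --rd "$resource" "$remote" "$schedule"
}

schedule_shipping myRemote mySchedule myResource
```

Keeping the commands behind a dry-run switch lets you review what would be sent to the LINSTOR controller before changing anything in the cluster.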
The technology behind LINSTOR snapshot shipping
If the LINSTOR storage pool that your resource belongs to has a ZFS back end, the snapshot feature uses the built-in ZFS send and receive utilities. For storage pools that have a thin-provisioned LVM back end, the LINSTOR snapshot feature uses the lesser-known, LINBIT-developed open source thin-send-recv utility.
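To give a feel for what the ZFS send and receive utilities do, here is a conceptual sketch. It is not what LINSTOR literally runs, because LINSTOR orchestrates the transfer itself and its exact invocations differ. The dataset, snapshot, and host names are hypothetical, and the function only prints the pipeline rather than executing it.

```shell
#!/bin/sh
# Conceptual sketch only: prints the kind of ZFS pipeline that snapshot
# shipping builds on. Nothing is executed against a real pool.

# zfs_ship_cmd DATASET NEW_SNAP TARGET_HOST [PREV_SNAP]
zfs_ship_cmd() {
    dataset=$1 new_snap=$2 target=$3 prev_snap=${4:-}
    if [ -n "$prev_snap" ]; then
        # Incremental: send only what changed between the two snapshots.
        send="zfs send -i ${dataset}@${prev_snap} ${dataset}@${new_snap}"
    else
        # Full: send everything the snapshot references.
        send="zfs send ${dataset}@${new_snap}"
    fi
    echo "${send} | ssh ${target} zfs receive ${dataset}"
}

zfs_ship_cmd tank/vol0 snap1 backup-host         # first, full shipment
zfs_ship_cmd tank/vol0 snap2 backup-host snap1   # later, incremental shipment
```

The thin-send-recv utility plays the analogous role for thin-provisioned LVM volumes.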
Shipping snapshots by using LINSTOR in a Kubernetes environment
If you’re using LINSTOR as your storage provider for Kubernetes, my colleague Matt Kereczman’s blog article, Abstracting Persistent Storage Across Environments With LINBIT SDS, explains how snapshots and snapshot shipping work in a Kubernetes environment.
Why snapshot shipping is efficient
Snapshot shipping is a somewhat loosely used term. The first shipment actually transfers the complete storage volume (in all relevant cases, this is a thin-provisioned volume, so only the parts of the volume that hold real data are transferred). For efficiency, after this initial shipment of the complete volume, only the differences between subsequent snapshots are transferred.
Expressed differently, the full snapshot is shipped once, and only the incremental changes are shipped afterwards. It is a good idea to periodically take and ship full snapshots, so you do not have to rely on dozens of incremental snapshots when the time comes to restore a resource. These are commonplace concepts in the data backup and recovery world.
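The effect of periodic full snapshots on restore effort can be shown with simple arithmetic. This sketch assumes a hypothetical policy where every Nth shipment is a full snapshot and all others are incremental. It then counts how many shipped snapshots (one full plus the incrementals that follow it) are needed to restore a given shipment. The function name is made up for illustration.

```shell
#!/bin/sh
# How many shipped snapshots are needed to restore shipment number INDEX,
# if every FULL_EVERY-th shipment (starting with the first) is a full one?

# restore_chain_length INDEX FULL_EVERY
restore_chain_length() {
    echo $(( ( ($1 - 1) % $2 ) + 1 ))
}

# Only one full snapshot ever (FULL_EVERY larger than the index):
restore_chain_length 30 1000   # prints 30
# A full snapshot every 7th shipment (shipments 1, 8, 15, 22, 29):
restore_chain_length 30 7      # prints 2
```

With a single initial full snapshot, restoring the 30th shipment depends on all 30 snapshots. With a full snapshot every seventh shipment, it depends on only two.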
A second important aspect is that a flaky network connection should not influence I/O performance on the active volume. A snapshot delta (a snapshot of changes since the last snapshot) gets applied into a new snapshot on the target, so an interrupted transfer leaves the previous snapshot untouched in place.
Conclusion
Hopefully this article has introduced you to snapshot shipping and why it can be an important part of a total data storage solution. By using LINSTOR to create, ship, and restore data storage snapshots, you get the benefit of using the same software tool that can already handle real-time replication for high availability in your cluster. You do not need to source and integrate a separate technology into your systems for disaster recovery, backup, and other use cases where snapshot shipping can help.
Changelog
2021-12-13:
- Originally written
2024-05-08:
- Made language updates and improvements
2024-11-25:
- Made language and article structure improvements for clarity
- Fixed linstor remote create LINSTOR command