In a performance test, LINBIT measured 14.8 million IOPS on a 12-node cluster built from standard off-the-shelf Intel servers. This is the highest storage performance reached by a hyper-converged system on the market on this class of hardware. Even a small LINBIT storage system can provide millions of IOPS at latencies of a fraction of a millisecond. For real-world applications, these figures translate into outstanding application performance.
LINBIT chose this setup because our competitors have published test results from equivalent systems, which makes it easy to compare the strengths of each software offering under fair conditions in the same environment. We worked hard to get the most out of the system, and it paid off: Microsoft reached 13.7 million IOPS, and StorPool marginally topped that with 13.8 million IOPS. We reached 14.8 million remote read IOPS, a significant jump of 7.2%. “Those performance numbers mark a milestone in the development of our software. The results prove we speed up High Availability at a large scale”, says CEO Philipp Reisner. The numbers would scale up even further with a larger setup.
These exciting results are for 3-way synchronous replication using DRBD. The test cluster was provided through the Intel® Data Center Builders program. It consists of 12 servers, each running 8 instances of the benchmark, making a total of 96 instances. The setup is hyper-converged, meaning that the same servers are used both to run the benchmark and to provide the underlying storage.
For some benchmarks, one of the storage replicas is a local replica on the same node as the benchmark workload itself. This is a particularly effective configuration for DRBD.
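As a rough sketch of how such a volume could be set up with LINSTOR (the resource name, size, node name, and storage pool below are hypothetical, not the benchmark's actual settings), one defines a resource, places one diskful replica on the node that runs the workload, and lets LINSTOR place the remaining replicas elsewhere:

```
# Hypothetical sketch: create a 3-way replicated DRBD volume with LINSTOR.
# Resource name, size, node name and storage pool are illustrative only.
linstor resource-definition create bench0
linstor volume-definition create bench0 500GiB

# Place one diskful replica on the node that will run the workload
# (this gives the "local replica" case) ...
linstor resource create workload-node bench0 --storage-pool pool_nvme

# ... and let LINSTOR place further replicas until 3 exist in total.
linstor resource create bench0 --auto-place 3
```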
DRBD provides a standard Linux block device, so it can be used directly from the host, from a container, or from a virtual machine. For these benchmarks, the workload runs in a container, demonstrating the suitability of LINBIT’s SDS solution, which consists of DRBD and LINSTOR, for use with Kubernetes.
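Because the replicated volume appears as an ordinary block device, it can be consumed the same way a local disk would be; in Kubernetes, the same devices are provisioned through LINBIT's CSI driver for LINSTOR. The device name below, /dev/drbd1000, is only an example:

```
# Illustrative only: a DRBD device is used like any local block device.
mkfs.xfs /dev/drbd1000                  # format the replicated volume
mount /dev/drbd1000 /mnt/replicated     # use it directly on the host ...
# ... or hand the data to a container, e.g. by bind-mounting the filesystem
docker run --rm -v /mnt/replicated:/data alpine sh -c 'echo ok > /data/test'
```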
IOPS and bandwidth results are the totals from all 96 workload instances. Latency results are averaged.
Let’s look into the details!
This was achieved with a 4K random write benchmark with an IO depth of 64 for each workload. The setup uses Intel® Optane™ DC Persistent Memory to store the DRBD metadata. The writes are 3-way replicated with one local replica and two remote replicas. This means that the backing storage devices are writing at a total rate of 15 million IOPS.
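The exact fio job options are given with the full results further below; purely as an illustration, a 4K random write test at queue depth 64 against a DRBD device looks roughly like this (the device name, runtime, and job layout are placeholders, not LINBIT's actual job file):

```
# Illustrative fio invocation for a 4K random write test at IO depth 64;
# device name and runtime are placeholders.
fio --name=randwrite-qd64 \
    --filename=/dev/drbd1000 \
    --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=64 \
    --runtime=300 --time_based \
    --group_reporting
```

The serial-IO latency test described next differs mainly in setting --iodepth=1, and the read test uses --rw=randread instead.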
This was achieved with a 4K random write benchmark with serial IO, that is, an IO depth of 1. This means that the writes were persisted to all 3 replicas within an average time of only 85μs. DRBD attained this level of performance both when one of the replicas was local and when all were remote. The setup also uses Intel® Optane™ DC Persistent Memory to store metadata.
This was achieved with a 4K random read benchmark with an IO depth of 64. At 14.8 million read IOPS and 4KiB per IO, the cluster reads roughly 60GB/s, which corresponds to 80% of the total theoretical network bandwidth of 75GB/s. This result was reproduced without any usage of persistent memory so that the value can be compared with those from our competitors.
Representing a more typical real-world scenario, this benchmark consists of 70% reads and 30% writes and used an IO depth of 64. One of the 3 replicas was local.
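A mixed workload like this maps directly onto fio's random read/write mode; a sketch, with the same placeholder device as above, would be:

```
# Illustrative 70% read / 30% write mix at IO depth 64 (placeholders as above).
fio --name=randrw-70-30 \
    --filename=/dev/drbd1000 \
    --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=70 \
    --bs=4k --iodepth=64 \
    --runtime=300 --time_based \
    --group_reporting
```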
DRBD is optimized for persistent memory. When the DRBD metadata is stored on an NVDIMM, write performance is improved.
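At the DRBD level, the location of the metadata is controlled by the meta-disk setting. A simplified, hypothetical resource snippet is shown below; in a LINSTOR deployment these files are generated automatically, and the device paths and addresses here are made up:

```
# Hypothetical, hand-written DRBD resource snippet; with LINSTOR such
# configuration is generated automatically and the paths are examples only.
resource bench0 {
  on node-a {
    device     /dev/drbd1000;
    disk       /dev/nvme0n1;
    meta-disk  internal;       # metadata on the backing SSD, with the data
    # meta-disk /dev/pmem0;    # alternative: metadata on a persistent-memory device
    address    192.168.1.11:7789;
  }
  # ... corresponding "on" sections for the other two nodes ...
}
```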
When the metadata is stored on the backing storage SSD with the data, DRBD can process 4.5 million write IOPS. This increases to 5.0 million when the metadata is stored on Intel® Optane™ DC Persistent Memory instead, an improvement of 10%.
Moving the metadata onto persistent memory has a particularly pronounced effect on the write latency. This metric plummets from 113μs to 85μs with this configuration change; that is, the average write latency is 25% lower.
Below are the full results for DRBD running on the 12 servers with a total of 96 benchmark workloads.
The IOPS and MB/s values have been rounded down to 3 significant figures.
All volumes are 500GiB in size, giving a total active set of 48,000GiB and consuming a total of 144,000GiB of the underlying storage. The workloads are generated using the fio tool with the following parameters:
In order to ensure that the results are reliable, the following controls were applied:
The following key software components were used for these benchmarks:
In this text, and at LINBIT in general, we use the expression 2 replicas to mean that the data is stored on 2 storage devices. For these tests, there are 3 replicas, meaning that the data is stored on 3 storage devices.
In other contexts, the expression 2 replicas might mean one original plus 2 replicas, in which case the data would be stored on 3 storage devices.
These results were obtained on a cluster of 12 servers made available as part of the Intel® Data Center Builders program. Each server was equipped with the following configuration:
The servers were all connected in a simple star topology with a 25Gb switch.
Storage has often been a bottleneck in modern IT environments. The two requirements, speed and high availability, have always been in competition: if you aim for maximum speed, the quality of the high availability tends to suffer, and vice versa. But with this performance test we demonstrate the best-of-breed open source software-defined storage solution. A replicated storage system that combines high availability with the performance of local NVMe drives is now possible.
This technology enables any public or private cloud builder to deliver high performance for their applications, VMs, and containers. If you aim to build a powerful private or public cloud, our solution meets your storage performance needs.
If you want to learn more or have any questions, contact us at email@example.com.
Our LINBIT Community Meeting has started: join us now via YouTube Live or Zoom:
LINBIT Community Meeting