One of the reasons that LINBIT® uses an open source licensing model¹ for nearly all of the software that LINBIT developers create is that the open source model encourages collaboration and engagement with users in a way that closed source software does not. With open source software, the code is freely available for anyone to access. If you want to understand how the software works, there are no barriers besides the limits of your own knowledge and experience with code. If you use the software and want to customize it for your own use case or improve it, you can do that too, again without barriers. This was the case earlier in 2023 with a community user of the LINBIT-developed open source software-defined storage (SDS) solution for Kubernetes. This blog post highlights that user, Yellowbrick Data, and how its team worked to improve LINBIT’s software within an open source framework.
Two Yellowbrick team members, Miki Grof-Tisza and Mike Panchenko, graciously took time out of their schedules to answer some questions about how Yellowbrick uses LINBIT open source technologies and how Yellowbrick patches to the open source code found their way back to the community. Their detailed and thoughtful answers made this blog article possible.
The Use Case for LINBIT Open Source Software for Kubernetes
Can you describe the Yellowbrick use case for LINBIT open source technologies?
“Here at Yellowbrick we are building a multi-cloud massively parallel processing (MPP) database running on top of Kubernetes. To support the process of building and testing our product, we needed a private cloud environment running on top of bare metal. We took 300+ custom-designed Yellowbrick servers, put them in racks, and installed Kubernetes to manage resources and workloads across this fleet of servers.
“One of the integral parts of almost any Kubernetes environment is Persistent Volumes (PVs) and this is where we chose Piraeus Operator², LINSTOR®, and DRBD®. We use PVs to store database catalog data, logs, metrics, various types of repositories, and other types of data that should exist beyond the Pod life cycle. LINSTOR currently manages multiple servers with NVMe disk drives and approximately 500 TB of raw capacity.
“Today we strictly divide the roles of servers in our cloud into storage nodes and worker nodes. Storage nodes run only the Piraeus software set and store copies of data volumes on local disk drives. Worker nodes run random workloads and use DRBD’s Diskless mode to remotely mount PVs from storage nodes. This configuration gives us quite good flexibility in distributing the workload across the entire fleet of nodes and without tying Pods only to the nodes storing a copy of the volume.”
Why Yellowbrick Chose LINBIT Open Source Software
Can you describe why Yellowbrick chose to use LINBIT software over other potential software solutions?
“We chose Piraeus Operator + LINSTOR + DRBD for several reasons, some of them:
- We have been using DRBD in our products for quite some time and are generally satisfied with the operating experience.
- We like that it is an open source product, which gives us the ability to be more agile in achieving our goals.
- This solution is more cost-effective.”
The RDMA Transport For DRBD
Since DRBD version 9.2.0, the drbd_transport_rdma kernel module is available as open source code. This transport uses remote direct memory access (RDMA) to move data over RDMA-capable hardware such as InfiniBand HCAs, iWARP-capable NICs, or RoCE-capable NICs. In contrast to the TCP/IP transport, RDMA lets DRBD replicate data with very little CPU involvement.
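As a rough illustration of how a resource might be switched to this transport, the fragment below is a minimal sketch of a DRBD resource configuration. It assumes the transport option in the net section as described in the DRBD 9 user's guide. The resource name r0 and the file name are placeholders, and the rest of the resource definition is omitted.

```
# Hypothetical resource file, for example /etc/drbd.d/r0.res (placeholder name)
resource r0 {
  net {
    transport "rdma";   # the default transport is "tcp"
  }
  # volume, host, and connection sections omitted
}
```

In a LINSTOR-managed cluster, LINSTOR generates the DRBD resource files, so a setting like this would typically be applied through LINSTOR rather than by editing the files by hand.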
The Use Case for the DRBD RDMA Transport at Yellowbrick
How is the RDMA transport protocol important to the Yellowbrick use case for LINBIT software?
“Its primary contribution lies in offering improved latency compared to the TCP transport, consequently enhancing overall bandwidth. By leveraging RDMA, we achieve lower latencies in data transfers, leading to a boost in the efficiency of our network.”
Issues With the RDMA Transport Protocol
There were some issues that Yellowbrick encountered when using the RDMA transport protocol for DRBD. Can you describe what those issues were?
“In an endeavor to enable the RDMA transport protocol for our setup, I [Miki Grof-Tisza] encountered several critical issues affecting stability and functionality. These challenges primarily arose from changes in the NVIDIA MLX4 Ethernet driver’s callback behavior, which shifted to a hard interrupt context.”
Resolving Issues Within an Open Source Framework
How did you resolve those issues?
“To address these challenges systematically, I adopted a methodical approach. I created simple reproductions for each issue encountered and delved into the [Linux] kernel driver to identify the root cause.
“One significant issue stemmed from the instability caused by invoking the drbd_control_data_ready() function from a hard interrupt context, triggered by changes in the MLX4 driver. Previous versions of the driver employed softirq context for callbacks, but the shift to hardirq context disrupted this aspect of the RDMA transport driver. To resolve this, we transitioned the function call to softirq context using a tasklet, ensuring compatibility across different drivers and preventing system instability. Additionally, the context changes introduced race conditions for locking mechanisms and reference counting. These were addressed by implementing an additional spinlock to safeguard against these race conditions.
“Another critical issue emerged due to the removal of the ack-receiver thread in DRBD 9.2, resulting in flow control deadlocks. Resolving this deadlock involved sending control messages from the softirq context before invoking the drbd_control_data_ready() function, ensuring uninterrupted communication with peers.”
The Benefits of Open Source Software Participation
Can you describe what benefits companies and organizations might get from being involved in open source projects?
“Participation in open source offers companies advantages, including accelerated innovation through collaborative development, reduced costs via shared solutions, enhanced software quality and security through global scrutiny, increased customization flexibility, and improved interoperability. This involvement allows access to diverse talent pools, rapid iteration cycles, and the ability to tailor solutions to specific needs, ultimately fostering a robust and dynamic technological ecosystem.”
Why does Yellowbrick specifically choose to use open source technologies?
“Yellowbrick prioritizes the use of open source technologies primarily for their ability to expedite our time to market while simultaneously controlling costs. Leveraging open source solutions allows us to accelerate our development cycles by tapping into existing, well-supported frameworks and libraries. This access to established tools and resources enables us to build upon reliable foundations, significantly reducing the time required for initial development. Moreover, the cost-efficiency inherent in open source technologies aligns with our goal of optimizing resources without compromising the quality or scalability of our products and services. By harnessing the power of open source, we strike a balance between rapid innovation and prudent cost management, ultimately driving our competitive edge in the market.”
With diligence, ingenuity, and care, Miki Grof-Tisza of Yellowbrick made five commits to the DRBD project’s GitHub-hosted codebase. The LINBIT development team also made one commit in response to an issue that Miki Grof-Tisza reported. These commits were made in the spring of 2023 and directly improved the DRBD RDMA transport protocol for all DRBD users.
To learn more about the specific commits, you can clone the DRBD project locally (the size of the project is around 23M) and run a couple of git log commands within your local clone of the project:
$ git log --author="Miki Grof-Tisza"
$ git log --grep="Reported-by: Miki Grof-Tisza"
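If you prefer a compact, one-line-per-commit overview, the two queries can be wrapped in a small shell helper. This is only a sketch: the function name contrib_summary and its arguments are made up for this example, and it assumes Git 1.8.5 or later for the -C option.

```shell
#!/bin/sh
# Hypothetical helper (not part of DRBD or Git): summarize one
# contributor's involvement in a local Git clone.
# Usage: contrib_summary <path-to-clone> <contributor-name>
contrib_summary() {
    repo=$1
    name=$2
    # Commits authored by the contributor, one line each.
    git -C "$repo" log --oneline --author="$name"
    # Commits whose message credits the contributor in a Reported-by trailer.
    git -C "$repo" log --oneline --grep="Reported-by: $name"
}
```

For example, if your clone is in a directory named drbd (an assumed path), you could run contrib_summary ./drbd "Miki Grof-Tisza" to list both sets of commits together.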
The LINBIT team is grateful for Yellowbrick’s code and issue reporting contributions to the DRBD project. The LINBIT team welcomes other contributions from the wide base of users of LINBIT open source software and its documentation. Whether a contribution comes in the form of an issue report, a feature request, or a code commit request, each contribution helps to improve the usefulness and robustness of LINBIT SDS solutions for all users. It is through working in collaboration that the open source community is changing the world of technology.
1: LINBIT typically releases its software under either an Apache-2.0 or GPL-2.0 license.
2: Piraeus is the name of the LINBIT-developed upstream open source datastore project which can provide persistent software-defined storage for Kubernetes deployments.