In our first look at LINSTOR, you learned about its single communication protocol, transaction safety and modularity. In this next chapter, you can dive deeper into its construction.
Keeping the software responsive is one of the more difficult problems that we have to deal with in LINSTOR’s design and implementation. The Controller/Satellite split is one fundamental part of LINSTOR’s design toward fault tolerance, but there are many other design and implementation details that improve the software’s robustness, and many of them are virtually invisible to the user.
On the Controller side, communication and persistence are the two main areas that can lead to the software becoming unresponsive. The following problems could lead to an unusable network communication service on the Controller side:
- Stopping or reconfiguring a network interface
- Address conflicts
- In-use TCP/IP ports
All network I/O in LINSTOR is non-blocking, so that unresponsive network peers do not lead to a lockup of LINSTOR’s network communication service. While the network communication service has been designed to recover from many kinds of problems, it additionally allows the use of multiple independent network connectors, so that the system remains accessible even in the case where a network connector requires reconfiguration to recover. The network connectors can also stop and start independently, allowing reinitialization of failed connectors.
The Controller obviously cannot continue normal operation while the database service is unavailable. If an external database is used, this can happen, for example, because the database server is down or because of a network problem. Once the database service becomes available again, the Controller will recover automatically, without requiring any operator intervention.
Satellites in LINSTOR
The Satellite side of LINSTOR does not run a database, and a single unresponsive Satellite is less critical for the system as a whole than an unresponsive Controller. Nonetheless, if a Satellite encounters a failure during the configuration of one storage resource, that failure should not prevent it, even temporarily, from servicing requests for the configuration of other resources.
The biggest challenge regarding fault tolerance on the Satellite side is the fact that the Satellite interacts with lots of external programs and processes that are neither part of LINSTOR nor under the direct control of the Satellite process. These external components include system utilities required for the configuration of backend storage, such as LVM or ZFS commands, processes observing events generated by the DRBD kernel module whenever the state of a resource changes, block device files that appear or disappear when storage devices are reconfigured, and similar kinds of objects.
To achieve fault tolerance on the Satellite side, the software has been designed to deal with many possible kinds of malfunctions of the external environment that LINSTOR interacts with. This includes time-boxing external processes and enforcing size limits on the amount of data read back from them, as well as recovery procedures that attempt to abort external processes that have become unresponsive. There is even a fallback that reports a malfunctioning operating system kernel if the operating system is unable to end an unresponsive process. The LINSTOR code also contains a mechanism that can run critical operations, such as the attempt to open a device file (which may block forever due to faulty operating system drivers), asynchronously, so that even if the operation blocks, LINSTOR can normally at least detect and report the problem.
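The time-boxing and size-limit ideas described above can be sketched roughly as follows. This is not LINSTOR's actual code: the class name TimeBoxedCommand, the run method, its parameters and the five-second grace period are all assumptions made for illustration. The sketch runs an external command, keeps at most a fixed number of output bytes, forcibly ends the process when it exceeds its time limit, and reports a separate failure if the operating system cannot end it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of LINSTOR: runs an external command with a
// time limit and a cap on the amount of output that is kept in memory.
final class TimeBoxedCommand {
    // Assumed grace period for the operating system to end a killed process.
    private static final long KILL_WAIT_MILLIS = 5_000;

    private TimeBoxedCommand() { }

    static String run(long timeoutMillis, int maxBytes, String... command)
            throws IOException, InterruptedException, TimeoutException {
        Process proc = new ProcessBuilder(command).redirectErrorStream(true).start();
        ByteArrayOutputStream captured = new ByteArrayOutputStream();

        // Read the output on a separate thread so that a silent, hanging
        // process cannot block the caller past the time limit.
        Thread reader = new Thread(() -> {
            byte[] buf = new byte[4096];
            try (InputStream in = proc.getInputStream()) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    synchronized (captured) {
                        // Keep at most maxBytes; continue draining the pipe so
                        // the child does not stall on a full output buffer.
                        int room = maxBytes - captured.size();
                        if (room > 0) {
                            captured.write(buf, 0, Math.min(n, room));
                        }
                    }
                }
            } catch (IOException ignored) {
                // The stream closes when the process ends; nothing more to read.
            }
        });
        reader.setDaemon(true);
        reader.start();

        if (!proc.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // Time limit exceeded: try to abort the unresponsive process.
            proc.destroyForcibly();
            if (!proc.waitFor(KILL_WAIT_MILLIS, TimeUnit.MILLISECONDS)) {
                // Fallback: even a forced kill did not end the process.
                throw new IllegalStateException("Operating system failed to end the process");
            }
            throw new TimeoutException("Command exceeded " + timeoutMillis + " ms");
        }
        reader.join(KILL_WAIT_MILLIS);
        synchronized (captured) {
            return new String(captured.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

A caller would wrap each invocation of a storage utility in such a helper and treat a TimeoutException as a failure of that one operation, leaving the rest of the service free to handle other requests.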
Feature richness, customizability and flexibility also bring complexity. The only way to keep the system as easy to understand and use as possible is to make it intuitive, self-explanatory and unambiguous.
Clarity in the naming scheme of objects turned out to be an important factor for a user’s ability to use the software intuitively. In our previous product, drbdmanage, users would typically look for commands to either create a “resource” or a “volume.” However, the corresponding commands, “new-resource” and “new-volume”, only define a resource and its volumes, but do not actually create storage resources on any of the cluster nodes. Another command, “assign”, was required to assign the resource to cluster nodes, thereby creating the actual storage resource, and users sometimes had a hard time finding this command.
For this reason, the naming of objects was changed in LINSTOR. A user looking for a command to create a resource will find the command that actually creates a storage resource, and one of the required parameters for this command is the so-called resource definition. It is quite obvious that the next step would be to look for a command that creates a resource definition. This kind of naming convention is supposed to make it easier for users to figure out how to intuitively use the application.
LINSTOR is also explicit with replies to user commands, as well as with return codes for API calls. The software typically replies with a message that describes whether or not the command was successful, what the software did, and to which objects the message refers. Error messages that include a description of the problem cause or hints for possible correction measures also follow a uniform structure.
A similar idea also applies to return codes, which include not only the error code (e.g., Object exists), but also information on which objects the error refers to (e.g., the type of object and the identifier specified by the user).
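As a rough illustration of a return code that carries its context, consider the following sketch. The class ApiReply, its enums and the describe method are hypothetical names chosen for this example and do not reflect LINSTOR's real API; the point is only that the code, the object type and the user-supplied identifier travel together.

```java
import java.util.Objects;

// Hypothetical reply type, not LINSTOR's real API: a return code bundled
// with the kind of object and the identifier it refers to.
final class ApiReply {
    enum Code { CREATED, OBJECT_EXISTS, OBJECT_NOT_FOUND }
    enum ObjectType { NODE, RESOURCE_DEFINITION, RESOURCE, STORAGE_POOL }

    final Code code;
    final ObjectType objectType;
    final String objectId;

    ApiReply(Code code, ObjectType objectType, String objectId) {
        this.code = Objects.requireNonNull(code);
        this.objectType = Objects.requireNonNull(objectType);
        this.objectId = Objects.requireNonNull(objectId);
    }

    // In this sketch, every code other than CREATED counts as an error.
    boolean isError() {
        return code != Code.CREATED;
    }

    // Uniform one-line description: code, object type, then the identifier.
    String describe() {
        return code + ": " + objectType + " '" + objectId + "'";
    }
}
```

With this shape, a client that tries to define a resource definition named backups twice could receive a reply whose describe() text is OBJECT_EXISTS: RESOURCE_DEFINITION 'backups', so neither the user nor a script has to guess which object the error is about.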
To make diagnosing errors easier, LINSTOR also generates a unique identifier for every error that is logged. The traditional logging and error reporting on Unix/Linux systems basically consists of single text lines logged to one large logfile, sometimes even a single logfile for many different applications. An application could log multiple lines for each error, but support for logging multiple lines atomically (instead of interleaved with log lines for other errors, possibly from other applications) is virtually nonexistent.
For this reason, LINSTOR logs a single-line short description of the error, including the error identifier, to the system log, but also logs the details of the error to a report file that can be found using the error identifier. The detailed log report also contains information such as the component where the error occurred, the exact version of the software that was used, debug information, nested errors, and many other details that may help with problem mitigation.
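The split between a one-line log entry and a detailed report file can be sketched as follows. Everything here is an assumption made for illustration: the class ErrorReporter, the use of a random UUID as the identifier, the ErrorReport-<id>.log file name and the use of standard error as a stand-in for the system log are not taken from LINSTOR itself.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

// Hypothetical reporter, not LINSTOR's implementation: one short log line
// per error plus a detailed report file that is found via the error ID.
final class ErrorReporter {
    private final Path reportDir;
    private final String component;
    private final String version;

    ErrorReporter(Path reportDir, String component, String version) {
        this.reportDir = reportDir;
        this.component = component;
        this.version = version;
    }

    // Writes the detailed report and returns the identifier that links the
    // short log line to the report file.
    String report(Throwable error) throws IOException {
        // Assumption: a random UUID stands in for the real identifier format.
        String id = UUID.randomUUID().toString();

        StringWriter details = new StringWriter();
        try (PrintWriter out = new PrintWriter(details)) {
            out.println("Error ID:  " + id);
            out.println("Component: " + component);
            out.println("Version:   " + version);
            out.println();
            // The stack trace includes nested errors as "Caused by:" sections.
            error.printStackTrace(out);
        }

        // All details go into one file, so they cannot interleave with the
        // log lines of other errors or other applications.
        Path file = reportDir.resolve("ErrorReport-" + id + ".log");
        Files.write(file, details.toString().getBytes(StandardCharsets.UTF_8));

        // Short summary; standard error stands in for the system log here.
        System.err.println("ERROR [" + id + "] " + error);
        return id;
    }
}
```

An operator who sees the short entry in the system log can then open the matching report file by its identifier instead of reassembling a multi-line message from a shared logfile.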
While the various design characteristics are important factors for creating a powerful and robust software system, even the best design cannot produce a reliable application if it is not implemented with high quality.
The first step, even before we wrote the code, was to choose a programming language that would be suitable for the task. While our previous product, drbdmanage, and the current LINSTOR client are implemented in Python, the LINSTOR server-side components (the Controller and Satellite) are implemented in Java. A server application that manages highly available storage systems should obviously be designed and implemented much more carefully than the typical single-user desktop application. Java is a very strict programming language that provides strong static typing and checked exceptions and allows only a few implicit type conversions, all of which also enable IDEs to perform static checking of the code while it is being written.
Obviously, while it can make writing high quality code easier, the choice of programming language alone does not automatically lead to better code. To keep LINSTOR’s code clean, readable, self-explaining and maintainable, we apply many of the best practices that have proven successful in the creation of mission-critical software systems. This includes more important things like choosing descriptive variable names or maintaining a clear and logical control flow, but even extends to less technical details like consistent formatting of the source code. The coding standard that we apply to produce high-quality code is based on standards from the aviation industry and is among the strictest coding standards that exist today.
Easy Validity Checks
There is also a strong focus on correctness and strict checking in the way LINSTOR is implemented. As an example, the name of an object such as a node, resource or storage pool is not simply a String, but an object that can only be constructed with a name that is valid for that kind of object. It is impossible to create a resource name object that contains invalid characters, or to accidentally use a resource name object as the identifier for the creation of a storage pool. As a result, developers cannot forget to perform a validity check on a node name or on a volume number, and they also cannot apply the wrong check by accident.
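A minimal sketch of this pattern might look like the following. The class names, the StoragePools.create method and especially the validation rules (allowed characters and lengths) are assumptions for illustration only and are not LINSTOR's actual naming rules.

```java
import java.util.regex.Pattern;

// Hypothetical name types: the validity check lives in the constructor, so
// an instance can only exist if its name is valid for that kind of object.
final class ResourceName {
    // Assumed rule: starts with a letter, 2 to 48 characters in total.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9_-]{1,47}");

    final String value;

    ResourceName(String value) {
        if (value == null || !VALID.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid resource name: " + value);
        }
        this.value = value;
    }
}

final class StoragePoolName {
    // Assumed rule, deliberately different from the resource name rule.
    private static final Pattern VALID = Pattern.compile("[A-Za-z][A-Za-z0-9]{2,31}");

    final String value;

    StoragePoolName(String value) {
        if (value == null || !VALID.matcher(value).matches()) {
            throw new IllegalArgumentException("Invalid storage pool name: " + value);
        }
        this.value = value;
    }
}

final class StoragePools {
    private StoragePools() { }

    // Accepts only a StoragePoolName; passing a ResourceName or a raw String
    // is rejected by the compiler, so the wrong check cannot be applied.
    static String create(StoragePoolName name) {
        return "storage pool " + name.value;
    }
}
```

Because every name has already passed its own check when it reaches StoragePools.create, the method body needs no validation code at all, and a call such as StoragePools.create(new ResourceName("backups")) does not compile.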
All those considerations, design characteristics and implementation methods are important factors that helped us create dependable and user-friendly software that we hope will prove useful and valuable to users like you.
If you have any questions or suggestions concerning LINSTOR, please leave a comment or send an email to firstname.lastname@example.org.