When it’s time for the satellite to leave the cluster, the controller takes action!
LINBIT® introduced LINSTOR®’s auto-evict feature starting with LINSTOR version 1.10. In simple terms, the feature evicts a satellite node from the cluster when it stops responding. But the possibilities that the feature opens up for you, including creating a self-healing cluster, are more interesting than you might think.
As you may know, LINSTOR uses DRBD® to create replicas of your data. DRBD does not distribute data across nodes; it creates identical, real-time, one-to-one replicas on each of your cluster nodes. This is perfect for high-availability use cases.
Before Auto-evict
You never want a node failure in a cluster, but you should always expect and prepare for one, because a node can fail and stay offline for many reasons.
Before the auto-evict feature existed, if a node failed, the controller would wait until the connection to the node was restored and would prevent you from modifying or deleting resources on that node. LINSTOR also did not automatically create additional copies of your resources to maintain the desired number of replicas within your cluster when a node failed or went offline.
After Auto-evict
Now, with the auto-evict feature configured, you can set a timer after which a node that has stopped communicating with the cluster is evicted. When the timer expires, LINSTOR marks that node as “evicted” and then automatically reassigns the affected DRBD resources to other nodes to maintain the minimum replica count that you configured for your resources. By combining auto-evict with a minimum replica count for your resources, you can make your cluster self-healing after a node failure.
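As a minimal sketch, assuming a working LINSTOR client and a satellite node named node-c (a placeholder name, as is the resource group name my_rg), the configuration could look something like the following. The property names shown here reflect the auto-evict settings described in the LINSTOR User’s Guide; verify them against the guide for your LINSTOR version.

# Evict a satellite after it has been disconnected for 60 minutes
linstor controller set-property DrbdOptions/AutoEvictAfterTime 60

# Only re-place resources that would otherwise drop below two replicas
linstor controller set-property DrbdOptions/AutoEvictMinReplicaCount 2

# Exclude a particular node from ever being auto-evicted
linstor node set-property node-c DrbdOptions/AutoEvictAllowEviction false

# Auto-place new resources with three replicas through a resource group
linstor resource-group create my_rg --place-count 3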
After LINSTOR evicts a node from the cluster, you have several options. Two of these used to be panic and scramble. But not anymore.
Evicting a node allows you to modify or delete its resources freely. LINSTOR otherwise prevents you from modifying a resource on a node that is not connected to the LINSTOR controller. Because an evicted node is treated as removed from the cluster, that constraint no longer applies.
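For instance, a hedged sketch of cleaning up after an eviction, assuming the evicted node is named node-c and it held a replica of a resource called my_res (both placeholder names):

# See which resources still reference the evicted node
linstor resource list

# Remove the resource entry that belonged to the evicted node
linstor resource delete node-c my_res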
You can fix the issue with the node and make it available within your cluster again. Just be aware that the cluster will not accept the returning node automatically. Unless you use a cluster manager such as DRBD Reactor or Pacemaker, you will need to use the node restore command.
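A minimal sketch of restoring a repaired node, again assuming the placeholder node name node-c:

# Bring the repaired satellite back into the cluster
linstor node restore node-c

# Verify that the node is no longer flagged as evicted
linstor node list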
Alternatively, if you used LINSTOR resource groups to configure auto-placement of your resources and you want to give the returning node a fresh start, you can use the node lost command to delete all LINSTOR resources and configurations on that node.
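A hedged sketch of that path, using the same placeholder node name and an example IP address:

# Permanently remove the evicted node and all LINSTOR objects on it
linstor node lost node-c

# Later, re-create the node to give it a fresh start
linstor node create node-c 192.168.0.12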
Conclusion
Auto-evict is a “can’t live without it” feature that gives you better control of your LINSTOR cluster and allows you to manage node failures more efficiently than you could before.
For more configuration details and technical information about auto-evict and the node restore and node lost commands, check out the auto-evict section in the LINSTOR User’s Guide.