io-error lets file systems and applications notice the failure, so they can stop themselves or be unmounted. The assumption is that some other node (or nodes) still has good data, and that a cluster manager or other monitoring entity will notice the situation and try to bring up services there.
suspend-io hides the problematic incident from file systems and applications: they simply block until access to good data is restored. This can be useful for riding out incidents that are expected to be short, such as a network hiccup, without causing service restarts.
But while IO is "suspended" (blocked, frozen, also known as "uninterruptible sleep" or "D state"), applications cannot be killed and file systems cannot be unmounted.
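To check whether anything on a Linux node is currently stuck in this state, one option is to scan /proc for processes whose state is "D". The following is a minimal sketch, not part of DRBD: the function names parse_stat and blocked_processes are hypothetical, and a process in D state is not necessarily blocked on DRBD, so the result is only a hint.

```python
import glob


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split at the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_processes():
    """List (pid, comm) of processes currently in uninterruptible sleep."""
    blocked = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited, or its stat file was unreadable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling blocked_processes() on the "frozen" node while IO is suspended would be expected to list the applications waiting on the DRBD device, alongside any unrelated processes that happen to be in D state.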
If some other node was told to take over services in the meantime, DRBD on the "frozen" node must be demoted before that node can be re-integrated.
In these scenarios it may be useful to configure DRBD for suspend-io during normal operation, so that it masks intermittent problems. If services have been taken over by some other partition of nodes in the storage cluster, DRBD can then be reconfigured for io-error, which makes it possible to bring down services and unmount file systems before trying to re-integrate this node.
This service reconfigures RESNAME for suspend-io when started and for io-error when stopped.
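The start and stop actions can be pictured as two drbdsetup invocations. The sketch below is an illustration under assumptions, not the service's actual implementation: it assumes the settings involved are the DRBD 9 resource options on-no-quorum and on-no-data-accessible, and the helper names reconfigure_command and reconfigure are hypothetical. The real service may set different or additional options.

```python
import subprocess

# Assumption: these two DRBD 9 resource options are the ones to toggle.
POLICY_OPTIONS = ("on-no-quorum", "on-no-data-accessible")


def reconfigure_command(resname, policy):
    """Build the drbdsetup argv that switches a resource to the given policy."""
    if policy not in ("suspend-io", "io-error"):
        raise ValueError(f"unknown policy: {policy!r}")
    argv = ["drbdsetup", "resource-options", resname]
    argv += [f"--{opt}={policy}" for opt in POLICY_OPTIONS]
    return argv


def reconfigure(resname, policy, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    argv = reconfigure_command(resname, policy)
    print(" ".join(argv))
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```

With a resource named r0 (a placeholder for RESNAME), reconfigure("r0", "suspend-io") corresponds to starting the service and reconfigure("r0", "io-error") to stopping it. The default dry_run=True only prints the command, which makes it safe to try on a node without touching the running configuration.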
You should test a lot and maybe talk to LINBIT support before using this.
See also the DRBD User’s Guide