
DRBD and the sync rate controller, part 2

This post is an update to the earlier blog post on the sync rate controller.

As a reminder: this is about resynchronization (i.e. recovery after a node or network problem), not about replication.

A demanding application can completely fill your I/O bandwidth (disk, network, or both), leaving no room for the synchronization to complete. To make the synchronization slow down and let the application proceed, DRBD has the dynamically adaptive resync rate controller.

It is enabled by default with 8.4, and disabled by default with 8.3.
To enable or disable it explicitly, set c-plan-ahead to 20 (enable) or 0 (disable).

Note that, while the controller is enabled, the setting for the old fixed sync rate is used only as the initial guess for the controller. After that, only the c-* settings are used, so changing the fixed sync rate while the controller is enabled won’t have much effect.

What it does

The resync controller tries to use up as much network and disk bandwidth as it can get, but no more than c-max-rate, and throttles if either

  • more resync requests are in flight than what amounts to c-fill-target (or, if c-fill-target is set to 0, the current estimated response delay from the peer is more than c-delay-target), or
  • it detects application IO (read or write), and the current estimated resync rate is above c-min-rate (unless c-min-rate is 0).
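
The two throttle conditions above can be sketched as a small decision function. This is only an illustrative model of the rules as described in this post, not DRBD's actual kernel code: the class, the function and all argument names are made up for the example, and the c-max-rate cap is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ControllerSettings:
    """Hypothetical container for the c-* settings discussed above."""
    c_min_rate: int      # 0 disables the "application IO detected" throttle
    c_fill_target: int   # 0 means: use c_delay_target instead
    c_delay_target: int  # only consulted when c_fill_target is 0


def should_throttle(s, in_flight, est_delay, app_io_detected, est_resync_rate):
    """Return True if the resync should be slowed down (illustrative only)."""
    # Condition 1: too much resync data in flight, or, if c-fill-target
    # is 0, the estimated response delay from the peer is too high.
    if s.c_fill_target > 0:
        pipe_full = in_flight > s.c_fill_target
    else:
        pipe_full = est_delay > s.c_delay_target

    # Condition 2: application IO was detected and the resync currently
    # runs faster than c-min-rate (skipped entirely if c-min-rate is 0).
    app_pressure = (
        s.c_min_rate > 0
        and app_io_detected
        and est_resync_rate > s.c_min_rate
    )
    return pipe_full or app_pressure
```

For example, with c_min_rate=250 and c_fill_target=100, a resync running at an estimated rate of 1000 while application IO is detected would be throttled, while one running at 100 would not.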

The default c-min-rate with 8.4.x is 250 KiB/sec (the old default of the fixed sync rate); with 8.3.x it was 4 MiB/sec.

This “throttle if application IO is detected” is active even if the fixed sync rate is used. You can (but should not, see below) disable this specific throttling by setting c-min-rate to 0.

Tuning the resync controller

It’s hard, or next to impossible, for DRBD to detect how much activity your backend can handle. But it is very easy for DRBD to know how much resync activity it causes itself.
So you tune how much resync activity you allow during periods of application activity.

To do that you should

  • set c-plan-ahead to 20 (default with 8.4), or more if there’s a lot of latency on the connection (WAN link with protocol A);
  • leave the fixed resync rate (the initial guess for the controller) at about 30% or less of what your hardware can handle;
  • set c-max-rate to 100% (or slightly more) of what your hardware can handle;
  • set c-fill-target to the minimum (just as high as necessary) that gets your hardware saturated, if the system is otherwise idle.
    In other words, figure out the maximum possible resync rate in your setup while the system is idle, then set c-fill-target to the minimum setting that still reaches that rate.
  • And finally, while checking application request latency/responsiveness, tune c-min-rate to the maximum that still allows for acceptable responsiveness.
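
As a rough sketch, the steps above could be turned into a small helper that derives the settings from your own measurements. Everything here is an assumption for illustration: the function names, the 5% tolerance, the sample numbers, and the choice to write plain numbers in what are assumed to be the default units (KiB/s for the rates, sectors for c-fill-target). The rendered text assumes the 8.4 style, where these options live in the disk section; check the drbd.conf man page of your version before using any of it.

```python
def min_fill_target(measured, tolerance=0.05):
    """Pick the smallest c-fill-target that still reaches the best rate.

    `measured` maps a tried c-fill-target value to the resync rate that
    was observed with it on an otherwise idle system (your own numbers).
    """
    best = max(measured.values())
    good_enough = [ft for ft, rate in measured.items()
                   if rate >= best * (1 - tolerance)]
    return min(good_enough)


def suggest_settings(hw_rate, fill_target, min_rate, plan_ahead=20):
    """Apply the rules of thumb from the list above.

    hw_rate:     what the hardware can handle (measured by you)
    fill_target: e.g. the result of min_fill_target()
    min_rate:    found by watching application latency (no formula here)
    """
    return {
        "resync-rate": hw_rate * 30 // 100,  # initial guess: about 30%
        "c-plan-ahead": plan_ahead,          # more on high-latency links
        "c-max-rate": hw_rate,               # 100% of the hardware
        "c-fill-target": fill_target,
        "c-min-rate": min_rate,
    }


def render_disk_section(settings):
    """Render the settings as a drbd.conf-style disk section (8.4 style)."""
    lines = ["disk {"]
    lines += [f"    {name} {value};" for name, value in settings.items()]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample measurements: c-fill-target tried -> rate observed.
    measured = {50: 40000, 100: 80000, 200: 101000, 400: 102400}
    settings = suggest_settings(102400, min_fill_target(measured), 250)
    print(render_disk_section(settings))
```

Note that c-min-rate is deliberately an input here: the post gives no formula for it, only the advice to raise it as far as application responsiveness allows.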

Most parts of this post were originally published as a mailing list post by Lars.
