Benchmarking DRBD

We often see people on #drbd or on drbd-user trying to measure the performance of their setup. Here are a few best practices to do this. 

First, a few facts.

  • The synchronization rate shown in /proc/drbd has nothing to do with the replication rate. These are different things; don’t mistake the speed: value there for a performance indicator.
  • Use an appropriate tool. dd with default settings and cp don’t write to the device, but only into the Linux buffers at first – so timing these won’t tell you anything about your storage performance.
  • The hardest discipline is single-threaded, io-depth 1. Here every access has to wait for the preceding one to finish, so each bit of latency will bite you hard.
    Getting some bandwidth with four thousand concurrent writes is easy!
  • Network benchmarking isn’t that easy, either. iperf will typically send only NULs; checksum offloading might hide or create problems; switches, firewalls, etc. will all introduce noise.

What you want to do is this:

  1. Start at the bottom of the stack. Measure (and tune) the LV that DRBD will sit upon, then the network, then DRBD.
  2. Our suggestion is still to use a direct connection, i.e. a crossover cable.
  3. If you don’t have any data on the device, test against the block device. A filesystem on top will create additional meta-data load and barriers, and this can severely affect your IOPS. (Especially on rotating media.)
  4. Useful tools are fio with direct=1 and, for a basic single-threaded io-depth=1 run, dd with oflag=direct (for writes; when reading, set iflag=direct).
    dd with bs=4096 is nice to measure the IOPS; bs=1M will give you the bandwidth.
  5. Get enough data. Running dd with settings that make it finish within 0.5 seconds means that you are likely to suffer from outliers; make it run 5 seconds or longer!
    fio has the nice runtime parameter; just let it run for 20 seconds to collect some data.
  6. For any unexpected result, try to measure again a minute later, then think hard about what could be wrong and where your cluster’s bottlenecks are.
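
As a rough sketch of steps 4 and 5, the helpers below turn the summary line of a dd run into an IOPS figure and flag runs that finished too quickly to trust. The function names are made up for this example, $dev stands for a hypothetical scratch device, and the parsing assumes the summary format printed by GNU coreutils dd; treat it as a starting point rather than a finished tool.

```shell
#!/bin/sh
# Typical run (this DESTROYS data on $dev, a hypothetical scratch device):
#   LC_ALL=C dd if=/dev/zero of="$dev" bs=4096 count=20000 oflag=direct 2>&1 | tail -n 1

# dd_seconds: read dd's summary line on stdin and print the elapsed seconds.
# Assumes the GNU coreutils format "... copied, <seconds> s, <rate>".
dd_seconds() {
    LC_ALL=C awk -F', ' '/copied/ { split($(NF-1), t, " "); print t[1] }'
}

# dd_iops COUNT SECONDS: number of blocks written divided by elapsed time.
dd_iops() {
    LC_ALL=C awk -v n="$1" -v s="$2" 'BEGIN { printf "%.0f\n", n / s }'
}

# too_short SECONDS: succeed if the run took less than the 5 seconds
# suggested above, in which case outliers probably dominate the result.
too_short() {
    LC_ALL=C awk -v s="$1" 'BEGIN { exit !(s < 5) }'
}
```

If too_short reports success for your run, raise count and measure again before reading anything into the number.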

Some problems that we’ve seen in the past are:

  • Misaligned partitions (sector 63, anyone?) might hurt you plenty. Really.
    If you suffer from that, get the secondary correctly aligned, switch over, and re-do the previous primary node.
  • iperf goes fast, but a connected DRBD doesn’t: try turning off the offloading on the network cards; some will trash the checksum for non-zero data, and that means retransmissions.
  • Some RAID controllers can be tuned – to either IOPS or bandwidth. Sounds strange, but we have seen such effects.
  • Concurrent load – trying to benchmark the storage on your currently active database machine is not a good idea.
  • Look for broken networks even if the interface shows no error counters. Recently a pair connected just fine, but then couldn’t even synchronize at a meagre 10MiByte/sec…
    The best hint was the ethtool output that said Speed: 10MBit; switching cables resolved that issue.
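
Two of these problems are quick to check from a shell. The sketch below tests a partition's start sector for alignment; is_aligned is a name invented here, and the default boundary of 2048 sectors (1 MiB) is an assumption, so substitute whatever your RAID stripe size or flash geometry calls for. The ethtool calls for the network cases are left as comments since they need real hardware, and eth0 is only a placeholder.

```shell
#!/bin/sh
# is_aligned START [BOUNDARY]: succeed if a partition's start sector
# (512-byte units, as found in /sys/block/<disk>/<partition>/start) is a
# multiple of BOUNDARY sectors. The default of 2048 sectors (1 MiB) is an
# assumption, not something DRBD itself requires.
is_aligned() {
    [ $(( $1 % ${2:-2048} )) -eq 0 ]
}

# The old DOS default of sector 63 fails the check; sector 2048 passes.
for start in 63 2048; do
    if is_aligned "$start"; then
        echo "start sector $start: aligned"
    else
        echo "start sector $start: misaligned"
    fi
done

# Network checks, left as comments because they need real hardware and,
# for -K, root (eth0 is a placeholder interface name):
#   ethtool eth0 | grep -i speed      # negotiated link speed
#   ethtool -k eth0                   # current offload settings
#   ethtool -K eth0 rx off tx off     # turn off RX/TX checksum offloading
```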

If you’re doing all that correctly, and are using a recent DRBD version (please, don’t come whining about DRBD 8.0.16 performance! ;), then for a pure random-write I/O load you should see only a 1-3% difference between the lower-level LV accessed directly and a connected DRBD.

Update: here’s an example fio call.

# Set $name, $dev, $threads and $iodepth first; random writes will overwrite $dev.
fio --name "$name" --filename "$dev" --ioengine libaio --direct 1 \
   --rw randwrite --bs 4k --runtime 30s --numjobs "$threads" \
   --iodepth "$iodepth" --append-terse
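
To compare the hard single-threaded, io-depth 1 case with more concurrent loads, one option is to generate the fio call above for a few combinations. This is only a sketch: print_fio_matrix is a made-up helper, the thread and queue-depth values are arbitrary, and /dev/drbd0 is a placeholder default for $dev. It only prints the command lines, so you can review them before anything touches the device.

```shell
#!/bin/sh
# dev is a placeholder; random writes to it destroy whatever it contains.
dev="${dev:-/dev/drbd0}"

# print_fio_matrix: emit one fio command line per (threads, iodepth) pair,
# starting with the latency-bound 1x1 case. The values are arbitrary examples.
print_fio_matrix() {
    for threads in 1 4; do
        for iodepth in 1 16; do
            name="randwrite-t${threads}-q${iodepth}"
            printf 'fio --name %s --filename %s --ioengine libaio --direct 1 --rw randwrite --bs 4k --runtime 30s --numjobs %s --iodepth %s --append-terse\n' \
                "$name" "$dev" "$threads" "$iodepth"
        done
    done
}

print_fio_matrix
```

Once the list looks right, piping it into sh runs the jobs one after another.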
