We often see people on #drbd or on drbd-user trying to measure the performance of their setup. Here are a few best practices for doing so.
First, a few facts.
- The synchronization rate shown in /proc/drbd has nothing to do with the replication rate. These are different things; don’t mistake the speed: value there for a performance indicator.
- Use an appropriate tool. dd with default settings and cp don’t write to the device, but only into the Linux buffers at first, so timing these won’t tell you anything about your storage performance (see the sketch after this list).
- The hardest discipline is single-threaded, io-depth 1. Here every access has to wait for the preceding one to finish, so each bit of latency will bite you hard. Getting some bandwidth with four thousand concurrent writes is easy!
- Network benchmarking isn’t that easy, either. iperf will typically send only NULs; checksum offloading might hide or create problems; switches, firewalls, etc. will all introduce noise.
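To illustrate the dd point, here is a minimal sketch; the device path /dev/vg0/test is only a placeholder, and writing to it destroys any data on it:

# Misleading: with default settings dd returns as soon as the data sits
# in the Linux page cache, so the reported rate says little about the
# storage underneath.
dd if=/dev/zero of=/dev/vg0/test bs=1M count=1024

# More honest: oflag=direct bypasses the page cache, so the timing
# reflects the device itself (and, on a DRBD device, the replication).
dd if=/dev/zero of=/dev/vg0/test bs=1M count=1024 oflag=direct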
What you want to do:
- Start at the bottom of the stack. Measure (and tune) the LV that DRBD® will sit upon, then the network, then DRBD.
- Our suggestion is still to use a direct connection, i.e. a crossover cable.
- If you don’t have any data on the device, test against the block device. A filesystem on top will create additional meta-data load and barriers; this can severely affect your IOPs. (Especially on rotating media.)
- Useful tools are fio with direct=1, and for a basic single-threaded, io-depth=1 run you can use dd with oflag=direct (for writes; when reading, set iflag instead). dd with bs=4096 is nice to measure the IOPs, bs=1M will give you the bandwidth. (See the examples after this list.)
- Get enough data. Running dd with settings that make it finish within 0.5 seconds means that you are likely to suffer from outliers; make it run 5 seconds or longer! fio has the nice runtime parameter, just let it run for 20 seconds to have some data.
- For any unexpected result, try to measure again a minute later, then think hard about what could be wrong and where your cluster’s bottlenecks are.
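As a rough sketch of the dd runs mentioned above (again, /dev/drbd0 is only an example device, the writes are destructive, and you should adjust count so each run lasts at least five seconds):

# Single-threaded, io-depth 1, 4KiB blocks: measures IOPs.
dd if=/dev/zero of=/dev/drbd0 bs=4096 count=100000 oflag=direct

# The same with 1MiB blocks: measures bandwidth.
dd if=/dev/zero of=/dev/drbd0 bs=1M count=2000 oflag=direct

# For reads, switch to iflag=direct on the input side.
dd if=/dev/drbd0 of=/dev/null bs=4096 count=100000 iflag=direct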
Some problems we’ve seen:
- Misaligned partitions (sector 63, anyone?) might hurt you plenty. Really. If you suffer from that, get the secondary correctly aligned, switch over, and re-do the previous primary node.
- iperf goes fast, but a connected DRBD doesn’t: try turning off the offloading on the network cards; some will trash the checksum for non-zero data, and that means retransmissions.
- Some RAID controllers can be tuned for either IOPs or bandwidth. Sounds strange, but we have seen such effects.
- Concurrent load: trying to benchmark the storage on your currently active database machine is not a good idea.
- Broken networks should be looked for even if there are no error counters on the interface. Recently a pair started to connect just fine, but then couldn’t even synchronize at a meagre 10MiByte/sec… The best hint was the ethtool output that said Speed: 10MBit; switching cables resolved that issue. (See the ethtool sketch below.)
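A minimal sketch for checking both of those network suspects; the interface name eth0 is just a placeholder, and which offloads your NIC supports will vary:

# Negotiated link speed and duplex; a gigabit card reporting
# "Speed: 10Mb/s" points at a bad cable, port, or switch.
ethtool eth0

# Show the current offload settings ...
ethtool -k eth0

# ... and switch off TX/RX checksum offloading to see whether a
# buggy offload implementation is causing retransmissions.
ethtool -K eth0 tx off rx off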
If you’re doing all that correctly, and are using a recent DRBD version, then for pure random-write IO you should see only a 1-3% difference between the lower-level LV measured directly and a connected DRBD.
Here’s an example fio call.
fio --name=$name --filename=$dev --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --runtime=30s --numjobs=$threads \
    --iodepth=$iodepth --append-terse
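For example (the device paths below are only placeholders, and the job will overwrite whatever is on them), you could run the same job first against the backing LV and then against the connected DRBD device to get the two numbers to compare:

# Baseline: the LV that DRBD will sit upon (run this before DRBD is set up on it).
fio --name=lv-baseline --filename=/dev/vg0/r0-backing --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --runtime=30s --numjobs=1 \
    --iodepth=1 --append-terse

# The same job against the connected DRBD device.
fio --name=drbd --filename=/dev/drbd0 --ioengine=libaio \
    --direct=1 --rw=randwrite --bs=4k --runtime=30s --numjobs=1 \
    --iodepth=1 --append-terse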