DRBDSETUP(8) | System Administration | DRBDSETUP(8) |
drbdsetup new-resource resource [--cpu-mask {val}] [--on-no-data-accessible {io-error | suspend-io}]
drbdsetup new-minor resource minor volume
drbdsetup del-resource resource
drbdsetup del-minor minor
drbdsetup attach minor lower_dev meta_data_dev meta_data_index [--size {val}] [--max-bio-bvecs {val}] [--on-io-error {pass_on | call-local-io-error | detach}] [--fencing {dont-care | resource-only | resource-and-stonith}] [--disk-barrier] [--disk-flushes] [--disk-drain] [--md-flushes] [--resync-rate {val}] [--resync-after {val}] [--al-extents {val}] [--al-updates] [--discard-zeroes-if-aligned] [--disable-write-same] [--c-plan-ahead {val}] [--c-delay-target {val}] [--c-fill-target {val}] [--c-max-rate {val}] [--c-min-rate {val}] [--disk-timeout {val}] [--read-balancing {prefer-local | prefer-remote | round-robin | least-pending | when-congested-remote | 32K-striping | 64K-striping | 128K-striping | 256K-striping | 512K-striping | 1M-striping}] [--rs-discard-granularity {val}]
drbdsetup connect resource local_addr remote_addr [--tentative] [--discard-my-data] [--protocol {A | B | C}] [--timeout {val}] [--max-epoch-size {val}] [--max-buffers {val}] [--unplug-watermark {val}] [--connect-int {val}] [--ping-int {val}] [--sndbuf-size {val}] [--rcvbuf-size {val}] [--ko-count {val}] [--allow-two-primaries] [--cram-hmac-alg {val}] [--shared-secret {val}] [--after-sb-0pri {disconnect | discard-younger-primary | discard-older-primary | discard-zero-changes | discard-least-changes | discard-local | discard-remote}] [--after-sb-1pri {disconnect | consensus | discard-secondary | call-pri-lost-after-sb | violently-as0p}] [--after-sb-2pri {disconnect | call-pri-lost-after-sb | violently-as0p}] [--always-asbp] [--rr-conflict {disconnect | call-pri-lost | violently}] [--ping-timeout {val}] [--data-integrity-alg {val}] [--tcp-cork] [--on-congestion {block | pull-ahead | disconnect}] [--congestion-fill {val}] [--congestion-extents {val}] [--csums-alg {val}] [--csums-after-crash-only] [--verify-alg {val}] [--use-rle] [--socket-check-timeout {val}]
drbdsetup disk-options minor [--on-io-error {pass_on | call-local-io-error | detach}] [--fencing {dont-care | resource-only | resource-and-stonith}] [--disk-barrier] [--disk-flushes] [--disk-drain] [--md-flushes] [--resync-rate {val}] [--resync-after {val}] [--al-extents {val}] [--al-updates] [--discard-zeroes-if-aligned] [--disable-write-same] [--c-plan-ahead {val}] [--c-delay-target {val}] [--c-fill-target {val}] [--c-max-rate {val}] [--c-min-rate {val}] [--disk-timeout {val}] [--read-balancing {prefer-local | prefer-remote | round-robin | least-pending | when-congested-remote | 32K-striping | 64K-striping | 128K-striping | 256K-striping | 512K-striping | 1M-striping}] [--rs-discard-granularity {val}]
drbdsetup net-options local_addr remote_addr [--protocol {A | B | C}] [--timeout {val}] [--max-epoch-size {val}] [--max-buffers {val}] [--unplug-watermark {val}] [--connect-int {val}] [--ping-int {val}] [--sndbuf-size {val}] [--rcvbuf-size {val}] [--ko-count {val}] [--allow-two-primaries] [--cram-hmac-alg {val}] [--shared-secret {val}] [--after-sb-0pri {disconnect | discard-younger-primary | discard-older-primary | discard-zero-changes | discard-least-changes | discard-local | discard-remote}] [--after-sb-1pri {disconnect | consensus | discard-secondary | call-pri-lost-after-sb | violently-as0p}] [--after-sb-2pri {disconnect | call-pri-lost-after-sb | violently-as0p}] [--always-asbp] [--rr-conflict {disconnect | call-pri-lost | violently}] [--ping-timeout {val}] [--data-integrity-alg {val}] [--tcp-cork] [--on-congestion {block | pull-ahead | disconnect}] [--congestion-fill {val}] [--congestion-extents {val}] [--csums-alg {val}] [--csums-after-crash-only] [--verify-alg {val}] [--use-rle] [--socket-check-timeout {val}]
drbdsetup resource-options resource [--cpu-mask {val}] [--on-no-data-accessible {io-error | suspend-io}]
drbdsetup disconnect local_addr remote_addr [--force]
drbdsetup detach minor [--force]
drbdsetup primary minor [--force]
drbdsetup secondary minor
drbdsetup down resource
drbdsetup verify minor [--start {val}] [--stop {val}]
drbdsetup invalidate minor
drbdsetup invalidate-remote minor
drbdsetup wait-connect minor [--wfc-timeout {val}] [--degr-wfc-timeout {val}] [--outdated-wfc-timeout {val}] [--wait-after-sb {val}]
drbdsetup wait-sync minor [--wfc-timeout {val}] [--degr-wfc-timeout {val}] [--outdated-wfc-timeout {val}] [--wait-after-sb {val}]
drbdsetup role minor
drbdsetup cstate minor
drbdsetup dstate minor
drbdsetup resize minor [--size {val}] [--assume-peer-has-space] [--assume-clean] [--al-stripes {val}] [--al-stripe-size-kB {val}]
drbdsetup check-resize minor
drbdsetup pause-sync minor
drbdsetup resume-sync minor
drbdsetup outdate minor
drbdsetup show-gi minor
drbdsetup get-gi minor
drbdsetup show {resource | minor | all}
drbdsetup suspend-io minor
drbdsetup resume-io minor
drbdsetup status {resource | all} [--color {val}]
drbdsetup events2 {resource | all}
drbdsetup events {resource | minor | all}
drbdsetup new-current-uuid minor [--clear-bitmap]
--create-device
A pair of replicated block devices may have different minor numbers on the two machines. They are associated by a common volume-number. Volume numbers are local to each connection. Minor numbers are global on one node.
With the disk-options command it is possible to change the options of a minor while it is attached.
--disk-size size
If you use the size parameter in drbd.conf, we strongly recommend adding an explicit unit suffix. drbdadm and drbdsetup used to have mismatching default units.
--on-io-error err_handler
--fencing fencing_policy
Valid fencing policies are:
dont-care
resource-only
resource-and-stonith
--disk-barrier,
--disk-flushes,
--disk-drain
Since drbd-8.4.2, disk-barrier is disabled by default, because since linux-2.6.36 (or 2.6.32 on RHEL6) there is no reliable way to determine whether queuing of IO barriers works. This is dangerous: only enable it if you are told to by someone who knows for sure.
When selecting the method, you should not base your decision only on measurable performance. If your backing storage device has a volatile write cache (plain disks, RAID of plain disks), you should use one of the first two (barrier or flush). If your backing storage device has a battery-backed write cache, you may go with option 3 (drain). Option 4 (disable everything, use "none") is dangerous on most IO stacks. It may result in write reordering, which can theoretically cause data corruption or disturb the DRBD protocol, causing spurious disconnect/reconnect cycles. Do not use no-disk-drain.
Unfortunately device mapper (LVM) might not support barriers.
The letter after "wo:" in /proc/drbd indicates which method is currently in use for a device: b, f, d, or n. The implementations are:
barrier
flush
drain
none
--md-flushes
--max-bio-bvecs
The best workaround is to properly align the partition within the VM (e.g. start it at sector 1024). That costs 480 KiB of storage. Unfortunately, the default of most Linux partitioning tools is to start the first partition at an odd sector number (63). Therefore, virtual Linux machines set up with the install helpers of most distributions end up with misaligned partitions. The second-best workaround is to limit DRBD's maximum bvecs per BIO (i.e., the max-bio-bvecs option) to 1, but that might cost performance.
The default value of max-bio-bvecs is 0, which means that there is no user imposed limitation.
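The alignment arithmetic behind the first workaround can be checked quickly. The following is a minimal sketch. It assumes 512-byte sectors and a 4 KiB alignment boundary (a typical page size, chosen here for illustration and not taken from this page). The function names are hypothetical helpers, not part of any DRBD tool.

```python
# Sketch: partition-alignment arithmetic for the max-bio-bvecs workaround.
# Assumptions: 512-byte sectors and a 4 KiB boundary (typical page size).
SECTOR_BYTES = 512


def misalignment_bytes(start_sector: int, boundary_bytes: int = 4096) -> int:
    """Bytes by which a partition start lies past the previous aligned boundary."""
    return (start_sector * SECTOR_BYTES) % boundary_bytes


def cost_kib(old_start: int, new_start: int) -> float:
    """Storage given up by moving the first partition from old_start to new_start."""
    return (new_start - old_start) * SECTOR_BYTES / 1024


print(misalignment_bytes(63))    # 3584: the traditional start sector is misaligned
print(misalignment_bytes(1024))  # 0: sector 1024 is aligned
print(cost_kib(63, 1024))        # 480.5, the roughly 480 KiB quoted in the text
```

Any start sector that is a multiple of 8 is 4 KiB aligned. Sector 1024 is simply a conventional, generous choice.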
--resync-rate rate
--resync-after minor
--al-extents extents
See also drbd.conf(5) and drbdmeta(8) for additional limitations and necessary preparation.
--al-updates {yes | no}
--c-plan-ahead plan_time,
--c-fill-target fill_target,
--c-delay-target delay_target,
--c-max-rate max_rate
plan_time configures the agility of the controller. Higher values yield slower/lower responses of the controller to deviations from the target value. It should be at least 5 times RTT. For regular data paths, a fill_target in the area of 4k to 100k is appropriate. For a setup that contains drbd-proxy, it is advisable to use delay_target instead. The controller uses delay_target only when fill_target is set to 0. For delay_target, 5 times RTT is a reasonable starting value. Max_rate should be set to the bandwidth available between the DRBD hosts and the machines hosting DRBD-proxy, or to the available disk bandwidth.
The default value of plan_time is 0; the default unit is 0.1 seconds. Fill_target has a default value of 0 and a default unit of sectors. Delay_target has a default value of 1 (100ms) and a default unit of 0.1 seconds. Max_rate has a default value of 102400 (100MiB/s) and a default unit of KiB/s.
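The units above are easy to mix up. The following sketch converts them under the units stated in this section: plan_time and delay_target in 0.1 s, fill_target in sectors, and max_rate in KiB/s. The helper names are hypothetical. The selection rule only restates that delay_target is used when fill_target is 0. This is not DRBD's controller code.

```python
# Sketch: unit helpers for the resync controller settings (hypothetical names).
# Units as described above: plan_time/delay_target in 0.1 s, max_rate in KiB/s.


def plan_time_for_rtt(rtt_ms: int, factor: int = 5) -> int:
    """Smallest plan_time value (0.1 s units) covering factor * RTT."""
    return -(-factor * rtt_ms // 100)  # ceiling division; one unit = 100 ms


def mib_to_kib_per_s(mib_per_s: int) -> int:
    """Convert MiB/s into the KiB/s unit used for max_rate."""
    return mib_per_s * 1024


def controller_target(fill_target: int, delay_target: int):
    """The controller uses delay_target only when fill_target is 0."""
    if fill_target == 0:
        return ("delay_target", delay_target)
    return ("fill_target", fill_target)


print(plan_time_for_rtt(150))    # 8, i.e. 0.8 s >= 5 * 150 ms
print(mib_to_kib_per_s(100))     # 102400
print(controller_target(0, 10))  # ('delay_target', 10)
```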
--c-min-rate min_rate
The default value of min_rate is 4M; the default unit is k. If you do not want to throttle at all, set it to zero; if you want to throttle always, set it to one.
-t, --disk-timeout disk_timeout
This option is dangerous and may lead to kernel panic!
"Aborting" requests, or force-detaching the disk, is intended for completely blocked/hung local backing devices that no longer complete requests at all, not even with error completions. In this situation, a hard reset and failover is usually the only way out.
By "aborting", basically faking a local error completion, we allow for a more graceful switchover by cleanly migrating services. Still, the affected node has to be rebooted "soon".
By completing these requests, we allow the upper layers to re-use the associated data pages.
If the local backing device later "recovers" and DMAs some data from disk into the original request pages, in the best case it will just put random data into unused pages. Typically, however, it will corrupt data that has meanwhile become completely unrelated, causing all sorts of damage.
This means that a delayed successful completion, especially of a READ request, is a reason to panic(). We assume that a delayed *error* completion is OK, though we will still complain noisily about it.
The default value of disk-timeout is 0, which stands for an infinite timeout. Timeouts are specified in units of 0.1 seconds. This option is available since DRBD 8.3.12.
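The unit handling and the reasoning about late completions can be summarized in a small model. This is a conceptual sketch, not DRBD kernel code. The function names are hypothetical. "panic" and "complain" only label the two reactions described above.

```python
from typing import Optional


def disk_timeout_seconds(disk_timeout: int) -> Optional[float]:
    """disk-timeout uses 0.1 s units; 0 means an infinite timeout (None)."""
    if disk_timeout == 0:
        return None
    return disk_timeout / 10


def late_completion_action(completed_with_error: bool) -> str:
    """Conceptual model of how a completion arriving after an abort is treated."""
    # A late *successful* completion may have DMA'd data into pages that were
    # already handed back to upper layers, so it is treated as fatal.
    return "complain" if completed_with_error else "panic"


print(disk_timeout_seconds(0))        # None (infinite)
print(disk_timeout_seconds(600))      # 60.0
print(late_completion_action(False))  # panic
```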
--discard-zeroes-if-aligned {yes | no}
Setting discard-zeroes-if-aligned to yes allows DRBD to use discards, and to announce discard_zeroes=true, even on backends that announce discard_zeroes_data=false.
We used to ignore the discard_zeroes_data setting completely. To avoid breaking established and expected behaviour, the default value is yes.
This option is available since 8.4.7. See also drbd.conf(5).
--disable-write-same {yes | no}
Some disks announce WRITE_SAME support to the kernel but fail with an I/O error upon actually receiving such a request. This mostly happens when using virtualized disks -- notably, this behavior has been observed with VMware's virtual disks.
When disable-write-same is set to yes, WRITE_SAME detection is manually overridden and support is disabled.
The default value of disable-write-same is no. This option is available since 8.4.7.
--read-balancing method
The default value of read-balancing is prefer-local. This option is available since 8.4.1.
--rs-discard-granularity bytes
The value is constrained by the discard granularity of the backing block device. If rs-discard-granularity is not a multiple of the discard granularity of the backing block device, DRBD rounds it up. The feature is only active if the backing block device reads back zeroes after a discard command.
The default value of rs-discard-granularity is 0. This option is available since 8.4.7.
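The round-up rule can be written out explicitly. The following sketch models it. The function names and the reads_back_zeroes flag are hypothetical. A return value of None stands for "feature inactive".

```python
from typing import Optional


def round_up_to_multiple(value: int, granularity: int) -> int:
    """Smallest multiple of granularity that is >= value."""
    return -(-value // granularity) * granularity


def effective_rs_discard_granularity(
    requested: int, backing_granularity: int, reads_back_zeroes: bool
) -> Optional[int]:
    """Model of the rule above; None means the feature stays inactive."""
    if not reads_back_zeroes:
        return None
    return round_up_to_multiple(requested, backing_granularity)


print(effective_rs_discard_granularity(65536, 4096, True))  # 65536 (already a multiple)
print(effective_rs_discard_granularity(5000, 4096, True))   # 8192 (rounded up)
print(effective_rs_discard_granularity(5000, 4096, False))  # None
```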
The net-options command allows you to change options while the connection is established.
--protocol protocol
Protocol A: write IO is reported as completed, if it has reached local disk and local TCP send buffer.
Protocol B: write IO is reported as completed, if it has reached local disk and remote buffer cache.
Protocol C: write IO is reported as completed, if it has reached both local and remote disk.
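The three protocols differ only in which condition must hold before a write is reported as completed. The following sketch encodes the three rules listed above as a predicate. The function and parameter names are hypothetical. It models the documented semantics and is not how DRBD tracks requests internally.

```python
def write_completed(
    protocol: str,
    local_disk: bool,
    local_send_buffer: bool,
    remote_buffer_cache: bool,
    remote_disk: bool,
) -> bool:
    """Whether a write may be reported as completed under protocol A, B or C."""
    if not local_disk:  # every protocol requires the local disk write
        return False
    if protocol == "A":
        return local_send_buffer
    if protocol == "B":
        return remote_buffer_cache
    if protocol == "C":
        return remote_disk
    raise ValueError(f"unknown protocol: {protocol!r}")


# Data is on the local disk and in the send buffer, but not yet on the peer:
print(write_completed("A", True, True, False, False))  # True
print(write_completed("C", True, True, False, False))  # False
```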
--connect-int time
--ping-int time
--timeout val
--sndbuf-size size
--rcvbuf-size size
--ko-count count
--max-epoch-size val
--max-buffers val
See also drbd.conf(5)
--unplug-watermark val
When the number of pending write requests on the standby (secondary) node exceeds the unplug-watermark, we trigger the request processing of our backing storage device. Some storage controllers deliver better performance with small values, others deliver best performance when the value is set to the same value as max-buffers, yet others don't feel much effect at all. Minimum 16, default 128, maximum 131072.
--allow-two-primaries
--cram-hmac-alg alg
--shared-secret secret
--after-sb-0pri asb-0p-policy
disconnect
discard-younger-primary
discard-older-primary
discard-zero-changes
discard-least-changes
discard-node-NODENAME
--after-sb-1pri asb-1p-policy
disconnect
consensus
discard-secondary
call-pri-lost-after-sb
violently-as0p
--after-sb-2pri asb-2p-policy
disconnect
call-pri-lost-after-sb
violently-as0p
--always-asbp
With this option you request that the automatic after-split-brain policies are used as long as the data sets of the nodes are somehow related. This might cause a full sync, if the UUIDs indicate the presence of a third node. (Or double faults have led to strange UUID sets.)
--rr-conflict role-resync-conflict-policy
With the violently setting you allow DRBD to force a primary node into SyncTarget state. This means that the data exposed by DRBD changes to the SyncSource's version of the data instantaneously. USE THIS OPTION ONLY IF YOU KNOW WHAT YOU ARE DOING.
--data-integrity-alg hash_alg
See also the notes on data integrity on the drbd.conf manpage.
--no-tcp-cork
--ping-timeout ping_timeout
--discard-my-data
--tentative
--on-congestion congestion_policy,
--congestion-fill fill_threshold,
--congestion-extents active_extents_threshold
When DRBD is deployed with DRBD-proxy, it might be more desirable for DRBD to go into AHEAD/BEHIND mode shortly before the send queue becomes full. In AHEAD/BEHIND mode, DRBD no longer replicates data, but still keeps the connection open.
The advantage of the AHEAD/BEHIND mode is that the application is not slowed down, even if DRBD-proxy's buffer is not sufficient to buffer all write requests. The downside is that the peer node falls behind, and that a resync will be necessary to bring it back into sync. During that resync the peer node will have an inconsistent disk.
Available congestion_policy values are block and pull-ahead. The default is block. Fill_threshold may be in the range of 0 to 10GiB. The default is 0, which disables the check. Active_extents_threshold has the same limits as al-extents.
The AHEAD/BEHIND mode and its settings are available since DRBD 8.3.10.
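The interplay of the three settings can be modelled as a simple decision function. This is a simplified, hypothetical sketch. It assumes that exceeding either threshold triggers AHEAD mode under pull-ahead, that block never does, and that a fill_threshold of 0 disables the fill check, as stated above. The exact comparison DRBD performs is not specified here.

```python
def should_go_ahead(
    policy: str,
    send_queue_bytes: int,
    fill_threshold_bytes: int,
    active_extents: int,
    extents_threshold: int,
) -> bool:
    """Simplified model of the pull-ahead decision (assumptions in lead-in)."""
    if policy != "pull-ahead":
        return False  # "block" keeps replicating and lets writers wait
    fill_hit = fill_threshold_bytes > 0 and send_queue_bytes > fill_threshold_bytes
    extents_hit = active_extents > extents_threshold
    return fill_hit or extents_hit


print(should_go_ahead("block", 10**9, 1, 10**6, 1))        # False
print(should_go_ahead("pull-ahead", 2000, 1000, 10, 1237))  # True (fill)
print(should_go_ahead("pull-ahead", 500, 0, 10, 1237))      # False (fill check off)
```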
--verify-alg hash-alg
See also the notes on data integrity on the drbd.conf manpage.
--csums-alg hash-alg
This setting is useful for DRBD setups with low-bandwidth links. During the restart of a crashed primary node, all blocks covered by the activity log are marked for resync. But a large part of those will actually still be in sync, so using csums-alg lowers the required bandwidth in exchange for CPU cycles.
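The bandwidth saving comes from exchanging digests instead of full blocks and sending only the blocks whose digests differ. The following minimal sketch shows that idea. It uses Python's hashlib with sha256 as a stand-in for the kernel hash algorithm you would name in csums-alg. The function names are hypothetical.

```python
import hashlib
from typing import List, Sequence


def digests(blocks: Sequence[bytes], alg: str = "sha256") -> List[bytes]:
    """Digests the resync target would send instead of the blocks themselves."""
    return [hashlib.new(alg, block).digest() for block in blocks]


def blocks_to_send(
    source_blocks: Sequence[bytes], target_digests: Sequence[bytes], alg: str = "sha256"
) -> List[int]:
    """Indices of blocks whose content differs and must really be transferred."""
    return [
        i
        for i, (block, peer_digest) in enumerate(zip(source_blocks, target_digests))
        if hashlib.new(alg, block).digest() != peer_digest
    ]


source = [b"a" * 4096, b"b" * 4096, b"c" * 4096]
target = [b"a" * 4096, b"x" * 4096, b"c" * 4096]  # only block 1 diverged
print(blocks_to_send(source, digests(target)))     # [1]
```

Only one of the three 4 KiB blocks crosses the link; the others cost one digest each plus the CPU time to compute it.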
--use-rle
Because the bitmap typically contains compact areas where all bits are unset (clean) or set (dirty), a simple run-length encoding scheme can considerably reduce the network traffic necessary for the bitmap exchange.
For backward compatibility reasons, and because on fast links this possibly does not improve transfer time but consumes CPU cycles, this defaults to off.
Introduced in 8.3.2.
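The following sketch shows why run-length encoding helps for a bitmap with long clean and dirty stretches. It is a simplified illustration with hypothetical names and is not DRBD's on-the-wire bitmap encoding.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(bits: List[int]) -> Tuple[int, List[int]]:
    """Encode 0/1 bits as (first bit, lengths of alternating runs)."""
    if not bits:
        return 0, []
    runs = [sum(1 for _ in group) for _, group in groupby(bits)]
    return bits[0], runs


def rle_decode(first_bit: int, runs: List[int]) -> List[int]:
    """Rebuild the bit sequence; runs alternate between 0 and 1."""
    bits: List[int] = []
    bit = first_bit
    for length in runs:
        bits.extend([bit] * length)
        bit ^= 1
    return bits


# 4096 bitmap bits: clean, a short dirty area, clean again.
bitmap = [0] * 1000 + [1] * 24 + [0] * 3072
first, runs = rle_encode(bitmap)
print(first, runs)  # 0 [1000, 24, 3072]
```

Three run lengths replace 4096 bits here. On a bitmap that alternates often, the runs are short and the saving disappears.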
--socket-check-timeout
In such setups, socket-check-timeout should be set to at least the round trip time between DRBD and DRBD-proxy, i.e. in most cases to 1.
The default unit is tenths of a second, the default value is 0 (which causes DRBD to use the value of ping-timeout instead). Introduced in 8.4.5.
--cpu-mask cpu-mask
--on-no-data-accessible ond-policy
If ond-policy is set to suspend-io, you can resume IO either by attaching/connecting the last lost data storage, or with the drbdadm resume-io res command. The latter will, of course, result in IO errors.
The default is io-error. This setting is available since DRBD 8.3.9.
Normally it is not possible to set both devices of a connected DRBD device pair to primary role. By using the --allow-two-primaries option, you override this behavior and instruct DRBD to allow two primaries.
--overwrite-data-of-peer
--force
It is possible that both devices of a connected DRBD device pair are secondary.
If on-line verification is already in progress (and this node is "VerifyS"), this command silently "succeeds". In this case, any start-sector (see below) will be ignored, and any stop-sector (see below) will be honored. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify.
This command will fail if the device is not part of a connected device pair.
See also the notes on data integrity on the drbd.conf manpage.
--start start-sector
Default unit is sectors. You may also specify a unit explicitly. The start-sector will be rounded down to a multiple of 8 sectors (4kB).
-S, --stop stop-sector
Default unit is sectors. You may also specify a unit explicitly. The stop-sector may be updated by issuing an additional drbdsetup verify command on the same node while the verify is running. This can be used to stop a running verify, or to update/shorten/extend the coverage of the currently running verify.
This command will fail if the device is not either part of a connected device pair, or disconnected Secondary.
On a disconnected Primary device, this will set all bits in the out of sync bitmap. As a side effect, this suspends updates to the on-disk activity log. Updates to the on-disk activity log resume automatically when necessary.
--wfc-timeout wfc_timeout,
--degr-wfc-timeout degr_wfc_timeout,
--outdated-wfc-timeout outdated_wfc_timeout,
--wait-after-sb
-f, --force
A forced detach, on the other hand, returns immediately. It allows you to detach DRBD from a frozen backing block device. Please note that the disk will be marked as failed until all pending IO requests have been finished by the backing block device.
The --size option can be used to shrink the usable size of a drbd device online. It is the user's responsibility to make sure that a file system on the device is not truncated by that operation.
The --assume-peer-has-space allows you to resize a device which is currently not connected to the peer. Use with care, since if you do not resize the peer's disk as well, further connect attempts of the two will fail.
When the --assume-clean option is given DRBD will skip the resync of the new storage. Only do this if you know that the new storage was initialized to the same content by other means.
The options --al-stripes and --al-stripe-size-kB may be used to change the layout of the activity log online. With internal meta data, this may involve shrinking the user-visible size at the same time (using --size) or increasing the available space on the backing devices.
This command is called by drbdadm resize res after drbdsetup device resize returned.
--show-defaults
Long lines are wrapped around at terminal width, and indented to indicate how the lines belong together. Available options:
--verbose
--statistics
--color={always | auto | never}
For example, the non-verbose output for a resource with only one connection and only one volume could look like this:
fs-backoffice role:Primary
  disk:UpToDate
  peer role:Secondary
    replication:Established peer-disk:UpToDate
With the --verbose --statistics options, the same resource could be reported as:
fs-data role:Primary suspended:no write-ordering:drain
  volume:0 minor:1 disk:UpToDate
      size:10616472 read:134465 written:144800 al-writes:18 bm-writes:0 upper-pending:0 lower-pending:0 al-suspended:no blocked:no
  peer connection:Connected role:Secondary congested:no
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
        received:122596 sent:22204 out-of-sync:0 pending:0 unacked:0
The output format is meant to be human as well as machine readable. Each line starts with the event number, which is followed by an asterisk if the event continues in the next line. The second word in each line indicates the kind of event: exists for an existing object; create, destroy, and change if an object is created, destroyed, or changed; or call or response if an event handler is called or it returns. The third word indicates the object the event applies to: resource, device, connection, peer-device, helper, or a dash (-) to indicate that the current state has been dumped completely.
The remaining words identify the object and describe the state that the object is in. Available options:
--now
--statistics
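Because the event format is meant to be machine readable, a consumer can split each line into the fields described above. The following minimal sketch follows that description: event number with an optional trailing asterisk, kind, object, then the remaining words. It assumes that state words have the key:value shape seen in the status output. The parser, the Event class and the sample lines are hypothetical and are not copied from real DRBD output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    number: int
    continued: bool  # True if the event continues on the next line ("*")
    kind: str        # exists, create, destroy, change, call, response
    obj: str         # resource, device, connection, peer-device, helper or "-"
    fields: Dict[str, str] = field(default_factory=dict)
    words: List[str] = field(default_factory=list)  # words without a colon


def parse_event_line(line: str) -> Event:
    first, kind, obj, *rest = line.split()
    event = Event(int(first.rstrip("*")), first.endswith("*"), kind, obj)
    for word in rest:
        key, sep, value = word.partition(":")
        if sep:
            event.fields[key] = value
        else:
            event.words.append(word)
    return event


e = parse_event_line("7* change resource name:r0 role:Primary")
print(e.number, e.continued, e.kind, e.obj, e.fields)
# 7 True change resource {'name': 'r0', 'role': 'Primary'}
```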
Displays every state change of DRBD and all calls to helper programs. This might be used to get notified of DRBD's state changes by piping the output to another program.
--all-devices
--unfiltered
Available option:
--clear-bitmap
This can be used to skip the initial sync if you want to start from scratch. This use case only works on "Just Created" meta data. Necessary steps:
drbdadm -- --force create-md res
drbdadm up res
drbdadm new-current-uuid --clear-bitmap res
drbdadm primary res
mkfs -t fs-type $(drbdadm sh-dev res)
One obvious side effect is that the replica is full of old garbage (unless you made the replicas identical by other means), so any online verify is expected to find any number of out-of-sync blocks.
You must not use this on pre-existing data! Even though it may appear to work at first glance, once you switch to the other node, your data is toast, as it never got replicated. So do not leave out the mkfs (or equivalent).
This can also be used to shorten the initial resync of a cluster where the second node is added, by means of disk shipping, after the first node has gone into production. This use case works on disconnected devices only; the device may be in primary or secondary role.
The necessary steps on the current active server are:
Now add the disk to the new secondary node, and join it to the cluster. You will get a resync of those parts that were changed since the first call to drbdsetup in step 1.
6 May 2011 | DRBD 8.4.0 |