drbdsetup attach minor lower_dev meta_data_dev
meta_data_index,
drbdsetup disk-options minor
The attach command attaches a lower-level device
to an existing replicated device. The disk-options command changes the
disk options of an attached lower-level device. In either case, the replicated
device must have been created with drbdsetup new-minor.
Both commands refer to the replicated device by its minor
number. lower_dev is the name of the lower-level device.
meta_data_dev is the name of the device containing the metadata, and
may be the same as lower_dev. meta_data_index is either a
numeric metadata index, or the keyword internal for internal
metadata, or the keyword flexible for variable-size external
metadata. Available options:
--al-extents extents
DRBD automatically maintains a "hot" or
"active" disk area likely to be written to again soon based on the
recent write activity. The "active" disk area can be written to
immediately, while "inactive" disk areas must be
"activated" first, which requires a meta-data write. We also refer
to this active disk area as the "activity log".
The activity log saves meta-data writes, but the whole log must be
resynced upon recovery of a failed node. The size of the activity log is a
major factor of how long a resync will take and how fast a replicated disk
will become consistent after a crash.
The activity log consists of a number of 4-Megabyte segments; the
al-extents parameter determines how many of those segments can be
active at the same time. The default value for al-extents is 1237,
with a minimum of 7 and a maximum of 65536.
Note that the effective maximum may be smaller, depending on how
you created the device meta data; see also drbdmeta(8). The effective
maximum is 919 * (available on-disk activity-log ring-buffer area/4kB - 1);
the default 32kB ring buffer yields a maximum of 6433 (covering more than 25
GiB of data). We recommend keeping this well within the amount your backend
storage and replication link are able to resync within about 5
minutes.
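The arithmetic above can be checked with a short sketch. This is only an
illustration of the stated formula and the 4-Megabyte segment size; the
function names are hypothetical, and the resync rate in the usage note is an
assumed figure, not a measured one.

```python
# Illustration of the al-extents arithmetic described above.
# All names are hypothetical; only the constants come from the text.

AL_EXTENT_BYTES = 4 * 1024 * 1024  # each activity-log segment covers 4 MiB
FACTOR = 919                       # factor from the effective-maximum formula
BLOCK_BYTES = 4 * 1024             # ring-buffer area is counted in 4 kB blocks


def effective_max_al_extents(ring_buffer_bytes: int) -> int:
    """Effective maximum: 919 * (ring-buffer area / 4kB - 1)."""
    return FACTOR * (ring_buffer_bytes // BLOCK_BYTES - 1)


def covered_bytes(al_extents: int) -> int:
    """Amount of data covered by the given number of active segments."""
    return al_extents * AL_EXTENT_BYTES


def resync_seconds(al_extents: int, bytes_per_second: float) -> float:
    """Time to resync the whole activity log at an assumed sustained rate."""
    return covered_bytes(al_extents) / bytes_per_second
```

For example, resync_seconds(1237, 100 * 1024 * 1024) estimates how long the
default activity log would take to resync at an assumed 100 MiB/s, which can
then be compared against the 5-minute guideline.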
--al-updates {yes | no}
With this parameter, the activity log can be turned off
entirely (see the al-extents parameter). This will speed up writes
because fewer meta-data writes will be necessary, but the entire device needs
to be resynchronized upon recovery of a failed primary node. The default value
for al-updates is yes.
--disk-barrier,
--disk-flushes,
--disk-drain
DRBD has three methods of handling the ordering of
dependent write requests:
disk-barrier
Use disk barriers to make sure that requests are written
to disk in the right order. Barriers ensure that all requests submitted before
a barrier make it to the disk before any requests submitted after the barrier.
This is implemented using 'tagged command queuing' on SCSI devices and 'native
command queuing' on SATA devices. Only some devices and device stacks support
this method. The device mapper (LVM) only supports barriers in some
configurations.
Note that on systems which do not support disk barriers, enabling
this option can lead to data loss or corruption. Until DRBD 8.4.1,
disk-barrier was turned on if the I/O stack below DRBD supported
barriers. Kernels since linux-2.6.36 (or 2.6.32 RHEL6) no longer make it
possible to detect whether barriers are supported. Since drbd-8.4.2, this
option is off by default and needs to be enabled explicitly.
disk-flushes
Use disk flushes between dependent write requests, also
referred to as 'force unit access' by drive vendors. This forces all data to
disk. This option is enabled by default.
disk-drain
Wait for the request queue to "drain" (that is,
wait for the requests to finish) before submitting a dependent write request.
This method requires that requests are stable on disk when they finish. Before
DRBD 8.0.9, this was the only method implemented. This option is enabled by
default. Do not disable in production environments.
Of these three methods, DRBD will use the first that is enabled
and supported by the backing storage device. If all three of these options
are turned off, DRBD will submit write requests without regard to
dependencies. Depending on the I/O stack, write requests can be reordered,
and they can be submitted in a different order on different cluster nodes.
This can result in data loss or corruption. Therefore, turning off all three
methods of controlling write ordering is strongly discouraged.
A general guideline for configuring write ordering is to use disk
barriers or disk flushes when using ordinary disks (or an ordinary disk
array) with a volatile write cache. On storage without cache or with a
battery backed write cache, disk draining can be a reasonable choice.
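The selection rule ("the first that is enabled and supported") can be
sketched as follows. This is an illustration of the rule as described here,
not DRBD's kernel code; the function name and the dictionaries are
hypothetical.

```python
# Sketch of the write-ordering selection rule described above.
# Hypothetical names; the order and defaults are taken from the text.
from typing import Dict, Optional

# Methods in the order in which they are considered.
METHODS = ("disk-barrier", "disk-flushes", "disk-drain")

# Defaults as stated above (disk-barrier is off since drbd-8.4.2).
DEFAULTS = {"disk-barrier": False, "disk-flushes": True, "disk-drain": True}


def pick_write_ordering(enabled: Dict[str, bool],
                        supported: Dict[str, bool]) -> Optional[str]:
    """Return the first method that is both enabled and supported.

    None means that no ordering method is in effect, which is the
    strongly discouraged case.
    """
    for method in METHODS:
        if enabled.get(method, False) and supported.get(method, False):
            return method
    return None
```

With the default settings and a backing device that supports all three
methods, the sketch selects disk-flushes.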
--disk-timeout
If the lower-level device on which a DRBD device stores
its data does not finish an I/O request within the defined
disk-timeout, DRBD treats this as a failure. The lower-level device is
detached, and the device's disk state advances to Diskless. If DRBD is
connected to one or more peers, the failed request is passed on to one of
them.
This option is dangerous and may lead to kernel panic!
"Aborting" requests, or force-detaching the disk, is
intended for completely blocked/hung local backing devices which no
longer complete requests at all, not even with error completions. In this
situation, usually a hard-reset and failover is the only way out.
By "aborting", basically faking a local
error-completion, we allow for a more graceful switchover by cleanly
migrating services. Still the affected node has to be rebooted
"soon".
By completing these requests, we allow the upper layers to re-use
the associated data pages.
If later the local backing device "recovers", and now
DMAs some data from disk into the original request pages, in the best case
it will just put random data into unused pages; but typically it will
corrupt meanwhile completely unrelated data, causing all sorts of
damage.
This means that a delayed successful completion, especially for READ
requests, is a reason to panic(). We assume that a delayed *error*
completion is OK, though we still will complain noisily about it.
The default value of disk-timeout is 0, which stands for an
infinite timeout. Timeouts are specified in units of 0.1 seconds. This
option is available since DRBD 8.3.12.
--md-flushes
Enable disk flushes and disk barriers on the meta-data
device. This option is enabled by default. See the disk-flushes
parameter.
--on-io-error handler
Configure how DRBD reacts to I/O errors on a lower-level
device. The following policies are defined:
pass_on
Change the disk status to Inconsistent, mark the failed
block as inconsistent in the bitmap, and retry the I/O operation on a remote
cluster node.
call-local-io-error
Call the local-io-error handler (see the
handlers section).
detach
Detach the lower-level device and continue in diskless
mode.
--read-balancing policy
Distribute read requests among cluster nodes as defined
by policy. The supported policies are prefer-local (the default),
prefer-remote, round-robin, least-pending, when-congested-remote,
32K-striping, 64K-striping, 128K-striping, 256K-striping, 512K-striping and
1M-striping.
This option is available since DRBD 8.4.1.
--resync-after minor
Define that a device should only resynchronize after the
specified other device. By default, no order between devices is defined, and
all devices will resynchronize in parallel. Depending on the configuration of
the lower-level devices, and the available network and disk bandwidth, this
can slow down the overall resync process. This option can be used to form a
chain or tree of dependencies among devices.
--size size
Specify the size of the lower-level device explicitly
instead of determining it automatically. The device size must be determined
once and is remembered for the lifetime of the device. In order to determine
it automatically, all the lower-level devices on all nodes must be attached,
and all nodes must be connected. If the size is specified explicitly, this is
not necessary. The size value is assumed to be in units of sectors (512
bytes) by default.
--discard-zeroes-if-aligned {yes | no}
There are several aspects to discard/trim/unmap support
on linux block devices. Even if discard is supported in general, it may fail
silently, or may partially ignore discard requests. Devices also announce
whether reading from unmapped blocks returns defined data (usually zeroes), or
undefined data (possibly old data, possibly garbage).
If on different nodes, DRBD is backed by devices with differing
discard characteristics, discards may lead to data divergence (old data or
garbage left over on one backend, zeroes due to unmapped areas on the other
backend). Online verify would now potentially report tons of spurious
differences. While probably harmless for most use cases (fstrim on a file
system), DRBD cannot have that.
To be safe, we have to disable discard support if our local
backend (on a Primary) does not support
"discard_zeroes_data=true". We also have to translate discards to
explicit zero-out on the receiving side, unless the receiving side
(Secondary) supports "discard_zeroes_data=true", thereby
allocating areas that were supposed to be unmapped.
There are some devices (notably the LVM/DM thin provisioning) that
are capable of discard, but announce discard_zeroes_data=false. In the case
of DM-thin, discards aligned to the chunk size will be unmapped, and reading
from unmapped sectors will return zeroes. However, unaligned partial head or
tail areas of discard requests will be silently ignored.
If we now add a helper to explicitly zero-out these unaligned
partial areas, while passing on the discard of the aligned full chunks, we
effectively achieve discard_zeroes_data=true on such devices.
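The idea of such a helper can be sketched as follows: split a discard
request into unaligned head and tail areas, which are zeroed out explicitly,
and an aligned middle part, which is passed on as a discard. This is an
illustration only, not DRBD's implementation; the function name is
hypothetical, and offsets and the chunk size are assumed to use the same unit.

```python
# Sketch: split a discard request at chunk boundaries.
# Hypothetical helper; offsets, lengths and chunk share one unit.
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # (start, length)


def split_discard(start: int, length: int, chunk: int) -> Dict[str, List[Range]]:
    end = start + length
    aligned_start = -(-start // chunk) * chunk  # round start up to a boundary
    aligned_end = (end // chunk) * chunk        # round end down to a boundary

    if aligned_start >= aligned_end:
        # The request does not contain a single full chunk: zero out all of it.
        return {"zero_out": [(start, length)] if length else [], "discard": []}

    zero_out: List[Range] = []
    if start < aligned_start:                   # unaligned head
        zero_out.append((start, aligned_start - start))
    if aligned_end < end:                       # unaligned tail
        zero_out.append((aligned_end, end - aligned_end))
    return {"zero_out": zero_out,
            "discard": [(aligned_start, aligned_end - aligned_start)]}
```

For a request starting at 100 with length 300 and a chunk size of 128, the
sketch zeroes the ranges (100, 28) and (384, 16) and discards (128, 256).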
Setting discard-zeroes-if-aligned to yes will allow
DRBD to use discards, and to announce discard_zeroes_data=true, even on
backends that announce discard_zeroes_data=false.
Setting discard-zeroes-if-aligned to no will cause
DRBD to always fall-back to zero-out on the receiving side, and to not even
announce discard capabilities on the Primary, if the respective backend
announces discard_zeroes_data=false.
We used to ignore the discard_zeroes_data setting completely. To
not break established and expected behaviour, and suddenly cause fstrim on
thin-provisioned LVs to run out-of-space instead of freeing up space, the
default value is yes.
This option is available since 8.4.7.
--disable-write-same {yes | no}
Some disks announce WRITE_SAME support to the kernel but
fail with an I/O error upon actually receiving such a request. This mostly
happens when using virtualized disks -- notably, this behavior has been
observed with VMware's virtual disks.
When disable-write-same is set to yes, WRITE_SAME
detection is manually overridden and support is disabled.
The default value of disable-write-same is no. This
option is available since 8.4.7.
--rs-discard-granularity byte
When rs-discard-granularity is set to a non-zero,
positive value, DRBD tries to do a resync operation in requests of this
size. If such a block contains only zero bytes on the sync source node,
the sync target node will issue a discard/trim/unmap command for the area.
The value is constrained by the discard granularity of the backing
block device. If rs-discard-granularity is not a multiple of
the discard granularity of the backing block device, DRBD rounds it up. The
feature only becomes active if the backing block device reads back zeroes
after a discard command.
The default value of rs-discard-granularity is 0. This
option is available since 8.4.7.
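The rounding can be illustrated with a small sketch. It assumes that
"rounds it up" means rounding to the next multiple of the backing device's
discard granularity; any other limits DRBD may apply are not modeled, and the
function name is hypothetical.

```python
# Sketch: round rs-discard-granularity up to a multiple of the backing
# device's discard granularity (both in bytes). Hypothetical helper.

def effective_rs_discard_granularity(requested: int, device_granularity: int) -> int:
    if requested <= 0:
        return 0  # 0 is the default and leaves the feature disabled
    multiples = -(-requested // device_granularity)  # ceiling division
    return multiples * device_granularity
```

For example, a requested value of 5000 bytes on a device with a 4096-byte
discard granularity would become 8192 under this assumption.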
drbdsetup peer-device-options resource
peer_node_id volume
These are options that affect the
peer's device.
--c-delay-target delay_target,
--c-fill-target fill_target,
--c-max-rate max_rate,
--c-plan-ahead plan_time
Dynamically control the resync speed. The following modes
are available:
•Dynamic control with fill target (default).
Enabled when c-plan-ahead is non-zero and c-fill-target is
non-zero. The goal is to fill the buffers along the data path with a defined
amount of data. This mode is recommended when DRBD-proxy is used. Configured
with c-plan-ahead, c-fill-target and c-max-rate.
•Dynamic control with delay target. Enabled when
c-plan-ahead is non-zero (default) and c-fill-target is zero.
The goal is to have a defined delay along the path. Configured with
c-plan-ahead, c-delay-target and c-max-rate.
•Fixed resync rate. Enabled when
c-plan-ahead is zero. DRBD will try to perform resync I/O at a fixed
rate. Configured with resync-rate.
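The three modes follow from two parameters. The sketch below restates the
conditions above; the function name and the returned labels are hypothetical,
and the default arguments are the documented defaults of c-plan-ahead (20)
and c-fill-target (100).

```python
# Sketch: which resync control mode results from c-plan-ahead and
# c-fill-target. Hypothetical function and labels.

def resync_mode(c_plan_ahead: int = 20, c_fill_target: int = 100) -> str:
    if c_plan_ahead == 0:
        return "fixed"          # fixed rate, configured with resync-rate
    if c_fill_target != 0:
        return "fill-target"    # dynamic control with fill target
    return "delay-target"       # dynamic control with delay target
```

With the default values, the sketch reports dynamic control with fill
target, matching the mode marked as default above.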
The c-plan-ahead parameter defines how fast DRBD adapts to
changes in the resync speed. It should be set to five times the network
round-trip time or more. The default value of c-plan-ahead is 20, in
units of 0.1 seconds.
The c-fill-target parameter defines how much resync
data DRBD should aim to have in-flight at all times. Common values for
"normal" data paths range from 4K to 100K. The default value of
c-fill-target is 100, in units of sectors.
The c-delay-target parameter defines the delay in the
resync path that DRBD should aim for. This should be set to five times the
network round-trip time or more. The default value of c-delay-target
is 10, in units of 0.1 seconds.
The c-max-rate parameter limits the maximum bandwidth used
by dynamically controlled resyncs. Setting this to zero removes the
limitation (since DRBD 9.0.28). It should be set to either the bandwidth
available between the DRBD hosts and the machines hosting DRBD-proxy, or to
the available disk bandwidth. The default value of c-max-rate is
102400, in units of KiB/s.
Dynamic resync speed control is available since DRBD 8.3.9.
--c-min-rate min_rate
A node which is primary and sync-source has to schedule
application I/O requests and resync I/O requests. The
c-min-rate
parameter limits how much bandwidth is available for resync I/O; the remaining
bandwidth is used for application I/O.
A c-min-rate value of 0 means that there is no limit on the
resync I/O bandwidth. This can slow down application I/O significantly. Use
a value of 1 (1 KiB/s) for the lowest possible resync rate.
The default value of c-min-rate is 250, in units of
KiB/s.
--resync-rate rate
Define how much bandwidth DRBD may use for
resynchronizing. DRBD allows "normal" application I/O even during a
resync. If the resync takes up too much bandwidth, application I/O can become
very slow. This parameter makes it possible to avoid that. Please note that
this option only works when the dynamic resync controller is disabled.
drbdsetup check-resize minor
Remember the current size of the lower-level device of
the specified replicated device. Used by drbdadm. The size information is
stored in file /var/lib/drbd/drbd-minor-minor.lkbd.
drbdsetup new-peer resource peer_node_id,
drbdsetup net-options resource peer_node_id
The new-peer command creates a connection within a
resource. The resource must have been created with drbdsetup
new-resource. The net-options command changes the network options
of an existing connection. Before a connection can be activated with the
connect command, at least one path needs to be added with the
new-path command. Available options:
--after-sb-0pri policy
Define how to react if a split-brain scenario is detected
and neither of the two nodes is in primary role. (We detect split-brain
scenarios when two nodes connect; split-brain decisions are always between
two nodes.)
The defined policies are:
disconnect
No automatic resynchronization; simply disconnect.
discard-younger-primary,
discard-older-primary
Resynchronize from the node which became primary first
(discard-younger-primary) or last (discard-older-primary). If
both nodes became primary independently, the discard-least-changes
policy is used.
discard-zero-changes
If only one of the nodes wrote data since the split brain
situation was detected, resynchronize from this node to the other. If both
nodes wrote data, disconnect.
discard-least-changes
Resynchronize from the node with more modified
blocks.
discard-node-nodename
Always resynchronize to the named node.
--after-sb-1pri policy
Define how to react if a split-brain scenario is
detected, with one node in primary role and one node in secondary role. (We
detect split-brain scenarios when two nodes connect, so split-brain decisions
are always among two nodes.) The defined policies are:
disconnect
No automatic resynchronization, simply disconnect.
consensus
Discard the data on the secondary node if the
after-sb-0pri algorithm would also discard the data on the secondary
node. Otherwise, disconnect.
violently-as0p
Always take the decision of the after-sb-0pri
algorithm, even if it causes an erratic change of the primary's view of the
data. This is only useful if a single-node file system (i.e., not OCFS2 or
GFS) with the allow-two-primaries flag is used. This option can cause
the primary node to crash, and should not be used.
discard-secondary
Discard the data on the secondary node.
call-pri-lost-after-sb
Always take the decision of the after-sb-0pri
algorithm. If the decision is to discard the data on the primary node, call
the pri-lost-after-sb handler on the primary node.
--after-sb-2pri policy
Define how to react if a split-brain scenario is detected
and both nodes are in primary role. (We detect split-brain scenarios when two
nodes connect, so split-brain decisions are always among two nodes.) The
defined policies are:
disconnect
No automatic resynchronization, simply disconnect.
violently-as0p
See the violently-as0p policy for
after-sb-1pri.
call-pri-lost-after-sb
Call the pri-lost-after-sb helper program on one
of the machines unless that machine can demote to secondary. The helper
program is expected to reboot the machine, which brings the node into a
secondary role. Which machine runs the helper program is determined by the
after-sb-0pri strategy.
--allow-two-primaries
The most common way to configure DRBD devices is to allow
only one node to be primary (and thus writable) at a time.
In some scenarios it is preferable to allow two nodes to be
primary at once; a mechanism outside of DRBD then must make sure that writes
to the shared, replicated device happen in a coordinated way. This can be
done with a shared-storage cluster file system like OCFS2 and GFS, or with
virtual machine images and a virtual machine manager that can migrate
virtual machines between physical machines.
The allow-two-primaries parameter tells DRBD to allow two
nodes to be primary at the same time. Never enable this option when using a
non-distributed file system; otherwise, data corruption and node crashes
will result!
--always-asbp
Normally the automatic after-split-brain policies are
only used if current states of the UUIDs do not indicate the presence of a
third node.
With this option you request that the automatic after-split-brain
policies are used as long as the data sets of the nodes are somehow related.
This might cause a full sync, if the UUIDs indicate the presence of a third
node. (Or double faults led to strange UUID sets.)
--connect-int time
As soon as a connection between two nodes is configured
with drbdsetup connect, DRBD immediately tries to establish the
connection. If this fails, DRBD waits for connect-int seconds and then
repeats. The default value of connect-int is 10 seconds.
--cram-hmac-alg hash-algorithm
Configure the hash-based message authentication code
(HMAC) or secure hash algorithm to use for peer authentication. The kernel
supports a number of different algorithms, some of which may be loadable as
kernel modules. See the shash algorithms listed in /proc/crypto. By default,
cram-hmac-alg is unset. Peer authentication also requires a
shared-secret to be configured.
--csums-alg hash-algorithm
Normally, when two nodes resynchronize, the sync target
requests a piece of out-of-sync data from the sync source, and the sync source
sends the data. With many usage patterns, a significant number of those blocks
will actually be identical.
When a csums-alg algorithm is specified, when requesting a
piece of out-of-sync data, the sync target also sends along a hash of the
data it currently has. The sync source compares this hash with its own
version of the data. It sends the sync target the new data if the hashes
differ, and tells it that the data are the same otherwise. This reduces the
network bandwidth required, at the cost of higher cpu utilization and
possibly increased I/O on the sync target.
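The exchange can be sketched from the sync source's point of view. This is
an illustration of the idea only: it uses Python's hashlib in place of the
kernel's shash algorithms, and the function names and the choice of "sha1"
are assumptions.

```python
# Sketch of the checksum-based resync decision on the sync source.
# hashlib stands in for the kernel's shash algorithms; names are hypothetical.
import hashlib
from typing import Optional


def block_hash(data: bytes, algorithm: str = "sha1") -> bytes:
    """Hash of one block of data."""
    return hashlib.new(algorithm, data).digest()


def sync_source_reply(source_block: bytes, target_hash: bytes,
                      algorithm: str = "sha1") -> Optional[bytes]:
    """Return the data to send, or None if the blocks are already the same."""
    if block_hash(source_block, algorithm) == target_hash:
        return None           # only tell the target that the data are the same
    return source_block       # hashes differ: send the new data
```

Only the hash travels over the network when the blocks match, which is
where the bandwidth saving comes from; both nodes still have to read and hash
the block.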
The csums-alg can be set to one of the secure hash
algorithms supported by the kernel; see the shash algorithms listed in
/proc/crypto. By default, csums-alg is unset.
--csums-after-crash-only
Enabling this option (and csums-alg, above) makes it
possible to use the checksum based resync only for the first resync after
primary crash, but not for later "network hiccups".
In most cases, blocks that are marked as need-to-be-resynced are in
fact changed, so calculating checksums, and both reading and writing the
blocks on the resync target, is effectively all overhead.
The advantage of checksum based resync is mostly after primary
crash recovery, where the recovery marked larger areas (those covered by the
activity log) as need-to-be-resynced, just in case. Introduced in 8.4.5.
--data-integrity-alg alg
DRBD normally relies on the data integrity checks built
into the TCP/IP protocol, but if a data integrity algorithm is configured, it
will additionally use this algorithm to make sure that the data received over
the network match what the sender has sent. If a data integrity error is
detected, DRBD will close the network connection and reconnect, which will
trigger a resync.
The data-integrity-alg can be set to one of the secure hash
algorithms supported by the kernel; see the shash algorithms listed in
/proc/crypto. By default, this mechanism is turned off.
Because of the CPU overhead involved, we recommend not using this
option in production environments. Also see the notes on data integrity
below.
--fencing fencing_policy
Fencing is a preventive measure to avoid
situations where both nodes are primary and disconnected. This is also known
as a split-brain situation. DRBD supports the following fencing policies:
dont-care
No fencing actions are taken. This is the default
policy.
resource-only
If a node becomes a disconnected primary, it tries to
fence the peer. This is done by calling the fence-peer handler. The
handler is supposed to reach the peer over an alternative communication path
and call 'drbdadm outdate minor' there.
resource-and-stonith
If a node becomes a disconnected primary, it freezes all
its IO operations and calls its fence-peer handler. The fence-peer handler is
supposed to reach the peer over an alternative communication path and call
'drbdadm outdate minor' there. In case it cannot do that, it should
stonith the peer. IO is resumed as soon as the situation is resolved. In case
the fence-peer handler fails, I/O can be resumed manually with 'drbdadm
resume-io'.
--ko-count number
If a secondary node fails to complete a write request in
ko-count times the timeout parameter, it is excluded from the
cluster. The primary node then sets the connection to this secondary node to
Standalone. To disable this feature, you should explicitly set it to 0;
defaults may change between versions.
--max-buffers number
Limits the memory usage per DRBD minor device on the
receiving side, or for internal buffers during resync or online-verify. Unit
is PAGE_SIZE, which is 4 KiB on most systems. The minimum possible setting is
hard coded to 32 (=128 KiB). These buffers are used to hold data blocks while
they are written to/read from disk. To avoid possible distributed deadlocks on
congestion, this setting is used as a throttle threshold rather than a hard
limit. Once more than max-buffers pages are in use, further allocation from
this pool is throttled. You want to increase max-buffers if you cannot
saturate the IO backend on the receiving side.
--max-epoch-size number
Define the maximum number of write requests DRBD may
issue before issuing a write barrier. The default value is 2048, with a
minimum of 1 and a maximum of 20000. Setting this parameter to a value below
10 is likely to decrease performance.
--on-congestion policy,
--congestion-fill threshold,
--congestion-extents threshold
By default, DRBD blocks when the TCP send queue is full.
This prevents applications from generating further write requests until more
buffer space becomes available again.
When DRBD is used together with DRBD-proxy, it can be better to
use the pull-ahead on-congestion policy, which can switch DRBD
into ahead/behind mode before the send queue is full. DRBD then records the
differences between itself and the peer in its bitmap, but it no longer
replicates them to the peer. When enough buffer space becomes available
again, the node resynchronizes with the peer and switches back to normal
replication.
This has the advantage of not blocking application I/O even when
the queues fill up, and the disadvantage that peer nodes can fall behind
much further. Also, while resynchronizing, peer nodes will become
inconsistent.
The available congestion policies are block (the default)
and pull-ahead. The congestion-fill parameter defines how much
data is allowed to be "in flight" in this connection. The default
value is 0, which disables this mechanism of congestion control, with a
maximum of 10 GiBytes. The congestion-extents parameter defines how
many bitmap extents may be active before switching into ahead/behind mode,
with the same default and limits as the al-extents parameter. The
congestion-extents parameter is effective only when set to a value
smaller than al-extents.
Ahead/behind mode is available since DRBD 8.3.10.
--ping-int interval
When the TCP/IP connection to a peer is idle for more
than ping-int seconds, DRBD will send a keep-alive packet to make sure
that a failed peer or network connection is detected reasonably soon. The
default value is 10 seconds, with a minimum of 1 and a maximum of 120 seconds.
The unit is seconds.
--ping-timeout timeout
Define the timeout for replies to keep-alive packets. If
the peer does not reply within ping-timeout, DRBD will close and try to
reestablish the connection. The default value is 0.5 seconds, with a minimum
of 0.1 seconds and a maximum of 30 seconds. The unit is tenths of a
second.
--socket-check-timeout timeout
In setups involving a DRBD-proxy and connections that
experience a lot of buffer-bloat, it might be necessary to set
ping-timeout to an unusually high value. By default, DRBD uses the same
value to wait if a newly established TCP-connection is stable. Since the
DRBD-proxy is usually located in the same data center, such a long wait time
may hinder DRBD's connect process.
In such setups, socket-check-timeout should be set to at
least the round trip time between DRBD and DRBD-proxy, i.e., in most cases
to 1.
The default unit is tenths of a second, the default value is 0
(which causes DRBD to use the value of ping-timeout instead).
Introduced in 8.4.5.
--protocol name
Use the specified protocol on this connection. The
supported protocols are:
A
Writes to the DRBD device complete as soon as they have
reached the local disk and the TCP/IP send buffer.
B
Writes to the DRBD device complete as soon as they have
reached the local disk, and all peers have acknowledged the receipt of the
write requests.
C
Writes to the DRBD device complete as soon as they have
reached the local and all remote disks.
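The three completion conditions can be written down as a small predicate.
This is a sketch of the descriptions above, not of DRBD's request handling;
the function and its parameters are hypothetical.

```python
# Sketch: when a write to the DRBD device completes under each protocol.
# Hypothetical function; the conditions restate the text above.
from typing import Sequence


def write_complete(protocol: str,
                   on_local_disk: bool,
                   in_send_buffer: bool,
                   peers_received: Sequence[bool],
                   peers_on_disk: Sequence[bool]) -> bool:
    if not on_local_disk:
        return False                    # all protocols require the local disk
    if protocol == "A":
        return in_send_buffer           # data reached the TCP/IP send buffer
    if protocol == "B":
        return all(peers_received)      # all peers acknowledged receipt
    if protocol == "C":
        return all(peers_on_disk)       # data reached all remote disks
    raise ValueError(f"unknown protocol: {protocol!r}")
```

The stricter the protocol, the later a write completes for the same
sequence of events.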
--rcvbuf-size size
Configure the size of the TCP/IP receive buffer. A value
of 0 (the default) causes the buffer size to adjust dynamically. This
parameter usually does not need to be set, but it can be set to a value up to
10 MiB. The default unit is bytes.
--rr-conflict policy
This option helps to solve the cases when the outcome of
the resync decision is incompatible with the current role assignment in the
cluster. The defined policies are:
disconnect
No automatic resynchronization, simply disconnect.
retry-connect
Disconnect now, and retry to connect immediately
afterwards.
violently
Resync to the primary node is allowed, violating the
assumption that data on a block device are stable for one of the nodes. Do
not use this option, it is dangerous.
call-pri-lost
Call the pri-lost handler on one of the machines.
The handler is expected to reboot the machine, which puts it into secondary
role.
--shared-secret secret
Configure the shared secret used for peer authentication.
The secret is a string of up to 64 characters. Peer authentication also
requires the cram-hmac-alg parameter to be set.
--sndbuf-size size
Configure the size of the TCP/IP send buffer. Since DRBD
8.0.13 / 8.2.7, a value of 0 (the default) causes the buffer size to adjust
dynamically. Values below 32 KiB are harmful to the throughput on this
connection. Large buffer sizes can be useful especially when protocol A is
used over high-latency networks; the maximum value supported is 10 MiB.
--tcp-cork
By default, DRBD uses the TCP_CORK socket option to
prevent the kernel from sending partial messages; this results in fewer and
bigger packets on the network. Some network stacks can perform worse with this
optimization. On these, the tcp-cork parameter can be used to turn this
optimization off.
--timeout time
Define the timeout for replies over the network: if a
peer node does not send an expected reply within the specified timeout,
it is considered dead and the TCP/IP connection is closed. The timeout value
must be lower than connect-int and lower than ping-int. The
default is 6 seconds; the value is specified in tenths of a second.
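Because timeout is given in tenths of a second while connect-int and
ping-int are given in seconds, the constraint is easy to get wrong. The
following sketch checks it; the function name is hypothetical, and the default
arguments are the documented defaults of connect-int and ping-int.

```python
# Sketch: check that timeout is lower than connect-int and ping-int.
# timeout is in tenths of a second; the other two are in seconds.

def timeout_is_valid(timeout_tenths: int,
                     connect_int_s: int = 10,
                     ping_int_s: int = 10) -> bool:
    timeout_s = timeout_tenths / 10
    return timeout_s < connect_int_s and timeout_s < ping_int_s
```

The default timeout of 6 seconds corresponds to a value of 60 and passes
this check with the default intervals.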
--use-rle
Each replicated device on a cluster node has a separate
bitmap for each of its peer devices. The bitmaps are used for tracking the
differences between the local and peer device: depending on the cluster state,
a disk range can be marked as different from the peer in the device's bitmap,
in the peer device's bitmap, or in both bitmaps. When two cluster nodes
connect, they exchange each other's bitmaps, and they each compute the union
of the local and peer bitmap to determine the overall differences.
Bitmaps of very large devices are also relatively large, but they
usually compress very well using run-length encoding. This can save time and
bandwidth for the bitmap transfers.
The use-rle parameter determines if run-length encoding
should be used. It is on by default since DRBD 8.4.0.
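Why bitmaps compress well can be seen with a generic run-length encoder.
The sketch below is not DRBD's on-wire encoding; it only shows the principle
on a list of bits, and the function names are hypothetical.

```python
# Generic run-length encoding of a bitmap, for illustration only.
# This is NOT the encoding DRBD uses on the wire.
from itertools import groupby
from typing import Iterable, List, Tuple


def rle_encode(bits: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse consecutive equal bits into (bit, run_length) pairs."""
    return [(bit, sum(1 for _ in run)) for bit, run in groupby(bits)]


def rle_decode(runs: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand (bit, run_length) pairs back into the original bits."""
    bits: List[int] = []
    for bit, length in runs:
        bits.extend([bit] * length)
    return bits
```

A bitmap with long stretches of in-sync (0) or out-of-sync (1) blocks
collapses into a handful of pairs, which is the typical case for large
devices.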
--verify-alg hash-algorithm
Online verification (drbdadm verify) computes and
compares checksums of disk blocks (i.e., hash values) in order to detect if
they differ. The verify-alg parameter determines which algorithm to use
for these checksums. It must be set to one of the secure hash algorithms
supported by the kernel before online verify can be used; see the shash
algorithms listed in /proc/crypto.
We recommend scheduling online verifications regularly during
low-load periods, for example once a month. Also see the notes on data
integrity below.
drbdsetup new-path resource peer_node_id
local-addr remote-addr
The new-path command creates a path within a
connection. The connection must have been created with drbdsetup
new-peer. The local-addr and remote-addr arguments refer to the local and
remote protocol, network address, and port in the format
[address-family:]address[:port]. The address families
ipv4, ipv6, ssocks (Dolphin Interconnect Solutions'
"super sockets"), sdp (Infiniband Sockets Direct Protocol),
and sci are supported (sci is an alias for ssocks). If no
address family is specified, ipv4 is assumed. For all address families
except ipv6, the address uses IPv4 address notation (for
example, 1.2.3.4). For ipv6, the address is enclosed in brackets and
uses IPv6 address notation (for example, [fd01:2345:6789:abcd::1]). The
port defaults to 7788.
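The address format can be illustrated with a small parser. This is a sketch
of the rules stated above, not the parser used by drbdsetup; the function name
is hypothetical, and the validation is deliberately loose.

```python
# Sketch: parse [address-family:]address[:port] as described above.
# Hypothetical helper with loose validation; not drbdsetup's own parser.
import re
from typing import Tuple

FAMILIES = {"ipv4", "ipv6", "ssocks", "sdp", "sci"}
DEFAULT_PORT = 7788


def parse_addr(text: str) -> Tuple[str, str, int]:
    family = "ipv4"                       # assumed when no family is given
    head, sep, rest = text.partition(":")
    if sep and head in FAMILIES:
        family, text = head, rest
    if family == "sci":
        family = "ssocks"                 # sci is an alias for ssocks

    if family == "ipv6":
        # IPv6 notation, enclosed in brackets
        match = re.fullmatch(r"\[([0-9A-Fa-f:.]+)\](?::(\d+))?", text)
    else:
        # all other families use IPv4 address notation
        match = re.fullmatch(r"([0-9.]+)(?::(\d+))?", text)
    if match is None:
        raise ValueError(f"cannot parse address: {text!r}")

    address, port = match.group(1), match.group(2)
    return family, address, int(port) if port else DEFAULT_PORT
```

For example, "1.2.3.4" is read as an ipv4 address on port 7788, while
"ipv6:[fd01:2345:6789:abcd::1]:7789" yields the ipv6 family, the address
without brackets, and port 7789.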
drbdsetup connect resource peer_node_id
The connect command activates a connection. That
means that the DRBD driver will bind and listen on all local addresses of the
connection's paths. It will begin to try to establish one or more paths of
the connection. Available options:
--tentative
Only determine if a connection to the peer can be
established and if a resync is necessary (and in which direction) without
actually establishing the connection or starting the resync. Check the system
log to see what DRBD would do without the --tentative option.
--discard-my-data
Discard the local data and resynchronize with the peer
that has the most up-to-date data. Use this option to manually recover from a
split-brain situation.
drbdsetup del-peer resource peer_node_id
The del-peer command removes a connection from a
resource.
drbdsetup del-path resource peer_node_id
local-addr remote-addr
The del-path command removes a path from a
connection. Note that the command fails if the path is needed to keep
an established connection intact. To remove all paths, disconnect the
connection first.
drbdsetup cstate resource peer_node_id
Show the current state of a connection. The connection is
identified by the node-id of the peer; see the drbdsetup connect
command.
drbdsetup del-minor minor
Remove a replicated device. No lower-level device may be
attached; see drbdsetup detach.
drbdsetup del-resource resource
Remove a resource. All volumes and connections must be
removed first (drbdsetup del-minor, drbdsetup disconnect).
Alternatively, drbdsetup down can be used to remove a resource together
with all its volumes and connections.
drbdsetup detach minor
Detach the lower-level device of a replicated device.
Available options:
--force
Force the detach and return immediately. This puts the
lower-level device into failed state until all pending I/O has completed, and
then detaches the device. Any I/O not yet submitted to the lower-level device
(for example, because I/O on the device was suspended) is assumed to have
failed.
drbdsetup disconnect resource
peer_node_id
Remove a connection to a peer host. The connection is
identified by the node-id of the peer; see the drbdsetup connect
command.
drbdsetup down {resource | all}
Take a resource down by removing all volumes,
connections, and the resource itself.
drbdsetup dstate minor
Show the current disk state of a lower-level
device.
drbdsetup events2 {resource | all}
Show the current state of all configured DRBD objects,
followed by all changes to the state.
The output format is meant to be human as well as machine
readable. The line starts with a word that indicates the kind of event:
exists for an existing object; create, destroy, and
change if an object is created, destroyed, or changed; call or
response if an event handler is called or it returns; or
rename when the name of an object is changed. The second word
indicates the object the event applies to: resource, device,
connection, peer-device, path, helper, or a dash
(-) to indicate that the current state has been dumped
completely.
The remaining words identify the object and describe the state
that the object is in. Some special keys are worth mentioning:
resource may_promote:{yes|no}
Whether promoting to primary is expected to succeed. When
quorum is enabled, this can be used to trigger failover. When
may_promote:yes is reported on this node, then no writes are possible
on any other node, which generally means that the application can be started
on this node, even when it has been running on another.
resource promotion_score:score
An integer heuristic indicating the relative preference
for promoting this resource. A higher score is better in terms of having local
disks and having access to up-to-date data. The score may be positive even
when some node is primary. It will be zero when promotion is impossible due to
quorum or lack of any access to up-to-date data.
Available options:
--now
Terminate after reporting the current state. The default
is to continuously listen and report state changes.
--poll
Read from stdin and update when
n is read.
Newlines are ignored. Every other input terminates the command.
Without --now, changes are printed as usual. On each
n the current state is fetched, but only changed objects are printed.
This is useful with --statistics or --full because DRBD does
not otherwise send updates when only the statistics change.
In combination with --now the full state is printed on each
n. No other changes are printed.
--statistics
Include statistics in the output.
--diff
Write information in the form of a diff between old and new
state. This saves simple tools from having to track the old state
themselves.
--full
Write complete state information, especially on change
events. This enables --statistics and --verbose.
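Because each events2 line starts with the event kind and the object type, followed by key:value words, a consumer can split it with very little code. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the sample lines are illustrative rather than captured output.

```python
def parse_events2_line(line):
    """Return (event, obj, fields) for one line of events2-style output."""
    words = line.split()
    if len(words) < 2:
        raise ValueError(f"not an events2 line: {line!r}")
    event, obj = words[0], words[1]
    fields = {}
    for word in words[2:]:
        # Split at the first colon only, so values such as
        # ipv4:192.168.123.4:7788 stay in one piece.
        key, sep, value = word.partition(":")
        if sep:
            fields[key] = value
    return event, obj, fields


def may_promote(line):
    """True if a resource line reports may_promote:yes."""
    _event, obj, fields = parse_events2_line(line)
    return obj == "resource" and fields.get("may_promote") == "yes"


# Illustrative input, assumed to follow the format described above.
sample = "change resource name:r0 may_promote:yes promotion_score:10101"
print(parse_events2_line(sample))
print(may_promote(sample))
```

A failover tool built along these lines would typically wait for the "exists -" line, which marks the end of the initial state dump, before acting on later change events.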
drbdsetup get-gi resource peer_node_id
volume
Show the data generation identifiers for a device on a
particular connection. The device is identified by its volume number. The
connection is identified by the node-id of the peer; see the
drbdsetup connect
command.
The output consists of the current UUID, bitmap UUID, and the
first two history UUIDs, followed by a set of flags. The current UUID and
history UUIDs are device specific; the bitmap UUID and flags are peer device
specific. This command only shows the first two history UUIDs. Internally,
DRBD maintains one history UUID for each possible peer device.
drbdsetup invalidate minor
Replace the local data of a device with that of a peer.
All the local data will be marked out-of-sync, and a resync with the specified
peer device will be initiated.
Available options:
--reset-bitmap=no
Usually an invalidate operation sets all bits in the
bitmap to out-of-sync before beginning the resync from the peer. By giving
--reset-bitmap=no DRBD will use the bitmap as it is. Usually this is
used after an online verify operation found differences in the backing
devices.
The --reset-bitmap option is available since DRBD kernel
driver 9.0.29 and drbd-utils 9.17.
--sync-from-peer-node-id
This option allows the caller to select the node to
resync from. If it is not given, DRBD selects a suitable source node
itself.
drbdsetup invalidate-remote resource
peer_node_id volume
Replace a peer device's data of a resource with the local
data. The peer device's data will be marked out-of-sync, and a resync from the
local node to the specified peer will be initiated.
Available options:
--reset-bitmap=no
Usually an invalidate remote operation sets all bits in
the bitmap to out-of-sync before beginning the resync to the peer. By giving
--reset-bitmap=no DRBD will use the bitmap as it is. Usually this is
used after an online verify operation found differences in the backing
devices.
The --reset-bitmap option is available since DRBD kernel
driver 9.0.29 and drbd-utils 9.17.
drbdsetup new-current-uuid minor
Generate a new current UUID and rotate all other UUID
values. This has at least two use cases, namely to skip the initial sync, and
to reduce network bandwidth when starting in a single node configuration and
then later (re-)integrating a remote site.
Available option:
--clear-bitmap
Clears the sync bitmap in addition to generating a new
current UUID.
This can be used to skip the initial sync, if you want to start
from scratch. This use case only works on "Just Created" meta
data. Necessary steps:
1. On both nodes, initialize meta data and configure the device.
    drbdadm create-md --force res/volume-number
2. The nodes need to do the initial handshake, so that they know
their sizes.
    drbdadm up res
3. They are now Connected Secondary/Secondary
Inconsistent/Inconsistent. Generate a new current-uuid and clear the dirty
bitmap.
    drbdadm --clear-bitmap new-current-uuid res
4. They are now Connected Secondary/Secondary
UpToDate/UpToDate. Make one side primary and create a file system.
    drbdadm primary res
    mkfs -t fs-type $(drbdadm sh-dev res)
One obvious side-effect is that the replica is full of old garbage
(unless you made them identical using other means), so any online-verify is
expected to find any number of out-of-sync blocks.
You must not use this on pre-existing data! Even though it
may appear to work at first glance, once you switch to the other node, your
data is toast, as it never got replicated. So do not leave out the
mkfs (or equivalent).
This can also be used to shorten the initial resync of a cluster
where the second node is added after the first node has gone into production,
by means of disk shipping. This use case works on disconnected devices only;
the device may be in primary or secondary role.
The necessary steps on the current active server are:
1. drbdsetup new-current-uuid --clear-bitmap minor
2. Take a copy of the current active server, for example by
pulling a disk out of the RAID1 controller, or by copying with dd. You need to
copy the actual data, and the meta data.
3. drbdsetup new-current-uuid minor
Now add the disk to the new secondary node, and join it to the
cluster. You will get a resync of the parts that were changed since the
first call to drbdsetup in step 1.
drbdsetup new-minor resource minor
volume
Create a new replicated device within a resource. The
command creates a block device inode for the replicated device (by default,
/dev/drbdminor). The volume number identifies the device within
the resource.
drbdsetup new-resource resource node_id,
drbdsetup resource-options resource
The
new-resource command creates a new resource.
The
resource-options command changes the resource options of an
existing resource. Available options:
--auto-promote bool-value
A resource must be promoted to primary role before any of
its devices can be mounted or opened for writing.
Before DRBD 9, this could only be done explicitly ("drbdadm
primary"). Since DRBD 9, the auto-promote parameter allows a resource
to be promoted to primary role automatically when one of its devices is
mounted or opened for writing. As soon as all devices are unmounted or
closed with no more remaining users, the role of the resource changes back
to secondary.
Automatic promotion only succeeds if the cluster state allows it
(that is, if an explicit drbdadm primary command would succeed).
Otherwise, mounting or opening the device fails as it already did before
DRBD 9: the mount(2) system call fails with errno set to EROFS
(Read-only file system); the open(2) system call fails with errno set
to EMEDIUMTYPE (wrong medium type).
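A program that opens a DRBD device for writing may want to tell a refused promotion apart from other failures. The sketch below checks for the two errno values named above. The function names are hypothetical, and treating every EROFS or EMEDIUMTYPE as a refused promotion is an assumption, because other causes (such as a genuinely read-only device) can produce the same error.

```python
import errno
import os

# EMEDIUMTYPE is Linux-specific; fall back to None where Python does not define it.
EMEDIUMTYPE = getattr(errno, "EMEDIUMTYPE", None)


def promotion_refused(exc):
    """Return True if an OSError carries one of the errno values named above."""
    return exc.errno is not None and exc.errno in (errno.EROFS, EMEDIUMTYPE)


def open_for_writing(path):
    """Open a device for writing; return a file descriptor, or None if refused."""
    try:
        return os.open(path, os.O_WRONLY)
    except OSError as exc:
        if promotion_refused(exc):
            return None  # assumed: automatic promotion did not succeed
        raise  # any other error is passed on unchanged
```

A caller could, for example, retry open_for_writing on a hypothetical device path such as /dev/drbd1 after the cluster state has changed.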
Irrespective of the auto-promote parameter, if a device is
promoted explicitly (drbdadm primary), it also needs to be demoted
explicitly (drbdadm secondary).
The auto-promote parameter is available since DRBD 9.0.0,
and defaults to yes.
--cpu-mask cpu-mask
Set the cpu affinity mask for DRBD kernel threads. The
cpu mask is specified as a hexadecimal number. The default value is 0, which
lets the scheduler decide which kernel threads run on which CPUs. CPU numbers
in cpu-mask which do not exist in the system are ignored.
--on-no-data-accessible policy
Determine how to deal with I/O requests when the
requested data is not available locally or remotely (for example, when all
disks have failed). When quorum is enabled,
on-no-data-accessible
should be set to the same value as
on-no-quorum. The defined policies
are:
io-error
System calls fail with errno set to EIO.
suspend-io
The resource suspends I/O. I/O can be resumed by
(re)attaching the lower-level device, by connecting to a peer which has access
to the data, or by forcing DRBD to resume I/O with drbdadm resume-io
res. When no data is available, forcing I/O to resume will
result in the same behavior as the io-error policy.
This setting is available since DRBD 8.3.9; the default policy is
io-error.
--peer-ack-window value
On each node and for each device, DRBD maintains a bitmap
of the differences between the local and remote data for each peer device. For
example, in a three-node setup (nodes A, B, C) each with a single device,
every node maintains one bitmap for each of its peers.
When nodes receive write requests, they know how to update the
bitmaps for the writing node, but not how to update the bitmaps between
themselves. In this example, when a write request propagates from node A to
B and C, nodes B and C know that they have the same data as node A, but not
whether or not they both have the same data.
As a remedy, the writing node occasionally sends peer-ack packets
to its peers which tell them which state they are in relative to each
other.
The peer-ack-window parameter specifies how much data a
primary node may send before sending a peer-ack packet. A low value causes
increased network traffic; a high value causes less network traffic but
higher memory consumption on secondary nodes and higher resync times between
the secondary nodes after primary node failures. (Note: peer-ack packets may
be sent due to other reasons as well, e.g. membership changes or expiry of
the peer-ack-delay timer.)
The default value for peer-ack-window is 2 MiB, the default
unit is sectors. This option is available since 9.0.0.
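Since the default unit is sectors, a size given in bytes has to be converted before it is passed to the option. The following sketch shows the arithmetic for the 2 MiB default, assuming the 512-byte sector size that this page states for the verify positions; the helper name is hypothetical.

```python
SECTOR_SIZE = 512  # bytes; assumed to match the 512-byte sectors used elsewhere in this page


def bytes_to_sectors(n_bytes):
    """Convert a byte count to whole sectors, rejecting partial sectors."""
    if n_bytes < 0 or n_bytes % SECTOR_SIZE:
        raise ValueError("size must be a non-negative multiple of the sector size")
    return n_bytes // SECTOR_SIZE


# The default window of 2 MiB expressed in sectors.
default_window = bytes_to_sectors(2 * 1024 * 1024)
print(default_window)  # 4096
```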
--peer-ack-delay expiry-time
If after the last finished write request no new write
request gets issued for
expiry-time, then a peer-ack packet is sent. If
a new write request is issued before the timer expires, the timer gets reset
to
expiry-time. (Note: peer-ack packets may be sent due to other
reasons as well, e.g. membership changes or the
peer-ack-window
option.)
This parameter may influence resync behavior on remote nodes. Peer
nodes need to wait until they receive a peer-ack before releasing a lock on an
AL-extent. Resync operations between peers may need to wait for these
locks.
The default value for peer-ack-delay is 100 milliseconds,
the default unit is milliseconds. This option is available since 9.0.0.
--quorum value
When activated, a cluster partition requires quorum in
order to modify the replicated data set. That means a node in the cluster
partition can only be promoted to primary if the cluster partition has quorum.
Every node with a disk directly connected to the node that should be promoted
counts. If a primary node should execute a write request, but the cluster
partition has lost quorum, it will freeze IO or reject the write request with
an error (depending on the
on-no-quorum setting). Upon losing quorum, a
primary always invokes the
quorum-lost handler. The handler is intended
for notification purposes; its return code is ignored.
The option's value can be set to off, majority,
all, or a numeric value. If you set it to a numeric value, make sure
that the value is greater than half of your number of nodes. Quorum is a
mechanism to avoid data divergence; it can be used instead of fencing when
there are more than two replicas. It defaults to off.
If all missing nodes are marked as outdated, a partition always
has quorum, no matter how small it is. That is, if you disconnect all
secondary nodes gracefully, a single primary continues to operate. As soon as
a single secondary is lost, however, it has to be assumed that it forms a
partition with all the missing outdated nodes. If the local partition might
then be smaller than the other one, quorum is lost at that moment.
If you want to allow permanently diskless nodes to gain
quorum, it is recommended not to use majority or all. Specify
an absolute number instead, since DRBD's heuristic to
determine the complete number of diskful nodes in the cluster is
unreliable.
The quorum implementation is available starting with the DRBD
kernel driver version 9.0.7.
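The two rules above (a numeric value must exceed half the number of nodes, and a partition keeps quorum when every missing node is outdated) can be sketched as follows. This is a deliberately simplified model for illustration, not DRBD's actual quorum implementation; the function names and the way nodes are counted are assumptions.

```python
def min_numeric_quorum(total_nodes):
    """Smallest numeric quorum value that is greater than half of total_nodes."""
    return total_nodes // 2 + 1


def partition_has_quorum(reachable, outdated_missing, total_nodes, required):
    """Simplified quorum check for the local partition.

    reachable        -- nodes in the local partition that count (assumed to
                        include the local node)
    outdated_missing -- missing nodes known to be marked as outdated
    required         -- numeric quorum value
    """
    missing = total_nodes - reachable
    if missing == outdated_missing:
        # Every missing node is outdated: the partition keeps quorum,
        # no matter how small it is.
        return True
    return reachable >= required


required = min_numeric_quorum(3)
print(required)
print(partition_has_quorum(1, 2, 3, required))  # both peers disconnected gracefully
print(partition_has_quorum(1, 1, 3, required))  # one peer outdated, one lost
```

In the three-node example, a primary whose two peers were both disconnected gracefully keeps quorum, while losing one peer unexpectedly after the other was outdated costs it quorum.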
--quorum-minimum-redundancy value
This option sets the minimal required number of nodes
with an UpToDate disk to allow the partition to gain quorum. This is a
different requirement than the plain
quorum option expresses.
The option's value can be set to off, majority,
all, or a numeric value. If you set it to a numeric value, make sure
that the value is greater than half of your number of nodes.
If you want to allow permanently diskless nodes to gain
quorum, it is recommended not to use majority or all. Specify
an absolute number instead, since DRBD's heuristic to
determine the complete number of diskful nodes in the cluster is
unreliable.
This option is available starting with the DRBD kernel driver
version 9.0.10.
--on-no-quorum {io-error | suspend-io}
By default, DRBD freezes I/O on a device that has lost quorum.
When
on-no-quorum is set to
io-error, DRBD instead completes all I/O
operations with an error if quorum is lost.
Usually, on-no-data-accessible should be set to the
same value as on-no-quorum, as it has precedence.
The on-no-quorum option is available starting with the
DRBD kernel driver version 9.0.8.
drbdsetup outdate minor
Mark the data on a lower-level device as outdated. This
is used for fencing, and prevents the resource the device is part of from
becoming primary in the future. See the --fencing disk option.
drbdsetup pause-sync resource peer_node_id
volume
Stop resynchronizing between a local and a peer device by
setting the local pause flag. The resync can only resume if the pause flags on
both sides of a connection are cleared.
drbdsetup primary resource
Change the role of a node in a resource to primary. This
allows the replicated devices in this resource to be mounted or opened for
writing. Available options:
--overwrite-data-of-peer
This option is an alias for the --force
option.
--force
Force the resource to become primary even if some devices
are not guaranteed to have up-to-date data. This option is used to turn one of
the nodes in a newly created cluster into the primary node, or when manually
recovering from a disaster.
Note that this can lead to split-brain scenarios. Also, when
forcefully turning an inconsistent device into an up-to-date device, it is
highly recommended to use any integrity checks available (such as a
filesystem check) to make sure that the device can at least be used without
crashing the system.
Note that DRBD usually only allows one node in a cluster to be in
primary role at any time; this allows DRBD to coordinate access to the
devices in a resource across nodes. The --allow-two-primaries network
option changes this; in that case, a mechanism outside of DRBD needs to
coordinate device access.
drbdsetup resize minor
Reexamine the size of the lower-level devices of a
replicated device on all nodes. This command is called after the lower-level
devices on all nodes have been grown to adjust the size of the replicated
device. Available options:
--assume-peer-has-space
Resize the device even if some of the peer devices are
not connected at the moment. DRBD will try to resize the peer devices when
they next connect. It will refuse to connect to a peer device which is too
small.
--assume-clean
Do not resynchronize the added disk space; instead,
assume that it is identical on all nodes. This option can be used when the
disk space is uninitialized and differences do not matter, or when it is known
to be identical on all nodes. See the drbdsetup verify command.
--size val
This option can be used to shrink the usable size
of a DRBD device online. It is the user's responsibility to make sure that a
file system on the device is not truncated by that operation.
--al-stripes val --al-stripe-size-kB val
These options may be used to change the layout of the
activity log online. In case of internal meta data this may involve shrinking
the user-visible size at the same time (using the --size option) or
increasing the available space on the backing devices.
drbdsetup resume-io minor
Resume I/O on a replicated device. See the
--fencing net option.
drbdsetup resume-sync resource peer_node_id
volume
Allow resynchronization to resume by clearing the local
sync pause flag.
drbdsetup role resource
Show the current role of a resource.
drbdsetup secondary resource
Change the role of a node in a resource to secondary.
This command fails if the replicated device is in use.
drbdsetup show {resource | all}
Show the current configuration of a resource, or of all
resources. Available options:
--show-defaults
Show all configuration parameters, even the ones with
default values. Normally, parameters with default values are not shown.
drbdsetup show-gi resource peer_node_id
volume
Show the data generation identifiers for a device on a
particular connection. In addition, explain the output. The output otherwise
is the same as in the drbdsetup get-gi command.
drbdsetup state
This is an alias for drbdsetup role.
Deprecated.
drbdsetup status {resource | all}
Show the status of a resource, or of all resources. The
output consists of one paragraph for each configured resource. Each paragraph
contains one line for the resource, followed by one line for each device, and
one line for each connection. The device and connection lines are indented.
The connection lines are followed by one line for each peer device; these
lines are indented against the connection line.
Long lines are wrapped around at terminal width, and indented to
indicate how the lines belong together. Available options:
--verbose
Include more information in the output even when it is
likely redundant or irrelevant.
--statistics
Include data transfer statistics in the output.
--color={always | auto | never}
Colorize the output. With
--color=auto,
drbdsetup emits color codes only when standard output is connected to a
terminal.
For example, the non-verbose output for a resource with only one
connection and only one volume could look like this:
drbd0 role:Primary
  disk:UpToDate
  host2.example.com role:Secondary
    disk:UpToDate
With the --verbose option, the same resource could be
reported as:
drbd0 node-id:1 role:Primary suspended:no
  volume:0 minor:1 disk:UpToDate blocked:no
  host2.example.com local:ipv4:192.168.123.4:7788
      peer:ipv4:192.168.123.2:7788 node-id:0 connection:WFReportParams
      role:Secondary congested:no
    volume:0 replication:Connected disk:UpToDate resync-suspended:no
drbdsetup suspend-io minor
Suspend I/O on a replicated device. It is not usually
necessary to use this command.
drbdsetup verify resource peer_node_id
volume
Start online verification, change which part of the
device will be verified, or stop online verification. The command requires the
specified peer to be connected.
Online verification compares each disk block on the local and peer
node. Blocks which differ between the nodes are marked as out-of-sync, but
they are not automatically brought back into sync. To bring them into
sync, the drbdsetup invalidate or drbdsetup invalidate-remote
with the --reset-bitmap=no option can be used. Progress can be
monitored in the output of drbdsetup status --statistics. Available
options:
--start position
Define where online verification should start. This
parameter is ignored if online verification is already in progress. If the
start parameter is not specified, online verification will continue where it
was interrupted (if the connection to the peer was lost while verifying),
after the previous stop sector (if the previous online verification has
finished), or at the beginning of the device (if the end of the device was
reached, or online verify has not run before).
The position on disk is specified in disk sectors (512 bytes) by
default.
--stop position
Define where online verification should stop. If online
verification is already in progress, the stop position of the active online
verification process is changed. Use this to stop online verification.
The position on disk is specified in disk sectors (512 bytes) by
default.
Also see the notes on data integrity in the drbd.conf(5)
manual page.
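Because --start and --stop take positions in 512-byte sectors, a large device can be verified in several smaller windows, for example one per low-load period. The sketch below computes such windows and builds the matching argument lists. The helper names, the resource name r0, and the node and volume numbers are hypothetical, and the code assumes that one window's stop position can serve as the next window's start position.

```python
SECTOR = 512  # bytes per sector, as stated for --start and --stop


def verify_windows(device_bytes, window_bytes):
    """Yield (start, stop) sector pairs covering the device in fixed-size windows."""
    if window_bytes <= 0 or window_bytes % SECTOR:
        raise ValueError("window size must be a positive multiple of 512 bytes")
    total = device_bytes // SECTOR
    step = window_bytes // SECTOR
    for start in range(0, total, step):
        yield start, min(start + step, total)


def verify_command(resource, peer_node_id, volume, start, stop):
    """Build the argument list for one verification window."""
    return ["drbdsetup", "verify", resource, str(peer_node_id), str(volume),
            "--start", str(start), "--stop", str(stop)]


# Example: a 10 MiB device verified in 4 MiB windows.
for start, stop in verify_windows(10 * 1024 * 1024, 4 * 1024 * 1024):
    print(" ".join(verify_command("r0", 1, 0, start, stop)))
```

The argument lists are only printed here; a scheduler could pass each one to a process runner once the previous window has finished.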
drbdsetup wait-connect-volume resource
peer_node_id volume,
drbdsetup wait-connect-connection resource peer_node_id,
drbdsetup wait-connect-resource resource,
drbdsetup wait-sync-volume resource peer_node_id
volume,
drbdsetup wait-sync-connection resource peer_node_id,
drbdsetup wait-sync-resource resource
The
wait-connect-* commands wait until a device
on a peer is visible. The
wait-sync-* commands wait until a device on
a peer is up to date. Available options for both commands:
--degr-wfc-timeout timeout
Define how long to wait until all peers are connected in
case the cluster consisted of a single node only when the system went down.
This parameter is usually set to a value smaller than
wfc-timeout. The
assumption here is that peers which were unreachable before a reboot are less
likely to be reachable after the reboot, so waiting is less likely to help.
The timeout is specified in seconds. The default value is 0, which
stands for an infinite timeout. Also see the wfc-timeout
parameter.
--outdated-wfc-timeout timeout
Define how long to wait until all peers are connected if
all peers were outdated when the system went down. This parameter is usually
set to a value smaller than
wfc-timeout. The assumption here is that an
outdated peer cannot have become primary in the meantime, so we don't need to
wait for it as long as for a node which was alive before.
The timeout is specified in seconds. The default value is 0, which
stands for an infinite timeout. Also see the wfc-timeout
parameter.
--wait-after-sb
This parameter causes DRBD to continue waiting in the
init script even when a split-brain situation has been detected, and the nodes
therefore refuse to connect to each other.
--wfc-timeout timeout
Define how long the init script waits until all peers are
connected. This can be useful in combination with a cluster manager which
cannot manage DRBD resources: when the cluster manager starts, the DRBD
resources will already be up and running. With a more capable cluster manager
such as Pacemaker, it makes more sense to let the cluster manager control DRBD
resources. The timeout is specified in seconds. The default value is 0, which
stands for an infinite timeout. Also see the degr-wfc-timeout
parameter.
drbdsetup forget-peer resource
peer_node_id
The
forget-peer command removes all traces of a
peer node from the meta-data. It frees a bitmap slot in the meta-data and makes
it available for further bitmap slot allocation in case a so-far never seen node
connects.
The connection must be taken down before this command may be used.
If the peer reconnects at a later point, a bitmap-based resync will be
turned into a full sync.
drbdsetup rename-resource resource
new_name
Change the name of
resource to
new_name on
the local node. Note that, since there is no concept of resource names in
DRBD's network protocol, it is technically possible to have different names
for a resource on different nodes. However, it is strongly recommended to
issue the same
rename-resource command on all nodes to have consistent
naming across the cluster.
A rename event will be issued on the events2 stream
to notify users of the new name.