corosync-qdevice - QDevice daemon
corosync-qdevice [-dfh] [-S option=value[,option2=value2,...]]
corosync-qdevice is a daemon running on each node of a cluster. It
provides a configured number of votes to the quorum subsystem based on a
third-party arbitrator's decision. Its primary use is to allow a cluster to
sustain more node failures than standard quorum rules allow. It is recommended
for clusters with an even number of nodes and highly recommended for two-node
clusters.
- -d
- Forcefully turn on debug information without the need to
change corosync.conf. To bump the syslog message priority to info, use
this parameter twice.
- -f
- Do not daemonize, run in the foreground.
- -h
- Show short help text.
- -S
- Set the advanced settings described in their own section below.
This option should generally not be used because most of the settings are
not safe to change.
corosync-qdevice reads its configuration from the corosync.conf file.
The main configuration is within the
quorum.device sub-key. Each model also
has its own configuration within a similarly named sub-key.
- model
- Specifies the model to be used. This parameter is required.
corosync-qdevice is modular and is able to support multiple
different models. The model basically defines what type of arbitrator is
used. Currently only net is supported.
- timeout
- Specifies how often corosync-qdevice should call the
votequorum_qdevice_poll function. It is also used by the net model
to adjust its heartbeat timeout. It is recommended that you don't change
this value. Default is 10000.
- sync_timeout
- Specifies how often corosync-qdevice should call the
votequorum_qdevice_poll function during a sync phase. It is recommended
that you don't change this value. Default is 30000.
- votes
- The number of votes provided to the cluster by qdevice.
Default is (number_of_nodes - 1) or generally sum(votes_per_node) - 1.
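The default for votes can be illustrated with a short calculation. This is only a sketch of the formula quoted above; the function name default_qdevice_votes is hypothetical and is not part of corosync.

```python
def default_qdevice_votes(votes_per_node):
    """Return the documented default: sum(votes_per_node) - 1.

    votes_per_node is a list with one entry per cluster node.
    (Hypothetical helper for illustration only.)
    """
    return sum(votes_per_node) - 1


# Four nodes with one vote each: the default is number_of_nodes - 1.
print(default_qdevice_votes([1, 1, 1, 1]))  # 3
```

Note that the ffsplit algorithm described later needs votes set to 1 rather than this default.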
The quorum.device.heuristics subkey holds the configuration of the
heuristics. Heuristics are a set of commands executed locally on startup,
on cluster membership change, on successful connection to
corosync-qnetd and
optionally also at regular intervals. Commands are executed in parallel. When all
commands finish successfully (their return error code is zero) on time,
heuristics have passed, otherwise they have failed. The heuristics result is
sent to
corosync-qnetd, where it is used in calculations to determine
which partition should be quorate.
- timeout
- Specifies the maximum time in milliseconds that
corosync-qdevice waits for the heuristics commands to finish. If some
command doesn't finish before the timeout, it's killed and heuristics
fail. This timeout is used for heuristics executed at regular times.
Default value is half of the quorum.device.timeout, so
5000.
- sync_timeout
- Similar to quorum.device.heuristics.timeout but used during
membership changes. Default value is half of the
quorum.device.sync_timeout, so 15000.
- interval
- Specifies the interval between two regular heuristics
executions. Default value is 3 * quorum.device.timeout, so
30000.
- mode
- Can be one of on, sync or off and
specifies the mode of operation of heuristics. Default is off, which
means heuristics are disabled. When sync is set, heuristics are
executed only during startup, membership change and when the connection to
corosync-qnetd is established. If heuristics should also run
on a regular basis, this option should be set to on.
- exec_NAME
- Defines an executable. NAME can be an arbitrary valid
cmap key name string and it has no special meaning. The value of this
variable must contain a command to execute. The value is parsed (split)
into arguments similarly to how a Bourne shell would do it. Quoting is
possible by using backslash and double quotes.
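The relationship between the heuristics defaults and the quorum.device timeouts, together with the pass/fail rule, can be sketched as follows. This is an illustration of the documented arithmetic only, not the daemon's implementation; the function names are hypothetical, and representing a killed command as None is an assumption made for this example.

```python
def heuristics_defaults(device_timeout=10000, device_sync_timeout=30000):
    """Derive the documented heuristics defaults from quorum.device values.

    Arguments default to the documented quorum.device.timeout and
    quorum.device.sync_timeout values. (Hypothetical helper.)
    """
    return {
        "timeout": device_timeout // 2,            # half of timeout
        "sync_timeout": device_sync_timeout // 2,  # half of sync_timeout
        "interval": 3 * device_timeout,            # 3 * timeout
    }


def heuristics_passed(exit_codes):
    """Return True when every command exited with zero on time.

    exit_codes holds one entry per exec_NAME command. None stands for a
    command killed after the timeout (an assumption of this sketch).
    The case of no configured commands is not modeled here.
    """
    return all(code == 0 for code in exit_codes)


print(heuristics_defaults())
print(heuristics_passed([0, 0]))     # both commands succeeded
print(heuristics_passed([0, None]))  # one command timed out
```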
The quorum.device.net subkey holds the configuration for
model net.
- tls
- Can be one of on, off or required and
specifies if TLS should be used. on means a connection with TLS is
attempted first, but if the server doesn't advertise TLS support then
non-TLS will be used. off is used when TLS is not required; it is
then not even tried. This mode is the only one which doesn't need a
properly initialized NSS database. required means TLS is required
and if the server doesn't support TLS, qdevice will exit with an error
message. Default is on.
- host
- Specifies the IP address or host name of the qnetd server
to be used. This parameter is required.
- port
- Specifies the TCP port of the qnetd server. Default is
5403.
- algorithm
- Decision algorithm. Can be one of ffsplit or
lms. (There are also test and 2nodelms, both
of which are mainly for developers and shouldn't be used for production
clusters.) For a description of what each algorithm means and how the
algorithms differ, see their individual sections. Default value is
ffsplit.
- tie_breaker
- Can be one of lowest, highest or
valid_node_id (number) values. It's used as a fallback if qdevice has to
decide between two or more equal partitions. lowest means the
partition with the lowest node id is chosen. highest means the
partition with the highest node id is chosen. valid_node_id means that the
partition containing the node with the given node id is chosen. Default is
lowest.
- connect_timeout
- Timeout when corosync-qdevice is trying to connect
to corosync-qnetd host. Default is 0.8 *
quorum.device.timeout.
- force_ip_version
- Can be one of 0, 4 or 6 and forces the software to use
the given IP version. 0 (default value) means IPv6 is preferred and
IPv4 should be used as a fallback.
- keep_active_partition_tie_breaker
- Can be one of on or off and specifies if the keep
active partition tie breaker should be used. When this option is enabled
and a tie happens, QNetd will prefer the partition with members of the
previously active (quorate) partition. This is hard-coded behavior of the
LMS algorithm, so this setting affects only the FFSplit algorithm. Default is on.
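The tie_breaker rules above can be sketched in a few lines. This is a simplified model under stated assumptions, not qnetd's code: partitions are represented as sets of node ids, and the function name pick_by_tie_breaker is hypothetical.

```python
def pick_by_tie_breaker(partitions, tie_breaker="lowest"):
    """Choose one of several equal partitions.

    partitions: list of sets of node ids (an assumption of this sketch).
    tie_breaker: "lowest", "highest", or an integer node id.
    Returns the chosen partition, or None if the given node id is in
    none of the partitions.
    """
    if tie_breaker == "lowest":
        # Partition containing the lowest node id.
        return min(partitions, key=min)
    if tie_breaker == "highest":
        # Partition containing the highest node id.
        return max(partitions, key=max)
    # Otherwise tie_breaker is a node id: pick the partition holding it.
    for partition in partitions:
        if tie_breaker in partition:
            return partition
    return None


split = [{1, 2}, {3, 4}]
print(pick_by_tie_breaker(split))             # {1, 2}
print(pick_by_tie_breaker(split, "highest"))  # {3, 4}
print(pick_by_tie_breaker(split, 3))          # {3, 4}
```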
Logging configuration is within the
logging directive.
corosync-qdevice parses and supports only the
debug option. The
logger_subsys sub-directive can also be used if
subsys is set to
QDEVICE.
For
corosync-qdevice to work correctly, the
nodelist directive has
to be used and properly configured. Also the
net model requires that
totem.cluster_name option is set.
For
model net to work using TLS, it's necessary to create the NSS
database, import the qnetd CA certificate, and get/distribute a valid client
certificate.
If pcs is used (recommended), the following steps are not needed because pcs
performs them automatically.
corosync-qdevice-net-certutil is the tool to perform the required actions
semi-automatically. Please consult its help output and its man page. For
a first time configuration it may make sense to start with the
-Q
option.
If TLS is not required, just edit the corosync.conf file and set
quorum.device.net.tls to
off.
Depending on the configuration of NSS (stored in the nss.config file, usually in
the /etc/crypto-policies/back-ends/ directory), disabled ciphers or too short keys
may be rejected. The proper solution is to regenerate the NSS databases for both the
corosync-qnetd and
corosync-qdevice daemons. As a quick
workaround it's also possible to set the environment variable
NSS_IGNORE_SYSTEM_POLICY=1 before running the
corosync-qdevice
daemon.
When NSS is updated, it may also be necessary to upgrade the database to the new format.
There is no consensus on the recommended way, but the following command seems to work
just fine (if the qdevice sysconfdir is set to /etc):
# certutil -N -d /etc/corosync/qdevice/net/nssdb -f /etc/corosync/qdevice/net/nssdb/pwdfile.txt
Algorithms change how
corosync-qnetd provides
votes to a given node/partition. Currently there are two algorithms supported.
- ffsplit
- This one makes sense only for clusters with an even number
of nodes. It provides exactly one vote to the partition with the highest
number of active nodes. If two partitions are exactly equal in size, it
provides its vote to the partition with the higher score. The score is
computed as (number_of_connected_nodes +
number_of_connected_nodes_with_passed_heuristics -
number_of_connected_nodes_with_failed_heuristics). If the scores are equal,
the vote is provided to the partition with the most clients connected to the
qnetd server. If this number is also equal, then the tie_breaker is used.
It is able to transition its vote if the currently active partition
becomes partitioned and a non-active partition still has at least 50% of
the active nodes. Because of this, a vote is not provided if the qnetd
connection is not active.
To use this algorithm it's required to set the number of votes per node to 1
(default) and the qdevice number of votes has to be also 1. This is
achieved by setting quorum.device.votes key in corosync.conf file
to 1.
- lms
- Last-man-standing. If the node is the only one left in the
cluster that can see the qnetd server then we return a vote.
If more than one node can see the qnetd server but some nodes can't see each
other then the cluster is divided up into 'partitions' based on their
ring_id and this algorithm returns a vote to the partition with highest
heuristics score (computed the same way as for the ffsplit
algorithm), or if there is more than 1 partition with equal scores, the
largest active partition or, if there is more than 1 equal partition, the
partition that contains the tie_breaker node (lowest, highest, etc). For
LMS to work, the number of qdevice votes has to be set to default (so just
delete quorum.device.votes key from corosync.conf).
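The heuristics score used by both ffsplit and lms can be worked through with a small example. The function below only restates the formula from the ffsplit description; its name and the sample partitions are hypothetical, and the full partition selection done by qnetd is not modeled.

```python
def heuristics_score(connected, passed, failed):
    """Score of one partition, following the documented formula:

    number_of_connected_nodes
    + number_of_connected_nodes_with_passed_heuristics
    - number_of_connected_nodes_with_failed_heuristics
    """
    return connected + passed - failed


# A four-node cluster split into two partitions of two nodes each.
# In partition A both nodes passed heuristics; in partition B one
# node passed and one failed.
score_a = heuristics_score(connected=2, passed=2, failed=0)
score_b = heuristics_score(connected=2, passed=1, failed=1)
print(score_a, score_b)  # 4 2
```

With these sample values partition A has the higher score, so it would be preferred before the number of connected clients or the tie_breaker is considered.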
Advanced settings are set by using the
-S option. The default value is shown in parentheses.
Options beginning with the
net_ prefix are specific to
model
net.
- lock_file
- Lock file location.
(/var/run/corosync-qdevice/corosync-qdevice.pid)
- local_socket_file
- Internal IPC socket file location.
(/var/run/corosync-qdevice/corosync-qdevice.sock)
- local_socket_backlog
- Parameter passed to listen syscall. (10)
- max_cs_try_again
- How many times to retry the call to a corosync function
which has returned CS_ERR_TRY_AGAIN. (10)
- votequorum_device_name
- Name used for qdevice registration. (Qdevice)
- ipc_max_clients
- Maximum allowed simultaneous IPC clients. (10)
- ipc_max_receive_size
- Maximum size of a message received by IPC client.
(4096)
- ipc_max_send_size
- Maximum size of a message allowed to be sent to an IPC
client. (65536)
- master_wins
- Force enable/disable master wins. (default is model)
- heuristics_ipc_max_send_buffers
- Maximum number of heuristics worker send buffers.
(128)
- heuristics_ipc_max_send_receive_size
- Maximum size of a message allowed to be sent to, or
received from, a heuristics worker. (4096)
- heuristics_min_timeout
- Minimum heuristics timeout accepted by client in ms.
(1000)
- heuristics_max_timeout
- Maximum heuristics timeout accepted by client in ms.
(120000)
- heuristics_min_interval
- Minimum heuristics interval accepted by client in ms.
(1000)
- heuristics_max_interval
- Maximum heuristics interval accepted by client in ms.
(3600000)
- heuristics_max_execs
- Maximum number of exec_ commands. (32)
- heuristics_use_execvp
- Use execvp instead of execv for executing commands.
(off)
- heuristics_max_processes
- Maximum number of processes running at one time. (160)
- heuristics_kill_list_interval
- Interval in ms at which process status is gathered and, if
needed, a signal is sent to processes which didn't finish on time. (5000)
- net_nss_db_dir
- NSS database directory.
(/etc/corosync/qdevice/net/nssdb)
- net_initial_msg_receive_size
- Initial (used during connection parameters negotiation)
maximum size of the receive buffer for message (maximum allowed message
size received from qnetd). (32768)
- net_initial_msg_send_size
- Initial (used during connection parameter negotiation)
maximum size of one send buffer (message) to be sent to server.
(32768)
- net_min_msg_send_size
- Minimum required size of one send buffer (message) to be
sent to server. (32768)
- net_max_msg_receive_size
- Maximum allowed size of receive buffer for a message sent
by server. (16777216)
- net_max_send_buffers
- Maximum number of send buffers. (10)
- net_nss_qnetd_cn
- Common name (CN) of the qnetd server certificate. (Qnetd
Server)
- net_nss_client_cert_nickname
- NSS nickname of qdevice client certificate. (Cluster
Cert)
- net_heartbeat_interval_min
- Minimum heartbeat timeout accepted by client in ms.
(1000)
- net_heartbeat_interval_max
- Maximum heartbeat timeout accepted by client in ms.
(120000)
- net_min_connect_timeout
- Minimum connection timeout accepted by client in ms.
(1000)
- net_max_connect_timeout
- Maximum connection timeout accepted by client in ms.
(120000)
- net_test_algorithm_enabled
- Enable test algorithm. (if built with --enable-debug on,
otherwise off)
Define qdevice with
net model connecting to qnetd running on
qnetd.example.org host, using
ffsplit algorithm. Heuristics is set to
sync mode and executes two commands.
quorum {
    provider: corosync_votequorum
    device {
        votes: 1
        model: net
        net {
            tls: on
            host: qnetd.example.org
            algorithm: ffsplit
        }
        heuristics {
            mode: sync
            exec_ping: /bin/ping -q -c 1 "www.example.org"
            exec_test_txt_exists: /usr/bin/test -f /tmp/test.txt
        }
    }
}
corosync-qdevice-tool(8) corosync-qdevice-net-certutil(8)
corosync-qnetd(8) corosync.conf(5)
votequorum_qdevice_poll(3)
Jan Friesse