From: "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx> fix a few typos on the way and separate config / library bits Signed-off-by: Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> --- corosync.spec.in | 1 + man/Makefile.am | 1 + man/corosync.conf.5 | 1 + man/index.html | 4 + man/votequorum.5 | 348 +++++++++++++++++++++++++++++++++++++++++++++ man/votequorum_overview.8 | 26 +--- 6 files changed, 358 insertions(+), 23 deletions(-) create mode 100644 man/votequorum.5 diff --git a/corosync.spec.in b/corosync.spec.in index 49fee03..2b78f2f 100644 --- a/corosync.spec.in +++ b/corosync.spec.in @@ -187,6 +187,7 @@ fi %{_mandir}/man8/corosync-notifyd.8* %{_mandir}/man8/corosync-quorumtool.8* %{_mandir}/man5/corosync.conf.5* +%{_mandir}/man5/votequorum.5* # optional testagent rpm diff --git a/man/Makefile.am b/man/Makefile.am index 767b5e6..865a9a6 100644 --- a/man/Makefile.am +++ b/man/Makefile.am @@ -38,6 +38,7 @@ EXTRA_DIST = index.html dist_man_MANS = \ corosync.conf.5 \ + votequorum.5 \ corosync.8 \ corosync-cmapctl.8 \ corosync-blackbox.8 \ diff --git a/man/corosync.conf.5 b/man/corosync.conf.5 index a8fbe98..d85099f 100644 --- a/man/corosync.conf.5 +++ b/man/corosync.conf.5 @@ -652,5 +652,6 @@ The corosync executive configuration file. .SH "SEE ALSO" .BR corosync_overview (8), +.BR votequorum (5), .BR logrotate (8) .PP diff --git a/man/index.html b/man/index.html index 0b5b1ab..1133f33 100644 --- a/man/index.html +++ b/man/index.html @@ -31,6 +31,10 @@ Description of configuration options for corosync xml config format. <br> + <a href="votequorum.5.html">votequorum(5)</a>: + Description of configuration options for votequorum module in corosync.conf + <br> + <a href="corosync.8.html">corosync(8)</a>: Description of corosync daemon. <br> diff --git a/man/votequorum.5 b/man/votequorum.5 new file mode 100644 index 0000000..132334e --- /dev/null +++ b/man/votequorum.5 @@ -0,0 +1,348 @@ +.\"/* +.\" * Copyright (c) 2012 Red Hat, Inc. +.\" * +.\" * All rights reserved. 
+.\" * +.\" * Authors: Christine Caulfield <ccaulfie@xxxxxxxxxx> +.\" * Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> +.\" * +.\" * This software licensed under BSD license, the text of which follows: +.\" * +.\" * Redistribution and use in source and binary forms, with or without +.\" * modification, are permitted provided that the following conditions are met: +.\" * +.\" * - Redistributions of source code must retain the above copyright notice, +.\" * this list of conditions and the following disclaimer. +.\" * - Redistributions in binary form must reproduce the above copyright notice, +.\" * this list of conditions and the following disclaimer in the documentation +.\" * and/or other materials provided with the distribution. +.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its +.\" * contributors may be used to endorse or promote products derived from this +.\" * software without specific prior written permission. +.\" * +.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE +.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF +.\" * THE POSSIBILITY OF SUCH DAMAGE. 
+.\" */ +.TH VOTEQUORUM 5 2012-01-24 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual" +.SH NAME +votequorum \- Votequorum Configuration Overview +.SH OVERVIEW +The votequorum service is part of the corosync project. This service can be optionally loaded +into the nodes of a corosync cluster to avoid split-brain situations. +It does this by having a number of votes assigned to each system in the cluster and ensuring +that only when a majority of the votes are present, cluster operations are allowed to proceed. +The service must be loaded into all nodes or none. If it is loaded into a subset of cluster nodes +the results will be unpredictable. +.PP +The following corosync.conf extract will enable votequorum service within corosync: +.PP +.nf +quorum { + provider: corosync_votequorum +} +.fi +.PP +votequorum reads its configuration from corosync.conf. Some values can be changed at runtime; others +are only read at corosync startup. It is very important that those values are consistent +across all the nodes participating in the cluster or votequorum behavior will be unpredictable. +.PP +votequorum requires an expected_votes value to function. This can be provided in two ways. +The number of expected votes will be automatically calculated when the nodelist { } section is +present in corosync.conf or expected_votes can be specified in the quorum { } section. Lack of +both will disable votequorum. If both are present at the same time, +the quorum.expected_votes value will override the one calculated from the nodelist. 
+.PP +Example (no nodelist) of an 8 node cluster (each node has 1 vote): +.nf + +quorum { + provider: corosync_votequorum + expected_votes: 8 +} +.fi +.PP +Example (with nodelist) of a 3 node cluster (each node has 1 vote): +.nf + +quorum { + provider: corosync_votequorum +} + +nodelist { + node { + ring0_addr: 192.168.1.1 + } + node { + ring0_addr: 192.168.1.2 + } + node { + ring0_addr: 192.168.1.3 + } +} +.fi +.SH SPECIAL FEATURES +.PP +.B two_node: 1 +.PP +Enables two node cluster operations (default: 0). +.PP +The "two node cluster" is a use case that requires special consideration. +With a standard two node cluster, each node with a single vote, there +are 2 votes in the cluster. Using the simple majority calculation +(50% of the votes + 1) to calculate quorum, the quorum would be 2. +This means that both nodes would always have +to be alive for the cluster to be quorate and operate. +.PP +When two_node: 1 is enabled, quorum is set artificially to 1. +.PP +Example configuration 1: + +.nf +quorum { + provider: corosync_votequorum + expected_votes: 2 + two_node: 1 +} +.fi + +.PP +Example configuration 2: + +.nf +quorum { + provider: corosync_votequorum + two_node: 1 +} + +nodelist { + node { + ring0_addr: 192.168.1.1 + } + node { + ring0_addr: 192.168.1.2 + } +} +.fi +.PP +NOTES: enabling two_node: 1 automatically enables wait_for_all. It is +still possible to override wait_for_all by explicitly setting it to 0. +If more than 2 nodes join the cluster, the two_node option is +automatically disabled. +.PP +.B wait_for_all: 1 +.PP +Enables Wait For All (WFA) feature (default: 0). +.PP +The general behaviour of votequorum is to switch a cluster from inquorate to quorate +as soon as possible. For example, in an 8 node cluster, where every node has 1 vote, +expected_votes is set to 8 and quorum is 5 (50% + 1). As soon as 5 (or more) nodes +are visible to each other, the partition of 5 (or more) becomes quorate and can +start operating. 
+.PP +When WFA is enabled, the cluster will be quorate for the first time +only after all nodes have been visible at least once at the same time. +.PP +This feature has the advantage of avoiding some startup race conditions, with the cost +that all nodes need to be up at the same time at least once before the cluster +can operate. +.PP +A common startup race condition based on the above example is that as soon as 5 +nodes become quorate, with the other 3 still offline, the remaining 3 nodes will +be fenced. +.PP +It is very useful when combined with last_man_standing (see below). +.PP +Example configuration: +.nf + +quorum { + provider: corosync_votequorum + expected_votes: 8 + wait_for_all: 1 +} +.fi +.PP +.B last_man_standing: 1 +/ +.B last_man_standing_window: 10000 +.PP +Enables Last Man Standing (LMS) feature (default: 0). +Tunable last_man_standing_window (default: 10 seconds, expressed in ms). +.PP +The general behaviour of votequorum is to set expected_votes and quorum +at startup (unless modified by the user at runtime, see below) and use +those values during the whole lifetime of the cluster. +.PP +Using for example an 8 node cluster where each node has 1 vote, expected_votes +is set to 8 and quorum to 5. This condition allows a total failure of 3 +nodes. If a 4th node fails, the cluster becomes inquorate and it will +stop providing services. +.PP +Enabling LMS allows the cluster to dynamically recalculate expected_votes +and quorum under specific circumstances. It is essential to enable +WFA when using LMS in High Availability clusters. +.PP +Using the above 8 node cluster example, with LMS enabled the cluster can retain +quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with +only 2 remaining active. +.PP +Example chain of events: +.nf +1) cluster is fully operational with 8 nodes. + (expected_votes: 8 quorum: 5) + +2) 3 nodes die, cluster is quorate with 5 nodes. 
+ +3) after last_man_standing_window timer expires, + expected_votes and quorum are recalculated. + (expected_votes: 5 quorum: 3) + +4) at this point, 2 more nodes can die and + cluster will still be quorate with 3. + +5) once again, after last_man_standing_window + timer expires expected_votes and quorum are + recalculated. + (expected_votes: 3 quorum: 2) + +6) at this point, 1 more node can die and + cluster will still be quorate with 2. + +7) one more last_man_standing_window timer + (expected_votes: 2 quorum: 2) +.fi +.PP +NOTES: In order for the cluster to downgrade automatically from 2 nodes +to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below). +If auto_tie_breaker is not enabled, and one more failure occurs, the +remaining node will not be quorate. LMS does not work with asymmetric voting +schemes; each node must have 1 vote. +.PP +Example configuration 1: +.nf + +quorum { + provider: corosync_votequorum + expected_votes: 8 + last_man_standing: 1 +} +.fi +.PP +Example configuration 2 (increase timeout to 20 seconds): +.nf + +quorum { + provider: corosync_votequorum + expected_votes: 8 + last_man_standing: 1 + last_man_standing_window: 20000 +} +.fi +.PP +.B auto_tie_breaker: 1 +.PP +Enables Auto Tie Breaker (ATB) feature (default: 0). +.PP +The general behaviour of votequorum allows a simultaneous node failure up +to 50% - 1 node, assuming each node has 1 vote. +.PP +When ATB is enabled, the cluster can suffer up to 50% of the nodes failing +at the same time, in a deterministic fashion. The cluster partition, or the +set of nodes that are still in contact with the node that has the lowest +nodeid will remain quorate. The other nodes will be inquorate. +.PP +NOTES: For ATB to work, the lowest nodeid in the cluster needs to be known. +corosync can automatically generate a nodeid or it can be overridden manually. +If nodeids are not known at startup, ATB will automatically enable WFA. 
+WFA will guarantee that all nodeids in the cluster are known before ATB can +operate correctly. +.PP +Example configuration 1: +.nf + +quorum { + provider: corosync_votequorum + expected_votes: 8 + auto_tie_breaker: 1 +} + +.fi +This will also automatically enable WFA. +.PP +Example configuration 2: +.nf + +quorum { + provider: corosync_votequorum + auto_tie_breaker: 1 +} + +nodelist { + node { + ring0_addr: 192.168.1.1 + nodeid: 1 + } + node { + ring0_addr: 192.168.1.2 + nodeid: 2 + } + node { + ring0_addr: 192.168.1.3 + nodeid: 3 + } + node { + ring0_addr: 192.168.1.4 + nodeid: 4 + } +} + +.fi +This will allow ATB to work without WFA as all nodeids are known at startup. +.SH VARIOUS NOTES +.PP +* WFA / LMS / ATB can be combined. +.PP +* In order to change the default votes for a node there are two options: +.nf + +1) nodelist: + +nodelist { + node { + ring0_addr: 192.168.1.1 + quorum_votes: 3 + } + .... +} + +2) quorum section (deprecated): + +quorum { + provider: corosync_votequorum + expected_votes: 2 + votes: 2 +} + +.fi +In the event that both nodelist and quorum { votes: } are defined, the value +from the nodelist will be used. +.PP +* Only votes, quorum_votes, expected_votes and two_node can be changed at runtime. Everything else +requires a cluster restart. +.SH BUGS +No known bugs at the time of writing. The authors are from outerspace. Deal with it. +.SH "SEE ALSO" +.BR corosync (8), +.BR corosync.conf (5), +.BR corosync-quorumtool (8), +.BR votequorum_overview (8) +.PP diff --git a/man/votequorum_overview.8 b/man/votequorum_overview.8 index a43a1e3..ce9553a 100644 --- a/man/votequorum_overview.8 +++ b/man/votequorum_overview.8 @@ -36,8 +36,8 @@ .SH NAME votequorum_overview \- Votequorum Library Overview .SH OVERVIEW -The votequuorum library is delivered with the corosync project. It is the external interface to -the vote-based quorum service. 
This service is optionally loaded into all ndes in a corosync cluster +The votequorum library is delivered with the corosync project. It is the external interface to +the vote-based quorum service. This service is optionally loaded into all nodes in a corosync cluster to avoid split-brain situations. It does this by having a number of votes assigned to each system in the cluster and ensuring that only when a majority of the votes are present, cluster operations are allowed to proceed. @@ -56,31 +56,11 @@ The library provides a mechanism to: .PP * Connect an additional quorum device to allow small clusters to remain quorate during node outages. .PP -.B votequorum -reads its configuration from internal cmap database. The following keys are read when it starts up: -.PP -* quorum.expected_votes -.br -* quorum.votes -.br -* quorum.quorumdev_poll -.br -* quorum.two_node -.br -* quorum.wait_for_all -.br -* quorum.last_man_standing -.br -* quorum.last_man_standing_window -.br -* quorum.auto_tie_breaker -.PP -Values that can be changed at runtime are expected_votes, votes, quorumdev_poll and two_nodes -.PP .SH BUGS No known bugs at the time of writing. The authors are from outerspace. Deal with it. .SH "SEE ALSO" .BR corosync-quorumtool (8), +.BR votequorum (5), .BR votequorum_context_get (3), .BR votequorum_context_set (3), .BR votequorum_dispatch (3), -- 1.7.7.6
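
Side note for reviewers: the quorum arithmetic described in the new votequorum.5 page (simple majority, the two_node override, and the last_man_standing recalculation) can be checked with a small Python sketch. This is not corosync code. The function names quorum_for and lms_cascade are hypothetical, the integer rounding is inferred from the examples in the page (8 -> 5, 5 -> 3, 3 -> 2), and every node is assumed to have exactly 1 vote.

```python
def quorum_for(expected_votes: int, two_node: bool = False) -> int:
    """Simple-majority quorum: 50% of the votes + 1, rounded down.

    Assumption: with two_node enabled and exactly 2 expected votes,
    quorum is forced to 1, as the two_node section describes.
    """
    if expected_votes < 1:
        raise ValueError("expected_votes must be at least 1")
    if two_node and expected_votes == 2:
        return 1
    return expected_votes // 2 + 1


def lms_cascade(nodes: int, failures: list[int]) -> list[tuple[int, int]]:
    """Replay a last_man_standing chain of events.

    Each entry in `failures` is the number of nodes lost before the next
    last_man_standing_window expiry. Returns the (expected_votes, quorum)
    pairs seen over time. Recalculation stops once the surviving
    partition is no longer quorate.
    """
    expected = nodes
    alive = nodes
    history = [(expected, quorum_for(expected))]
    for lost in failures:
        alive -= lost
        if alive < quorum_for(expected):
            break  # partition is inquorate, nothing is recalculated
        expected = alive  # window expired: expected_votes follows live nodes
        history.append((expected, quorum_for(expected)))
    return history


if __name__ == "__main__":
    # The 8 node example from the man page: lose 3, then 2, then 1 node.
    for expected, quorum in lms_cascade(8, [3, 2, 1]):
        print(f"expected_votes: {expected} quorum: {quorum}")
```

Running it reproduces the (expected_votes, quorum) pairs from the LMS chain of events in the page. Losing 4 of 8 nodes at once leaves the cluster inquorate, so no recalculation happens in that case.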