[PATCH 3/3] votequorum: fix expected_votes propagation

From: "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx>

It is not correct to blindly accept expected_votes from any node in
the cluster. We can only accept expected_votes from quorate nodes.

A quorate cluster is "always" right and has the correct expected_votes.

One of the ways to trigger the bug:

quorum {
  expected_votes: 8
  auto_tie_breaker: 1
  last_man_standing: 1
}

Start all 8 nodes.
Cleanly shut down 2 nodes.
Wait for lms (last_man_standing) to kick in.
Kill the 3 nodes with the highest nodeid
(we want to retain a quorate partition of 3 nodes).
Start one node again -> the cluster becomes unquorate.

This happens because the rebooting/rejoining node does not have the
current cluster status and propagates an expected_votes of 8,
while in reality the cluster is down to expected_votes: 3.

The 4 nodes now present are still < 5 (quorum for 8 nodes/votes).
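
The arithmetic above can be checked with a small sketch. It assumes one
vote per node and a plain majority of expected_votes, which is a
simplification of what votequorum computes; the function names are
hypothetical and are not taken from exec/votequorum.c.

```c
#include <stdbool.h>
#include <stdio.h>

/* Plain-majority quorum: more than half of expected_votes.
 * Assumption: simplified model, no special quorum options applied. */
unsigned int majority_quorum(unsigned int expected_votes)
{
	return expected_votes / 2 + 1;
}

/* A partition is quorate when the votes present reach the threshold. */
bool partition_is_quorate(unsigned int votes_present,
			  unsigned int expected_votes)
{
	return votes_present >= majority_quorum(expected_votes);
}

/* Print both outcomes for the scenario: 4 nodes present, one vote each. */
void print_scenario(void)
{
	/* Stale value pushed by the rejoining node. */
	printf("expected_votes=8: quorum=%u quorate=%d\n",
	       majority_quorum(8), partition_is_quorate(4, 8));
	/* Value held by the surviving quorate partition. */
	printf("expected_votes=3: quorum=%u quorate=%d\n",
	       majority_quorum(3), partition_is_quorate(4, 3));
}
```

With the stale expected_votes of 8 the 4 nodes fall short of the
threshold, while with the surviving partition's value of 3 they do not.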

To avoid this condition, we need to exchange expected_votes
information among nodes, but we cannot blindly trust everybody.

1) Allow expected_votes to be changed cluster-wide only if the
   information comes from a quorate node.
2) Set node->expected_votes based on the sender's quorate status.
3) Allow a joining node to decrease quorum and expected_votes
   if the node is not yet quorate but is joining a quorate
   cluster.
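
The three rules can be sketched in isolation as follows. This is only an
illustration of the decision logic under simplified assumptions: struct
node_view and apply_nodeinfo_expected() are hypothetical names, and the
real handler works on the votequorum node structures and the incoming
nodeinfo message instead.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-node state involved. */
struct node_view {
	unsigned int expected_votes;
};

/*
 * Apply the expected_votes carried by a nodeinfo message.
 * "us" is the local node, "node" is the local record of the sender.
 * Returns true when the local node may lower quorum and expected_votes.
 */
bool apply_nodeinfo_expected(bool local_quorate, bool sender_quorate,
			     unsigned int sender_expected,
			     struct node_view *us, struct node_view *node)
{
	bool allow_downgrade = false;

	/* Rule 3: not yet quorate, but joining a quorate cluster. */
	if (!local_quorate && sender_quorate) {
		allow_downgrade = true;
		us->expected_votes = sender_expected;
	}

	/* Rules 1 and 2: only trust the value sent by a quorate node. */
	if (sender_quorate)
		node->expected_votes = sender_expected;
	else
		node->expected_votes = us->expected_votes;

	return allow_downgrade;
}
```

In the scenario above, the surviving nodes are quorate and the rejoining
node is not, so the stale value of 8 is ignored on their side, and the
rejoining node adopts 3 when it hears from them.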

Signed-off-by: Fabio M. Di Nitto <fdinitto@xxxxxxxxxx>
---
 exec/votequorum.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/exec/votequorum.c b/exec/votequorum.c
index 47132d6..73dbb3a 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -1016,6 +1016,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	int old_expected;
 	nodestate_t old_state;
 	int new_node = 0;
+	int allow_downgrade = 0;
 
 	ENTER();
 
@@ -1038,9 +1039,20 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 
 	/* Update node state */
 	node->votes = req_exec_quorum_nodeinfo->votes;
-	node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
 	node->state = NODESTATE_MEMBER;
 
+	if ((!cluster_is_quorate) &&
+	    (req_exec_quorum_nodeinfo->quorate)) {
+		allow_downgrade = 1;
+		us->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	}
+
+	if (req_exec_quorum_nodeinfo->quorate) {
+		node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	} else {
+		node->expected_votes = us->expected_votes;
+	}
+
 	log_printf(LOGSYS_LEVEL_DEBUG, "nodeinfo message: votes: %d, expected: %d wfa: %d quorate: %d",
 					req_exec_quorum_nodeinfo->votes,
 					req_exec_quorum_nodeinfo->expected_votes,
@@ -1064,7 +1076,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	    old_votes != node->votes ||
 	    old_expected != node->expected_votes ||
 	    old_state != node->state) {
-		recalculate_quorum(0, 0);
+		recalculate_quorum(allow_downgrade, allow_downgrade);
 	}
 
 	if (!nodeid) {
-- 
1.7.7.6
