From: "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx>

It is not correct to accept expected_votes from an arbitrary node in the
cluster. We can only accept expected_votes from quorate nodes. A quorate
cluster is "always" right and has the correct expected_votes.

One of the different bug triggers:

quorum {
	expected_votes: 8
	auto_tie_breaker: 1
	last_man_standing: 1
}

start all 8 nodes.
clean shut down 2 nodes.
wait for lms to kick in.
kill 3 nodes with highest nodeid (we want to retain a quorate
partition of 3 nodes)
start one node again -> cluster will be unquorate

This happens because the node rebooting/rejoining with non-current
cluster status will propagate an expected_votes of 8, while in reality
the cluster is down to expected_votes: 3. After the rejoin there are
4 nodes, and 4 is still < 5 (quorum for 8 nodes/votes).

In order to avoid this condition, we need to exchange expected_votes
information among nodes, but we cannot blindly trust everybody.

1) Allow expected_votes to be changed cluster-wide only if the
   information is coming from a quorate node.
2) Fix node->expected_votes based on quorate status.
3) Allow a joining node to decrease quorum and expected_votes if the
   node is not yet quorate, but it is joining a quorate cluster.

Signed-off-by: Fabio M. Di Nitto <fdinitto@xxxxxxxxxx>
---
 exec/votequorum.c |   16 ++++++++++++++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/exec/votequorum.c b/exec/votequorum.c
index 47132d6..798746a 100644
--- a/exec/votequorum.c
+++ b/exec/votequorum.c
@@ -1016,6 +1016,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	int old_expected;
 	nodestate_t old_state;
 	int new_node = 0;
+	int allow_downgrade = 0;
 
 	ENTER();
 
@@ -1038,9 +1039,20 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 
 	/* Update node state */
 	node->votes = req_exec_quorum_nodeinfo->votes;
-	node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
 	node->state = NODESTATE_MEMBER;
 
+	if ((!cluster_is_quorate) &&
+	    (req_exec_quorum_nodeinfo->quorate)) {
+		allow_downgrade = 1;
+		us->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	}
+
+	if (req_exec_quorum_nodeinfo->quorate) {
+		node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	} else {
+		node->expected_votes = us->expected_votes;
+	}
+
 	log_printf(LOGSYS_LEVEL_DEBUG, "nodeinfo message: votes: %d, expected: %d wfa: %d quorate: %d",
 		   req_exec_quorum_nodeinfo->votes,
 		   req_exec_quorum_nodeinfo->expected_votes,
@@ -1064,7 +1076,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	    old_votes != node->votes ||
 	    old_expected != node->expected_votes ||
 	    old_state != node->state) {
-		recalculate_quorum(0, 0);
+		recalculate_quorum(allow_downgrade, 0);
 	}
 
 	if (!nodeid) {
-- 
1.7.7.6
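
For readers who want to check the arithmetic in the commit message, here
is a minimal standalone sketch of the trust rule the patch introduces.
It is not the corosync code: struct node_view, quorum_for() and
apply_nodeinfo() are hypothetical names made up for illustration, and
the quorum formula is assumed to be a plain majority (more than half of
expected_votes), which matches the "5 for 8 votes" figure quoted above.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-node state kept by
 * votequorum; the real structure has many more fields. */
struct node_view {
	int expected_votes;
	bool quorate;
};

/* Assumed simple-majority quorum: strictly more than half the votes. */
static int quorum_for(int expected_votes)
{
	return expected_votes / 2 + 1;
}

/*
 * Apply one incoming nodeinfo message, mirroring the two new branches
 * in the patch:
 *  - if we are not quorate and the sender is, adopt the sender's
 *    expected_votes and allow quorum to be lowered;
 *  - record the sender's expected_votes only if the sender is quorate,
 *    otherwise record our own value for it.
 * Returns the allow_downgrade flag (1 or 0).
 */
static int apply_nodeinfo(struct node_view *us,
			  struct node_view *sender_record,
			  int msg_expected_votes, bool msg_quorate)
{
	int allow_downgrade = 0;

	if (!us->quorate && msg_quorate) {
		allow_downgrade = 1;
		us->expected_votes = msg_expected_votes;
	}

	if (msg_quorate) {
		sender_record->expected_votes = msg_expected_votes;
	} else {
		sender_record->expected_votes = us->expected_votes;
	}
	sender_record->quorate = msg_quorate;

	return allow_downgrade;
}
```

With the scenario above, a quorate member holding expected_votes: 3
that receives expected_votes: 8 from the non-quorate rejoining node
keeps 3 recorded for that node, so 4 nodes stay above the quorum of 2.
Seen from the other side, the rejoining node adopts 3 from the first
quorate peer it hears from and is allowed to lower its quorum.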