Re: [PATCH 3/4] votequorum: fix expected_votes propagation

Reviewed-by: Steven Dake <sdake@xxxxxxxxxx>

On 01/26/2012 06:27 AM, Fabio M. Di Nitto wrote:
> From: "Fabio M. Di Nitto" <fdinitto@xxxxxxxxxx>
> 
> It is not correct to accept expected_votes from an arbitrary node in
> the cluster. We can only accept expected_votes from quorate nodes.
> 
> A quorate cluster is "always" right and has the correct expected_votes.
> 
> One of several ways to trigger the bug:
> 
> quorum {
>   expected_votes: 8
>   auto_tie_breaker: 1
>   last_man_standing: 1
> }
> 
> Start all 8 nodes.
> Cleanly shut down 2 nodes.
> Wait for last_man_standing (lms) to kick in.
> Kill the 3 nodes with the highest nodeid
> (we want to retain a quorate partition of 3 nodes).
> Start one node again -> the cluster becomes unquorate.
> 
> This happens because the rebooting/rejoining node, whose view of the
> cluster status is out of date, propagates an expected_votes of 8,
> while in reality the cluster is down to expected_votes: 3.
> 
> With expected_votes back at 8, the 4 running nodes are still < 5
> (quorum for 8 nodes/votes).
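
The arithmetic above can be sketched as follows. This is only an illustration that assumes quorum is a simple majority of expected_votes; majority_quorum() and partition_is_quorate() are hypothetical helpers, not functions from exec/votequorum.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: simple majority of the expected votes (assumption). */
static unsigned int majority_quorum(unsigned int expected_votes)
{
	return expected_votes / 2 + 1;
}

/* Hypothetical helper: a partition is quorate once its votes reach quorum. */
static bool partition_is_quorate(unsigned int votes_present,
				 unsigned int expected_votes)
{
	assert(expected_votes > 0);
	return votes_present >= majority_quorum(expected_votes);
}
```

Under that assumption, 4 votes against expected_votes of 8 fall short of the quorum of 5, while the same 4 votes would be quorate if expected_votes had stayed at 3.
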
> 
> To avoid this condition, we need to exchange expected_votes
> information among nodes, but we cannot blindly trust every node.
> 
> 1) Allow expected_votes to be changed cluster-wide only if the
>    information comes from a quorate node.
> 2) Set node->expected_votes according to the sender's quorate status.
> 3) Allow a joining node to decrease quorum and expected_votes
>    if the node is not yet quorate but is joining a quorate
>    cluster.
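
The three rules above can be sketched in isolation roughly as below. This is a simplified stand-in, not the code from the patch: struct node_view, struct nodeinfo_msg and apply_nodeinfo() are hypothetical names, and only the expected_votes handling is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the real votequorum structures. */
struct node_view {
	unsigned int expected_votes;
};

struct nodeinfo_msg {
	unsigned int expected_votes;
	bool quorate;
};

/*
 * Apply a nodeinfo message from "sender" to our local view.
 * Returns true when the caller may let quorum decrease, i.e. we are
 * not quorate ourselves but the sender belongs to a quorate cluster.
 */
static bool apply_nodeinfo(struct node_view *us, struct node_view *sender,
			   const struct nodeinfo_msg *msg,
			   bool cluster_is_quorate)
{
	bool allow_downgrade = false;

	assert(us && sender && msg);

	/* Rule 3: a non-quorate joiner adopts the quorate cluster's value. */
	if (!cluster_is_quorate && msg->quorate) {
		allow_downgrade = true;
		us->expected_votes = msg->expected_votes;
	}

	/* Rules 1 and 2: only a quorate sender's value is trusted. */
	sender->expected_votes = msg->quorate ? msg->expected_votes
					      : us->expected_votes;

	return allow_downgrade;
}
```

In this sketch a stale expected_votes of 8 from a non-quorate rejoining node is ignored by a quorate receiver holding 3, while the rejoining node itself drops to 3 once it hears from a quorate member.
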
> 
> Signed-off-by: Fabio M. Di Nitto <fdinitto@xxxxxxxxxx>
> ---
>  exec/votequorum.c |   16 ++++++++++++++--
>  1 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/exec/votequorum.c b/exec/votequorum.c
> index 47132d6..798746a 100644
> --- a/exec/votequorum.c
> +++ b/exec/votequorum.c
> @@ -1016,6 +1016,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
>  	int old_expected;
>  	nodestate_t old_state;
>  	int new_node = 0;
> +	int allow_downgrade = 0;
>  
>  	ENTER();
>  
> @@ -1038,9 +1039,20 @@ static void message_handler_req_exec_votequorum_nodeinfo (
>  
>  	/* Update node state */
>  	node->votes = req_exec_quorum_nodeinfo->votes;
> -	node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
>  	node->state = NODESTATE_MEMBER;
>  
> +	if ((!cluster_is_quorate) &&
> +	    (req_exec_quorum_nodeinfo->quorate)) {
> +		allow_downgrade = 1;
> +		us->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
> +	}
> +
> +	if (req_exec_quorum_nodeinfo->quorate) {
> +		node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
> +	} else {
> +		node->expected_votes = us->expected_votes;
> +	}
> +
>  	log_printf(LOGSYS_LEVEL_DEBUG, "nodeinfo message: votes: %d, expected: %d wfa: %d quorate: %d",
>  					req_exec_quorum_nodeinfo->votes,
>  					req_exec_quorum_nodeinfo->expected_votes,
> @@ -1064,7 +1076,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
>  	    old_votes != node->votes ||
>  	    old_expected != node->expected_votes ||
>  	    old_state != node->state) {
> -		recalculate_quorum(0, 0);
> +		recalculate_quorum(allow_downgrade, 0);
>  	}
>  
>  	if (!nodeid) {
