On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
> After the representative of the new ring entered the RECOVERY state,
> it waited for the commit_token on the second rotation. But when it
> got a commit_token, it didn't check the status of this token, which
> could have been retransmitted by an upstream node on the first
> rotation, and then the representative node would send the initial ORF
> token prematurely.
>
> If other nodes which have not entered the RECOVERY state receive this
> incorrect initial orf_token (whose seq is 0), these nodes may crash
> on the assert instruction in the *orf_token_rtr* function:
> ...

Yunkai,

First off, thanks for digging into Totem. I know the protocol is hard
to understand, and it's nice to have more people to bounce ideas off of.

Do you have a test case for this? The patch is a clever solution, but
the root cause seems incorrect. I'd like to get to the RCA of your bug.

Just thinking about some of the problems you have reported in the past
(13 seconds to execute a recovery copy per node), it could be possible
for a retransmission of the commit token from 5 to 1 (kicking off a new
orf token) in the example below to occur while in recovery. That should
be ok, though, since retransmitted orf tokens are discarded
(totemsrp.c:3564). I don't see any immediate way to have a premature
token transmission.

Let's examine 5 processors:

When in gather, a membership is built. Once this membership is agreed
upon, a commit token is originated by the ring rep, creating the first
transmission around the ring (totemsrp.c:4044).

Example with 5 processors:

1,2,3,4,5 in gather
1 originates commit token

Once a commit token is received in the gather state, the processor
enters commit (first token rotation) (totemsrp.c:4348).

Example:

1 goes to commit
2 goes to commit
3 goes to commit
4 goes to commit
5 goes to commit

Once a commit token is received in the commit state, the processor
enters the recovery state, but only if this is a new commit token
(totemsrp.c:4360).

1 goes to recovery
2 goes to recovery
3 goes to recovery
4 goes to recovery
5 goes to recovery
1 originates orf token (this is the start of the third rotation)
(totemsrp.c:4374)

In the above failure-free scenario, we see that we will never get a
premature orf token.

Perhaps the problem you're running into is that you're mixing
partitioning/merging while forming a new ring. This could result in
my_id.addr[0] and the ring rep matching, but *old* commit tokens not
being discarded by the ring rep in the recovery state. I believe the
code would be better served by something like:

if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
	memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
...
}

Note it may be safer to compare the entire ring_id rather than only the
seq. This will cause any commit token not from the currently agreed
processors to be rejected. The ring seq increases by 4 for each new
ring, and it is not possible to have a matching seq + rep in two
separate rings (this is why we store the ring seq to stable storage).

Could you test that, or tell me your test case, or both?
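To make the "reject stale commit tokens" idea concrete, here is a
rough, untested sketch of the intent. Everything below (toy_ring_id,
toy_instance, should_send_initial_orf, the addresses and seq values)
is a stand-in I made up for illustration, not the real totemsrp
structures or code path:

/*
 * Toy model of the suggested guard: the ring rep sitting in
 * MEMB_STATE_RECOVERY should only treat a commit token as the trigger
 * for the initial ORF token if the token belongs to the agreed ring.
 * None of these types are the real totemsrp structures.
 */
#include <stdio.h>
#include <string.h>

struct toy_ring_id {
	char rep[16];			/* ring representative address */
	unsigned long long seq;		/* ring sequence, +4 per new ring */
};

struct toy_commit_token {
	struct toy_ring_id ring_id;
};

struct toy_instance {
	char my_addr[16];
	struct toy_ring_id my_ring_id;
	unsigned long long my_ring_id_seq;
};

/* Returns 1 when this processor is the rep AND the commit token
 * carries the seq of the ring this membership actually agreed on. */
static int should_send_initial_orf (
	const struct toy_instance *instance,
	const struct toy_commit_token *token)
{
	return strcmp (instance->my_addr, instance->my_ring_id.rep) == 0 &&
		token->ring_id.seq == instance->my_ring_id_seq;
}

int main (void)
{
	struct toy_instance inst = {
		.my_addr = "node1",
		.my_ring_id = { .rep = "node1", .seq = 108 },
		.my_ring_id_seq = 108
	};
	/* leftover token from the previous ring: rep matches, seq does not */
	struct toy_commit_token stale = { .ring_id = { "node1", 104 } };
	/* token from the agreed ring: both checks pass */
	struct toy_commit_token fresh = { .ring_id = { "node1", 108 } };

	printf ("stale token triggers ORF: %d\n",
		should_send_initial_orf (&inst, &stale));	/* prints 0 */
	printf ("fresh token triggers ORF: %d\n",
		should_send_initial_orf (&inst, &fresh));	/* prints 1 */
	return 0;
}

The point is just that the rep/address check alone still passes for a
commit token left over from a previous ring, while adding the seq
comparison rejects it.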
Thanks
-steve

> range = orf_token->seq - instance->my_aru;
> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
> ...
>
> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
> ---
>  exec/totemsrp.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
> index 5a78962..22717ac 100644
> --- a/exec/totemsrp.c
> +++ b/exec/totemsrp.c
> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>  		break;
>
>  	case MEMB_STATE_RECOVERY:
> -		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
> +		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
> +			memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>  			log_printf (instance->totemsrp_log_level_debug,
>  				"Sending initial ORF token\n");
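On the hunk quoted above: if I read the check correctly, it encodes
"the commit token has completed exactly two full rotations", assuming
each of the addr_entries processors bumps token_seq once per pass. A
throwaway counter (plain C written for this mail, not corosync code)
of that arithmetic for the 5-processor example:

#include <stdio.h>

int main (void)
{
	const int addr_entries = 5;	/* processors in the new membership */
	int token_seq = 0;
	int rotation;
	int node;

	for (rotation = 1; rotation <= 2; rotation++) {
		/* rotation 1: gather -> commit; rotation 2: commit -> recovery */
		for (node = 1; node <= addr_entries; node++) {
			token_seq++;	/* one bump per forwarding processor */
		}
		printf ("after rotation %d: token_seq = %d\n",
			rotation, token_seq);
	}

	/* the patch lets the rep originate the ORF token only here */
	printf ("token_seq == 2*addr_entries: %s\n",
		token_seq == 2 * addr_entries ? "yes" : "no");
	return 0;
}

That matches the two rotations (gather -> commit, commit -> recovery)
walked through above, but it ties acceptance to an exact hop count
rather than to the agreed ring, which is why comparing against the
ring_id still looks like the more direct fix to me.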