On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>> After the representative of the new ring entered the RECOVERY state,
>> it was waiting for the commit_token on the second rotation. But when
>> it got a commit_token, it didn't check the status of this token,
>> which could be a retransmission by the upstream node from the first
>> rotation, and so the representative node would send the initial ORF
>> token prematurely.
>>
>> If other nodes which had not yet entered the RECOVERY state received
>> this incorrect initial orf_token (whose seq is 0), those nodes could
>> crash on the assert instruction in the *orf_token_rtr* function:
>> ...
>
> Yunkai,
>
> First off, thanks for digging into Totem. I know the protocol is hard
> to understand and it's nice to have more people to bounce ideas off of.
>
> Do you have a test case for this?
>
> The patch is a clever solution, but the stated root cause seems
> incorrect. I'd like to get to the RCA of your bug. Just thinking about
> some of the problems you have reported in the past (13 seconds to
> execute a recovery copy per node), it could be possible that a
> retransmission of the commit token from 5 to 1 (kicking off a new orf
> token) in the example below could occur while in recovery. That should
> be ok, though, since retransmitted orf tokens are discarded
> (totemsrp.c:3564). I don't see any immediate way to have a premature
> token transmission.
>
> Let's examine 5 processors:
>
> When in gather, a membership is built. Once this membership is agreed
> upon, a commit token is originated by the ring rep, creating the first
> transmission around the ring.
>
> totemsrp.c:4044
>
> Example with 5 processors:
> 1,2,3,4,5 in gather
> 1 originates commit token
>
> Once a commit token is received in the gather state, the processor
> enters commit (first token rotation):
>
> totemsrp.c:4348
>
> Example:
> 1 goes to commit
> 2 goes to commit
> 3 goes to commit
> 4 goes to commit
> 5 goes to commit
>
> Once a commit token is received in the commit state, the processor
> enters the recovery state, but only if this is a new commit token:
>
> totemsrp.c:4360
>
> 1 goes to recovery
> 2 goes to recovery
> 3 goes to recovery
> 4 goes to recovery
> 5 goes to recovery
> 1 originates orf token (this is the start of the third rotation)
>
> totemsrp.c:4374
>
> In the above failure-free scenario, we see that we will never get a
> premature orf token.
>
> Perhaps the problem you're running into is that you're mixing
> partition/merging while forming a new ring. This could result in
> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
> being discarded by the ring rep in the recovery state.
>
> I believe the code would be better served by something like:
>
> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>     memb_commit_token->ring_id.seq == instance->my_ring_id.seq) {
>     ...
> }
>
> Note it may be safer to compare the entire ring_id rather than only
> the seq.
>
> This will cause any commit token not from the current agreed
> processors to be rejected. The ring seq increases by 4 for each new
> ring, and it is not possible to have a matching seq + rep in two
> separate rings (this is why we store the ring seq to stable storage).
>
> Could you test that, or tell me your test case, or both?
Let me describe my test case based on the example you mentioned above:

1,2,3,4,5 in gather

1 originates commit token
  => enters commit state
  => sends commit_token(token_seq:0) to 2
  => receives commit_token(token_seq:4) from 5
  => enters RECOVERY state
  => receives commit_token(token_seq:4) retransmitted by 5
  => enters OPERATIONAL state *prematurely*!!
  => sends initial ORF token to 2 *prematurely*!!

2 receives commit_token(token_seq:0) from 1
  => enters commit state
  => sends commit_token(token_seq:1) to 3
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:1) ...
  => retransmits commit_token(token_seq:1) ...
  => ...
  => receives JoinMSG from 3
  => enters GATHER state, ...
  => receives initial ORF token from 1
  => crashes in the *orf_token_rtr* function!!

3 receives commit_token(token_seq:1) from 2
  => enters commit state
  => sends commit_token(token_seq:2) to 4
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:2) ...
  => retransmits commit_token(token_seq:2) ...
  => ...
  => commit token lost timeout occurs!!
  => enters GATHER state, multicasts JoinMSG ...

4 receives commit_token(token_seq:2) from 3
  => enters commit state
  => sends commit_token(token_seq:3) to 5
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:3) to 5 ...
  => retransmits commit_token(token_seq:3) to 5 ...
  => ...

5 receives commit_token(token_seq:3) from 4
  => enters commit state
  => sends commit_token(token_seq:4) to 1
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:4) to 1 ...
  => retransmits commit_token(token_seq:4) to 1 ...
  => ...

The key point in my test case is that one of the processors (3 in this
case) runs into the commit token lost timeout, while 1 receives the
commit token retransmitted by 5 and enters the OPERATIONAL state
*prematurely*.

> Thanks
> -steve
>
>> range = orf_token->seq - instance->my_aru;
>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>> ...
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>> ---
>>  exec/totemsrp.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>> index 5a78962..22717ac 100644
>> --- a/exec/totemsrp.c
>> +++ b/exec/totemsrp.c
>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>  		break;
>>
>>  	case MEMB_STATE_RECOVERY:
>> -		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>> +		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>> +		    memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>  			log_printf (instance->totemsrp_log_level_debug,
>>  				"Sending initial ORF token\n");
>>
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss

-- 
Yunkai Zhang
Work at Taobao