Re: [PATCH] Ignore the commit_token retransmitted by upstream node on the first rotation

On 12/10/2011 10:43 AM, Yunkai Zhang wrote:
> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>>> After the representative of the new ring entered the RECOVERY state,
>>> it waited for the commit_token on the second rotation. But when it
>>> got a commit_token, it didn't check the status of this token, which
>>> could have been retransmitted by the upstream node on the first
>>> rotation, and then the representative node would send the initial
>>> ORF token prematurely.
>>>
>>> If other nodes which have not yet entered the RECOVERY state receive
>>> this incorrect initial orf_token (whose seq is 0), these nodes may
>>> crash due to the assert instruction in the *orf_token_rtr* function:
>>>   ...
>>
>> Yunkai,
>>
>> First off, thanks for digging into Totem.  I know the protocol is hard
>> to understand, and it's nice to have more people to bounce ideas off of.
>>
>> Do you have a test case for this?
>>
>> The patch is a clever solution, but the root cause seems incorrect.
>> I'd like to get to the RCA of your bug.  Just thinking about some of
>> the problems you have reported in the past (13 seconds to execute a
>> recovery copy per node), it could be possible that a retransmission of
>> the commit token from 5 to 1 (kicking off a new orf token) in the
>> below example occurs while in recovery.  That should be ok though,
>> since retransmitted orf tokens are discarded (totemsrp.c:3564).  I
>> don't see any immediate way to have a premature token transmission.
>>
>> Let's examine 5 processors:
>>
>> When in gather, a membership is built.  Once this membership is agreed
>> upon, a commit token is originated by the ring rep creating the first
>> transmission around the ring.
>>
>> totemsrp.c:4044
>>
>> Example with 5 processors:
>> 1,2,3,4,5 in gather
>> 1 originates commit token
>>
>>
>> Once a commit token is received in the gather state, the processor
>> enters commit (first token rotation)
>>
>> totemsrp.c:4348
>>
>> Example:
>> 1 goes to commit
>> 2 goes to commit
>> 3 goes to commit
>> 4 goes to commit
>> 5 goes to commit
>>
>> Once a commit token is received in the commit state, the processor
>> enters the recovery state only if this is a new commit token
>>
>> totemsrp.c:4360
>>
>> 1 goes to recovery
>> 2 goes to recovery
>> 3 goes to recovery
>> 4 goes to recovery
>> 5 goes to recovery
>> 1 originates orf token (this is the start of the third rotation)
>>
>> totemsrp.c:4374
>>
>>
>> In the above failure free scenario, we see that we will never get a
>> premature orf token.
>>
>> Perhaps the problem you're running into is that you're mixing
>> partitioning/merging while forming a new ring.  This could result in
>> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
>> being discarded by the ring rep in the recovery state.
>>
>> I believe the code would be better served by something like:
>>
>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>    memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
>> ...
>> }
>>
>> Note it may be safer to compare the entire ring_id rather than only the seq.
>>
>> This will cause any commit token not from the current agreed processors
>> to be rejected.  The ring seq increases by 4 for each new ring and it is
>> not possible to have a matching seq + rep in two separate rings (this is
>> why we store the ring seq to stable storage).
>>
>> Could you test that or tell me your test case or both?
>>
> 
> Let me discuss my test case based on your example mentioned above:
> 
> 1,2,3,4,5 in gather
> 
> 1 originates commit token
>   => enter commit state
>     => send commit_token(token_seq:0) to 2
>       => receive commit_token(token_seq:4) from 5
>         => enter RECOVERY state
>           => receive commit_token(token_seq:4) retransmitted by 5
>             => enter OPERATIONAL state *prematurely*!!

operational is triggered by totemsrp.c:4540 atomically once all nodes
have recovered all of the messages.  Are you sure it is the commit token
that is received here, triggering operational?

>               => send initial ORF token to 2 *prematurely*!!

Agreed, an extra ORF token will be sent here each time the commit token
is retransmitted.

> 
> 2 receive commit_token(token_seq:0) from 1
>   =>enter commit state
>     => send commit_token(token_seq:1) to 3
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:1) ...
>         => retransmit commit_token(token_seq:1) ...
>         => ...
>           => receive JoinMSG from 3
>             => enter GATHER state, ...
>               => receive initial ORF token from 1
>                 => crash in the *orf_token_rtr* function!!

This points to the real issue: the orf token should have been rejected
at totemsrp.c:3565.

After entering commit:
totemsrp.c:4356
totemsrp.c:2003 sets the new ring id (which was incremented)
receive join message from 3
enter gather state

In this case, totem goes from commit->gather.  On commit->gather, the
ring id should be invalidated in some way.  At one point totem used to
save and restore the ring id, and for some reason this was removed.  I
don't recall the rationale, but it should be in the changelogs.  A bit
tired now; I may have a look tomorrow.

As I recall it only did this in the recovery state, but given your
scenario it should also be done for the commit state.

see totemsrp.c:1609

Do you have a test case that consistently reproduces this issue?

Regards
-steve


> 
> 3 receive commit_token(token_seq:1) from 2
>   => enter commit state
>     => send commit_token(token_seq:2) to 4
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:2) ...
>         => retransmit commit_token(token_seq:2) ...
>         => ...
>           => commit token lost timeout occur!!
>             => enter GATHER state, multicast JoinMSG ...
> 
> 4 receive commit_token(token_seq:2) from 3
>   => enter commit state
>     => send commit_token(token_seq:3) to 5
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:3) to 5...
>         => retransmit commit_token(token_seq:3) to 5...
>         => ...
> 
> 5 receive commit_token(token_seq:3) from 4
>   => enter commit state
>     => send commit_token(token_seq:4) to 1
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:4) to 1...
>         => retransmit commit_token(token_seq:4) to 1...
>         => ...
> 
> The key point in my test case is that one of the processors (3 in this
> case) runs into a commit token timeout, while 1 receives the commit
> token retransmitted by 5 and enters the OPERATIONAL state *prematurely*.
> 
>> Thanks
>> -steve
>>
>>
>>>   range = orf_token->seq - instance->my_aru;
>>>   assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>>>   ...
>>>
>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>> ---
>>>  exec/totemsrp.c |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>> index 5a78962..22717ac 100644
>>> --- a/exec/totemsrp.c
>>> +++ b/exec/totemsrp.c
>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>>                       break;
>>>
>>>               case MEMB_STATE_RECOVERY:
>>> -                     if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>>> +                     if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>> +                             memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>>                               log_printf (instance->totemsrp_log_level_debug,
>>>                                       "Sending initial ORF token\n");
>>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss@xxxxxxxxxxxx
>> http://lists.corosync.org/mailman/listinfo/discuss
> 
> 
> 


