On 12/11/2011 12:29 AM, Steven Dake wrote:
> On 12/10/2011 10:43 AM, Yunkai Zhang wrote:
>> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>>>> After the representative of the new ring entered the RECOVERY state,
>>>> it waited for the commit_token on the second rotation. But when it
>>>> got a commit_token, it did not check the status of this token, which
>>>> could have been retransmitted by the upstream node on the first
>>>> rotation, and so the representative node would send the initial ORF
>>>> token prematurely.
>>>>
>>>> If other nodes that have not yet entered the RECOVERY state receive
>>>> this incorrect initial orf_token (whose seq is 0), these nodes may
>>>> crash on the assert instruction in the *orf_token_rtr* function:
>>>> ...
>>>
>>> Yunkai,
>>>
>>> First off, thanks for digging into Totem. I know the protocol is hard
>>> to understand and it's nice to have more people to bounce ideas off of.
>>>
>>> Do you have a test case for this?
>>>
>>> The patch is a clever solution, but the root cause seems incorrect. I'd
>>> like to get to the RCA of your bug. Thinking about some of the problems
>>> you have reported in the past (13 seconds to execute a recovery copy
>>> per node), it could be possible for a retransmission of the commit
>>> token from 5 to 1 (kicking off a new orf token) in the example below
>>> to occur while in recovery. That should be ok, though, since
>>> retransmitted orf tokens are discarded (totemsrp.c:3564). I don't see
>>> any immediate way to have a premature token transmission.
>>>
>>> Let's examine 5 processors:
>>>
>>> When in gather, a membership is built. Once this membership is agreed
>>> upon, a commit token is originated by the ring rep, creating the first
>>> transmission around the ring.
>>>
>>> totemsrp.c:4044
>>>
>>> Example with 5 processors:
>>> 1,2,3,4,5 in gather
>>> 1 originates commit token
>>>
>>> Once a commit token is received in the gather state, the processor
>>> enters commit (first token rotation).
>>>
>>> totemsrp.c:4348
>>>
>>> Example:
>>> 1 goes to commit
>>> 2 goes to commit
>>> 3 goes to commit
>>> 4 goes to commit
>>> 5 goes to commit
>>>
>>> Once a commit token is received in the commit state, the processor
>>> enters the recovery state, but only if this is a new commit token.
>>>
>>> totemsrp.c:4360
>>>
>>> 1 goes to recovery
>>> 2 goes to recovery
>>> 3 goes to recovery
>>> 4 goes to recovery
>>> 5 goes to recovery
>>> 1 originates orf token (this is the start of the third rotation)
>>>
>>> totemsrp.c:4374
>>>
>>> In the above failure-free scenario, we see that we will never get a
>>> premature orf token.
>>>
>>> Perhaps the problem you're running into is that you're mixing
>>> partition/merging while forming a new ring. This could result in
>>> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
>>> being discarded by the ring rep in the recovery state.
>>>
>>> I believe the code would be better served by something like:
>>>
>>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>     memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
>>> ...
>>> }
>>>
>>> Note it may be safer to compare the entire ring_id rather than only
>>> the seq.
>>>
>>> This will cause any commit token not from the currently agreed set of
>>> processors to be rejected. The ring seq increases by 4 for each new
>>> ring, and it is not possible to have a matching seq + rep in two
>>> separate rings (this is why we store the ring seq to stable storage).
>>>
>>> Could you test that, or tell me your test case, or both?
>>>
>>
>> Let me discuss my test case based on your example mentioned above:
>>
>> 1,2,3,4,5 in gather
>>
>> 1 originates commit token
>> => enter commit state
>> => send commit_token(token_seq:0) to 2
>> => receive commit_token(token_seq:4) from 5
>> => enter RECOVERY state
>> => receive commit_token(token_seq:4) retransmitted by 5
>> => enter OPERATIONAL state *prematurely*!!
>
> operational is triggered by totemsrp.c:4540 atomically once all nodes
> have recovered all of the messages. Are you sure the commit token is
> what is received here which is triggering operational?
>
>> => send initial ORF token to 2 *prematurely*!!
>
> Agreed, an extra ORF token will be sent here each time the commit token
> is retransmitted.
>
>>
>> 2 receives commit_token(token_seq:0) from 1
>> => enter commit state
>> => send commit_token(token_seq:1) to 3
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:1) ...
>> => retransmit commit_token(token_seq:1) ...
>> => ...
>> => receive JoinMSG from 3
>> => enter GATHER state, ...
>> => receive initial ORF token from 1
>> => crash in the *orf_token_rtr* function!!
>
> This points out that the issue is that the orf token should be rejected
> at totemsrp.c:3565.
>
> After entering commit:
> totemsrp.c:4356
> totemsrp.c:2003 sets the new ring id (which was incremented)
> receive join message from 3
> enter gather state
>
> Note I tracked down a bugzilla in March related to a change which
> removed the ring restore. See the rationale and reproducer here:
> https://bugzilla.redhat.com/show_bug.cgi?id=623176 Comment #6
>
> In this case, totem goes from commit->gather. On commit->gather, the
> ring id should be invalidated in some way. At one point totem used to
> set and restore the ring id, and for some reason this was removed. I
> don't recall the rationale, but it should be in the changelogs. Bit
> tired now, may have a look tomorrow.
>
> As I recall, it only did this in the recovery state, but given your
> scenario it should also be done for the commit state.
>
> See totemsrp.c:1609.
>
> Do you have a test case that consistently reproduces this issue?
>
> Regards
> -steve
>
>
>>
>> 3 receives commit_token(token_seq:1) from 2
>> => enter commit state
>> => send commit_token(token_seq:2) to 4
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:2) ...
>> => retransmit commit_token(token_seq:2) ...
>> => ...
>> => commit token lost timeout occurs!!
>> => enter GATHER state, multicast JoinMSG ...
>>
>> 4 receives commit_token(token_seq:2) from 3
>> => enter commit state
>> => send commit_token(token_seq:3) to 5
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:3) to 5 ...
>> => retransmit commit_token(token_seq:3) to 5 ...
>> => ...
>>
>> 5 receives commit_token(token_seq:3) from 4
>> => enter commit state
>> => send commit_token(token_seq:4) to 1
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:4) to 1 ...
>> => retransmit commit_token(token_seq:4) to 1 ...
>> => ...
>>
>> The key point in my test case is that one of the processors (3 in this
>> case) runs into a commit token timeout, while 1 receives the commit
>> token retransmitted by 5 and enters the OPERATIONAL state
>> *prematurely*.
>>
>>> Thanks
>>> -steve
>>>
>>>
>>>> range = orf_token->seq - instance->my_aru;
>>>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>>>> ...
>>>>
>>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>>> ---
>>>>  exec/totemsrp.c |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>>> index 5a78962..22717ac 100644
>>>> --- a/exec/totemsrp.c
>>>> +++ b/exec/totemsrp.c
>>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>>>  break;
>>>>
>>>>  case MEMB_STATE_RECOVERY:
>>>> - if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>>>> + if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>> +     memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>>>  log_printf (instance->totemsrp_log_level_debug,
>>>>  "Sending initial ORF token\n");
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss