Re: [PATCH] Ignore the commit_token retransmitted by upstream node on the first rotation

Steven Dake <sdake@xxxxxxxxxx> · Sun, 11 Dec 2011 11:16:18 -0700

On 12/10/2011 10:43 AM, Yunkai Zhang wrote:
> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>>> After the representative of the new ring entered into RECOVERY state,
>>> it was waiting for the commit_token on the second rotation. But when
>>> it got a ncommit_token, it didn't check the status of this token
>>> which could be retransmitted by upstream node on the first rotation,
>>> and then the representative node would send the initial ORF token
>>> prematurely.
>>>
>>> If other nodes which have not entered into RECOVERY state received this
>>> incorrect initial orf_token(which seq is 0), these nodes may crash
>>> caused by assert instraction in *orf_token_rtr* function:
>>>   ...
>>
>> Yunkai,
>>
>> First off thanks for digging into Totem.  I know the protocol is hard to
>> understand and its nice to have more people to bounce ideas off of.
>>
>> Do you have a test case for this?
>>
>> Patch is a clever solution but the root cause seems incorrect.  I'd like
>> to get to the RCA of your bug.  Just thinking about some of the problems
>> you have reported in the past (13 seconds to execute a recovery copy per
>> node), it could be possible a retransmission of the commit token from 5
>> to 1 (kicking off a new orf token) in the below example could occur
>> while in recovery.  That should be ok though, since retransmitted orf
>> tokens are discarded (totemsrp:3564).  I don't see any immediate way to
>> have a premature token transmission.
>>
>> Lets examine 5 processors:
>>
>> When in gather, a membership is built.  Once this membership is agreed
>> upon, a commit token is originated by the ring rep creating the first
>> transmission around the ring.
>>
>> totemsrp.c:4044
>>
>> Example with 5 processors:
>> 1,2,3,4,5 in gather
>> 1 originates commit token
>>
>>
>> Once a commit token is received in the gather state, the processor
>> enters commit (first token rotation)
>>
>> totemsrp.c:4348
>>
>> Example:
>> 1 goes to commit
>> 2 goes to commit
>> 3 goes to commit
>> 4 goes to commit
>> 5 goes to commit
>>
>> Once a commit token is received in the commit state, the processor
>> enters the recovery state only if this is a new commit token
>>
>> totemsrp.c:4360
>>
>> 1 goes to recovery
>> 2 goes to recovery
>> 3 goes to recovery
>> 4 goes to recovery
>> 5 goes to recovery
>> 1 originates orf token (this is the start of the third rotation)
>>
>> totemsrp.c:4374
>>
>>
>> In the above failure free scenario, we see that we will never get a
>> premature orf token.
>>
>> Perhaps the problem your running into is your mixing partition/merging
>> while forming a new ring.  This could result in my_id.addr[0] and ring
>> rep matching, but *old* commit tokens not being discarded by the ring
>> rep in the recovery state.
>>
>> I believe the code would be better served by something like:
>>
>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>    memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
>> ...
>> }
>>
>> Note it may be safer to compare the entire ring_id rather then only the seq.
>>
>> This will cause any commit token not from the current agreed processors
>> to be rejected.  The ring seq increases by 4 for each new ring and it is
>> not possible to have a matching seq + rep in two separate rings (this is
>> why we store the ring seq to stable storage).
>>
>> Could you test that or tell me your test case or both?
>>
> 
> Let me discuss my test case based on your example mentioned above:
> 
> 1,2,3,4,5 in gather
> 
> 1 originates commit token
>   => enter commit state
>     => send commit_token(token_seq:0) to 2
>       => receive commit_token(token_seq:4) from 5
>         => enter RECOVERY state
>           => receive commit_token(token_seq:4) retransmitted by 5
>             => enter OPERATIONAL state *prematurely*!!
>               => send initial ORF toke to 2 *prematurely*!!
> 
> 2 receive commit_token(token_seq:0) from 1
>   =>enter commit state
>     => send commit_token(token_seq:1) to 3
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:1) ...
>         => retransmit commit_token(token_seq:1) ...
>         => ...
>           => receive JoinMSG from 3
>             => enter GATHER state, ...
>               => receive initial ORF token from 1
>                 => crash in *orf_token_rtr* funcation!!
> 
> 3 receive commit_token(token_seq:1) from 2
>   => enter commit state
>     => send commit_token(token_seq:2) to 4
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:2) ...
>         => retransmit commit_token(token_seq:2) ...
>         => ...
>           => commit token lost timeout occur!!
>             => enter GATHER state, multicast JoinMSG ...
> 
> 4 receive commit_token(token_seq:2) from 3
>   => enter commit state
>     => send commit_token(token_seq:3) to 5
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:3) to 5...
>         => retransmit commit_token(token_seq:3) to 5...
>         => ...
> 
> 5 receive commit_token(token_seq:3) from 4
>   => enter commit state
>     => send commit_token(token_seq:4) to 1
>       => commit token retransmit timeout occur!!
>         => retransmit commit_token(token_seq:4) to 1...
>         => retransmit commit_token(token_seq:4) to 1...
>         => ...
> 
> The key point in my test cast is that one of processors(3 in this
> case) runs into commit token timeout, and 1 receive commit token
> retransmitted by 5, and enter OPERATIONAL state *prematurely*.
> t
Yunkai,

I took a look at the code with a fresh set of eyes this morning for 3
hours and don't see how it is possible for processor 2 to crash in
orf_token_rtr with the above scenario.  The orf token will be discarded
in all cases in GATHER and COMMIT before executing orf_token_rtr.
orf_token_rtr should *never* run in GATHER or COMMIT states of the
processor.

(flatiron-1.4 line numbers):

On startup:
totemsrp.c:657 	- memb_state = OPERATIONAL
totemsrp.c:4467 - when network interface detected call gather_enter
	state = gather, orf_token_discard = 1
totemsrp.c:1870 - gather_enter sets orf_token_discard to 1

we can proceed to enter OPERATIONAL here or go directly to COMMIT

COMMIT path:
totemsrp.c:4331 - COMMIT token received in GATHER state
(orf_token_discard still equals 1)
totemsrp.c:4332 - join message received in COMMIT state - enter GATHER
(orf_token_discard still equals 1)
totemsrp.c:3428 - orf token discarded (because orf_token_discard still
equals 1)

The only code that resets orf_token_discard to 0 is
totemsrp.c:2008 - entered recovery state

Analyzing the path where totem transitions from RECOVERY -> GATHER ->
COMMIT:

totemsrp.c:4276 - memb_state_gather_enter called
totemsrp.c:1870 - gather enter sets orf_token_discard to 1
totemsrp.c:4332 - join message received in COMMIT state - enter GATHER
(orf_token_discard still equals 1)
totemsrp.c:3428 - orf token discarded (because orf_token_discard still
equals 1)

Given that the orf token would always be discarded by processor 2 in the
above test case you gave, I don't see a scenario in the code where
orf_token_rtr could crash in that test case.  If you can show me how
this happens, we could fix this up.  One possibility is some sort of
thread race.  Could you determine that orf_token_discard is equal to 0
when your failure occurs?

As for entering operational prematurely:
If this is a problem, it is a separate problem from the above analysis.
 Lets get the above analysis fixed if it is a problem.

Really looking forward to hearing back from you so we can fix the
problem you see.

Regards
-steve

>> Thanks
>> -steve
>>
>>
>>>   range = orf_token->seq - instance->my_aru;
>>>   assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>>>   ...
>>>
>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>> ---
>>>  exec/totemsrp.c |    3 ++-
>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>> index 5a78962..22717ac 100644
>>> --- a/exec/totemsrp.c
>>> +++ b/exec/totemsrp.c
>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>>                       break;
>>>
>>>               case MEMB_STATE_RECOVERY:
>>> -                     if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>>> +                     if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>> +                             memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>>                               log_printf (instance->totemsrp_log_level_debug,
>>>                                       "Sending initial ORF token\n");
>>>
>>
>> _______________________________________________
>> discuss mailing list
>> discuss@xxxxxxxxxxxx
>> http://lists.corosync.org/mailman/listinfo/discuss
> 
> 
> 

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss