On Mon, Dec 12, 2011 at 2:16 AM, Steven Dake <sdake@xxxxxxxxxx> wrote: > On 12/10/2011 10:43 AM, Yunkai Zhang wrote: >> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote: >>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote: >>>> After the representative of the new ring entered into RECOVERY state, >>>> it was waiting for the commit_token on the second rotation. But when >>>> it got a ncommit_token, it didn't check the status of this token >>>> which could be retransmitted by upstream node on the first rotation, >>>> and then the representative node would send the initial ORF token >>>> prematurely. >>>> >>>> If other nodes which have not entered into RECOVERY state received this >>>> incorrect initial orf_token(which seq is 0), these nodes may crash >>>> caused by assert instraction in *orf_token_rtr* function: >>>> ... >>> >>> Yunkai, >>> >>> First off thanks for digging into Totem. I know the protocol is hard to >>> understand and its nice to have more people to bounce ideas off of. >>> >>> Do you have a test case for this? >>> >>> Patch is a clever solution but the root cause seems incorrect. I'd like >>> to get to the RCA of your bug. Just thinking about some of the problems >>> you have reported in the past (13 seconds to execute a recovery copy per >>> node), it could be possible a retransmission of the commit token from 5 >>> to 1 (kicking off a new orf token) in the below example could occur >>> while in recovery. That should be ok though, since retransmitted orf >>> tokens are discarded (totemsrp:3564). I don't see any immediate way to >>> have a premature token transmission. >>> >>> Lets examine 5 processors: >>> >>> When in gather, a membership is built. Once this membership is agreed >>> upon, a commit token is originated by the ring rep creating the first >>> transmission around the ring. >>> >>> totemsrp.c:4044 >>> >>> Example with 5 processors: >>> 1,2,3,4,5 in gather >>> 1 originates commit token >>> >>> >>> Once a commit token is received in the gather state, the processor >>> enters commit (first token rotation) >>> >>> totemsrp.c:4348 >>> >>> Example: >>> 1 goes to commit >>> 2 goes to commit >>> 3 goes to commit >>> 4 goes to commit >>> 5 goes to commit >>> >>> Once a commit token is received in the commit state, the processor >>> enters the recovery state only if this is a new commit token >>> >>> totemsrp.c:4360 >>> >>> 1 goes to recovery >>> 2 goes to recovery >>> 3 goes to recovery >>> 4 goes to recovery >>> 5 goes to recovery >>> 1 originates orf token (this is the start of the third rotation) >>> >>> totemsrp.c:4374 >>> >>> >>> In the above failure free scenario, we see that we will never get a >>> premature orf token. >>> >>> Perhaps the problem your running into is your mixing partition/merging >>> while forming a new ring. This could result in my_id.addr[0] and ring >>> rep matching, but *old* commit tokens not being discarded by the ring >>> rep in the recovery state. >>> >>> I believe the code would be better served by something like: >>> >>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) && >>> memb_commit_token->ring_id.seq == instance->my_ring_id_seq) { >>> ... >>> } >>> >>> Note it may be safer to compare the entire ring_id rather then only the seq. >>> >>> This will cause any commit token not from the current agreed processors >>> to be rejected. The ring seq increases by 4 for each new ring and it is >>> not possible to have a matching seq + rep in two separate rings (this is >>> why we store the ring seq to stable storage). >>> >>> Could you test that or tell me your test case or both? >>> >> >> Let me discuss my test case based on your example mentioned above: >> >> 1,2,3,4,5 in gather >> >> 1 originates commit token >> => enter commit state >> => send commit_token(token_seq:0) to 2 >> => receive commit_token(token_seq:4) from 5 >> => enter RECOVERY state >> => receive commit_token(token_seq:4) retransmitted by 5 >> => enter OPERATIONAL state *prematurely*!! >> => send initial ORF toke to 2 *prematurely*!! >> >> 2 receive commit_token(token_seq:0) from 1 >> =>enter commit state >> => send commit_token(token_seq:1) to 3 >> => commit token retransmit timeout occur!! >> => retransmit commit_token(token_seq:1) ... >> => retransmit commit_token(token_seq:1) ... >> => ... >> => receive JoinMSG from 3 >> => enter GATHER state, ... >> => receive initial ORF token from 1 >> => crash in *orf_token_rtr* funcation!! Today, I have checked my logging file carefully again, this description did not incorrectly: 2 receive commit_token(token_seq:0) from 1 =>enter commit state => send commit_token(token_seq:1) to 3 => receive commit_token(token_seq:6) from 1 => enter RECOVERY state, **set orf_token_discard=0** => commit token retransmit timeout occur!! => retransmit commit_token(token_seq:1) ... => retransmit commit_token(token_seq:1) ... => ... => receive JoinMSG from 3 in RECOVERY state => call memb_recovery_state_token_loss => old_ring_state_restore, set instance->my_aru=39 => memb_state_gather_enter (instance, 5), **set orf_token_discard=1** => enter GATHER state => call memb_state_gather_enter (instance, 14), **set orf_token_discard=1** => enter GATHER state => receive initial ORF token from 1 => crash in *orf_token_rtr* funcation!! I can confirmed that the value of instance->memb_state was MEMB_STATE_GATHER(=2) when 2 received initial ORF token. I use following code for logging in *message_handler_orf_token* function: =code= log_printf (LOG_DEBUG, "recv orf_token from:%s, token_seq:%d, my_token_seq:%d, aru:%x, my_aru:%x, seq:%x, state:%d", nodeid_to_ip(token->header.nodeid, buf), token->token_seq, instance->my_token_seq, token->aru, instance->my_aru, token->seq, instance->memb_state); =log output= Dec 08 23:02:03 corosync [TOTEM ] totemsrp.c:3544 recv orf_token from:10.232.134.1, token_seq:0, my_token_seq:-1, aru:ffffffff, my_aru:39, seq:0, state:2 It was the last log record while corosync exist. But I can't believed that corosync did not discard the ORF token when instance->memb_state equals to MEMB_STATE_GATHER (as long as corosync enter into GATHER state, orf_token_discard would be set to be 1). A reasonable guess is that the value of orf_token_discard was cleared by unknown reason -- maybe there exist memory corruption. >> >> 3 receive commit_token(token_seq:1) from 2 >> => enter commit state >> => send commit_token(token_seq:2) to 4 >> => commit token retransmit timeout occur!! >> => retransmit commit_token(token_seq:2) ... >> => retransmit commit_token(token_seq:2) ... >> => ... >> => commit token lost timeout occur!! >> => enter GATHER state, multicast JoinMSG ... >> >> 4 receive commit_token(token_seq:2) from 3 >> => enter commit state >> => send commit_token(token_seq:3) to 5 >> => commit token retransmit timeout occur!! >> => retransmit commit_token(token_seq:3) to 5... >> => retransmit commit_token(token_seq:3) to 5... >> => ... >> >> 5 receive commit_token(token_seq:3) from 4 >> => enter commit state >> => send commit_token(token_seq:4) to 1 >> => commit token retransmit timeout occur!! >> => retransmit commit_token(token_seq:4) to 1... >> => retransmit commit_token(token_seq:4) to 1... >> => ... >> >> The key point in my test cast is that one of processors(3 in this >> case) runs into commit token timeout, and 1 receive commit token >> retransmitted by 5, and enter OPERATIONAL state *prematurely*. >> t > Yunkai, > > I took a look at the code with a fresh set of eyes this morning for 3 > hours and don't see how it is possible for processor 2 to crash in > orf_token_rtr with the above scenario. The orf token will be discarded > in all cases in GATHER and COMMIT before executing orf_token_rtr. > orf_token_rtr should *never* run in GATHER or COMMIT states of the > processor. > > (flatiron-1.4 line numbers): > > On startup: > totemsrp.c:657 - memb_state = OPERATIONAL > totemsrp.c:4467 - when network interface detected call gather_enter > state = gather, orf_token_discard = 1 > totemsrp.c:1870 - gather_enter sets orf_token_discard to 1 > > we can proceed to enter OPERATIONAL here or go directly to COMMIT > > COMMIT path: > totemsrp.c:4331 - COMMIT token received in GATHER state > (orf_token_discard still equals 1) > totemsrp.c:4332 - join message received in COMMIT state - enter GATHER > (orf_token_discard still equals 1) > totemsrp.c:3428 - orf token discarded (because orf_token_discard still > equals 1) > > The only code that resets orf_token_discard to 0 is > totemsrp.c:2008 - entered recovery state > > Analyzing the path where totem transitions from RECOVERY -> GATHER -> > COMMIT: > > totemsrp.c:4276 - memb_state_gather_enter called > totemsrp.c:1870 - gather enter sets orf_token_discard to 1 > totemsrp.c:4332 - join message received in COMMIT state - enter GATHER > (orf_token_discard still equals 1) > totemsrp.c:3428 - orf token discarded (because orf_token_discard still > equals 1) > > Given that the orf token would always be discarded by processor 2 in the > above test case you gave, I don't see a scenario in the code where > orf_token_rtr could crash in that test case. If you can show me how > this happens, we could fix this up. One possibility is some sort of > thread race. Could you determine that orf_token_discard is equal to 0 > when your failure occurs? > > As for entering operational prematurely: > If this is a problem, it is a separate problem from the above analysis. > Lets get the above analysis fixed if it is a problem. > > Really looking forward to hearing back from you so we can fix the > problem you see. > > Regards > -steve > >>> Thanks >>> -steve >>> >>> >>>> range = orf_token->seq - instance->my_aru; >>>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX); >>>> ... >>>> >>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx> >>>> --- >>>> exec/totemsrp.c | 3 ++- >>>> 1 files changed, 2 insertions(+), 1 deletions(-) >>>> >>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c >>>> index 5a78962..22717ac 100644 >>>> --- a/exec/totemsrp.c >>>> +++ b/exec/totemsrp.c >>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token ( >>>> break; >>>> >>>> case MEMB_STATE_RECOVERY: >>>> - if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) { >>>> + if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) && >>>> + memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) { >>>> log_printf (instance->totemsrp_log_level_debug, >>>> "Sending initial ORF token\n"); >>>> >>> >>> _______________________________________________ >>> discuss mailing list >>> discuss@xxxxxxxxxxxx >>> http://lists.corosync.org/mailman/listinfo/discuss >> >> >> > > _______________________________________________ > discuss mailing list > discuss@xxxxxxxxxxxx > http://lists.corosync.org/mailman/listinfo/discuss -- Yunkai Zhang Work at Taobao _______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss