I will continue to track this issue. It seems that this patch can't prevent corosync from crashing, but it does ignore retransmitted commit_tokens correctly. I will update the description of this patch and submit a patch v2 later.

On Mon, Dec 12, 2011 at 5:13 PM, Yunkai Zhang <qiushu.zyk@xxxxxxxxxx> wrote:
> On Mon, Dec 12, 2011 at 2:16 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>> On 12/10/2011 10:43 AM, Yunkai Zhang wrote:
>>> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>>>>> After the representative of the new ring entered the RECOVERY state,
>>>>> it was waiting for the commit_token on the second rotation. But when
>>>>> it got a commit_token, it didn't check the status of this token,
>>>>> which could have been retransmitted by an upstream node on the first
>>>>> rotation, and then the representative node would send the initial ORF
>>>>> token prematurely.
>>>>>
>>>>> If other nodes which have not entered the RECOVERY state receive this
>>>>> incorrect initial orf_token (whose seq is 0), these nodes may crash
>>>>> on the assert instruction in the *orf_token_rtr* function:
>>>>> ...
>>>>
>>>> Yunkai,
>>>>
>>>> First off, thanks for digging into Totem. I know the protocol is hard to
>>>> understand and it's nice to have more people to bounce ideas off of.
>>>>
>>>> Do you have a test case for this?
>>>>
>>>> The patch is a clever solution, but the stated root cause seems incorrect.
>>>> I'd like to get to the RCA of your bug. Just thinking about some of the
>>>> problems you have reported in the past (13 seconds to execute a recovery
>>>> copy per node), it could be possible that a retransmission of the commit
>>>> token from 5 to 1 (kicking off a new orf token) in the example below
>>>> occurs while in recovery. That should be ok, though, since retransmitted
>>>> orf tokens are discarded (totemsrp.c:3564). I don't see any immediate way
>>>> to have a premature token transmission.
>>>>
>>>> Let's examine 5 processors:
>>>>
>>>> When in gather, a membership is built. Once this membership is agreed
>>>> upon, a commit token is originated by the ring rep, creating the first
>>>> transmission around the ring.
>>>>
>>>> totemsrp.c:4044
>>>>
>>>> Example with 5 processors:
>>>> 1,2,3,4,5 in gather
>>>> 1 originates commit token
>>>>
>>>> Once a commit token is received in the gather state, the processor
>>>> enters commit (first token rotation)
>>>>
>>>> totemsrp.c:4348
>>>>
>>>> Example:
>>>> 1 goes to commit
>>>> 2 goes to commit
>>>> 3 goes to commit
>>>> 4 goes to commit
>>>> 5 goes to commit
>>>>
>>>> Once a commit token is received in the commit state, the processor
>>>> enters the recovery state only if this is a new commit token
>>>>
>>>> totemsrp.c:4360
>>>>
>>>> 1 goes to recovery
>>>> 2 goes to recovery
>>>> 3 goes to recovery
>>>> 4 goes to recovery
>>>> 5 goes to recovery
>>>> 1 originates orf token (this is the start of the third rotation)
>>>>
>>>> totemsrp.c:4374
>>>>
>>>> In the above failure-free scenario, we see that we will never get a
>>>> premature orf token.
>>>>
>>>> Perhaps the problem you're running into is that you're mixing
>>>> partition/merging while forming a new ring. This could result in
>>>> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
>>>> being discarded by the ring rep in the recovery state.
>>>>
>>>> I believe the code would be better served by something like:
>>>>
>>>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>>     memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
>>>>     ...
>>>> }
>>>>
>>>> Note it may be safer to compare the entire ring_id rather than only the seq.
>>>>
>>>> This will cause any commit token not from the current agreed processors
>>>> to be rejected. The ring seq increases by 4 for each new ring and it is
>>>> not possible to have a matching seq + rep in two separate rings (this is
>>>> why we store the ring seq to stable storage).
>>>>
>>>> Could you test that or tell me your test case or both?
>>>>
>>>
>>> Let me discuss my test case based on your example mentioned above:
>>>
>>> 1,2,3,4,5 in gather
>>>
>>> 1 originates commit token
>>> => enter commit state
>>> => send commit_token(token_seq:0) to 2
>>> => receive commit_token(token_seq:4) from 5
>>> => enter RECOVERY state
>>> => receive commit_token(token_seq:4) retransmitted by 5
>>> => enter OPERATIONAL state *prematurely*!!
>>> => send initial ORF token to 2 *prematurely*!!
>>>
>>> 2 receives commit_token(token_seq:0) from 1
>>> => enter commit state
>>> => send commit_token(token_seq:1) to 3
>>> => commit token retransmit timeout occurs!!
>>> => retransmit commit_token(token_seq:1) ...
>>> => retransmit commit_token(token_seq:1) ...
>>> => ...
>>> => receive JoinMSG from 3
>>> => enter GATHER state, ...
>>> => receive initial ORF token from 1
>>> => crash in the *orf_token_rtr* function!!
>
> Today I have checked my logging file carefully again; the description
> above was not quite accurate. The corrected sequence for 2 is:
>
> 2 receives commit_token(token_seq:0) from 1
> => enter commit state
> => send commit_token(token_seq:1) to 3
> => receive commit_token(token_seq:6) from 1
> => enter RECOVERY state, **set orf_token_discard=0**
> => commit token retransmit timeout occurs!!
> => retransmit commit_token(token_seq:1) ...
> => retransmit commit_token(token_seq:1) ...
> => ...
> => receive JoinMSG from 3 in RECOVERY state
> => call memb_recovery_state_token_loss
> => old_ring_state_restore, set instance->my_aru=39
> => memb_state_gather_enter (instance, 5), **set orf_token_discard=1**
> => enter GATHER state
> => call memb_state_gather_enter (instance, 14), **set orf_token_discard=1**
> => enter GATHER state
> => receive initial ORF token from 1
> => crash in the *orf_token_rtr* function!!
>
> I can confirm that the value of instance->memb_state was
> MEMB_STATE_GATHER (=2) when 2 received the initial ORF token. I used the
> following code for logging in the *message_handler_orf_token* function:
>
> =code=
> log_printf (LOG_DEBUG, "recv orf_token from:%s, token_seq:%d, my_token_seq:%d, aru:%x, my_aru:%x, seq:%x, state:%d",
>     nodeid_to_ip(token->header.nodeid, buf),
>     token->token_seq, instance->my_token_seq,
>     token->aru, instance->my_aru, token->seq,
>     instance->memb_state);
>
> =log output=
> Dec 08 23:02:03 corosync [TOTEM ] totemsrp.c:3544 recv orf_token
> from:10.232.134.1, token_seq:0, my_token_seq:-1, aru:ffffffff,
> my_aru:39, seq:0, state:2
>
> It was the last log record before corosync exited.
>
> But I can't believe that corosync did not discard the ORF token when
> instance->memb_state equals MEMB_STATE_GATHER (as soon as corosync
> enters the GATHER state, orf_token_discard should be set to 1).
>
> A reasonable guess is that the value of orf_token_discard was cleared
> for an unknown reason -- maybe there is memory corruption.
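
To make the crash arithmetic concrete: the logging code above prints my_aru with %x, so my_aru:39 is 0x39, and the incoming token carries seq:0. Below is a minimal standalone sketch (my own illustration, not corosync code; the QUEUE_RTR_ITEMS_SIZE_MAX value is only a placeholder, and I am assuming the sequence values and range are unsigned ints, which is what the assert and the wrapped value suggest) that plugs those two log values into the same subtraction that *orf_token_rtr* performs:

=code=
#include <assert.h>
#include <stdio.h>

#define QUEUE_RTR_ITEMS_SIZE_MAX 16384	/* placeholder bound, for illustration only */

int main (void)
{
	unsigned int orf_seq = 0x0;	/* seq of the premature initial ORF token (from the log) */
	unsigned int my_aru  = 0x39;	/* stale my_aru restored from the old ring (from the log) */
	unsigned int range;

	/* same arithmetic as the orf_token_rtr snippet quoted in the patch description */
	range = orf_seq - my_aru;
	printf ("range = 0x%x\n", range);	/* prints range = 0xffffffc7 */

	/* this is the assertion that aborts the process */
	assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
	return 0;
}

With seq:0 and my_aru:0x39 the unsigned subtraction wraps around to 0xffffffc7, so the range check can never pass no matter what the real bound is; any node whose restored my_aru is ahead of the token's seq:0 and that lets the premature ORF token reach *orf_token_rtr* will hit this assert.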
>
>>>
>>> 3 receives commit_token(token_seq:1) from 2
>>> => enter commit state
>>> => send commit_token(token_seq:2) to 4
>>> => commit token retransmit timeout occurs!!
>>> => retransmit commit_token(token_seq:2) ...
>>> => retransmit commit_token(token_seq:2) ...
>>> => ...
>>> => commit token lost timeout occurs!!
>>> => enter GATHER state, multicast JoinMSG ...
>>>
>>> 4 receives commit_token(token_seq:2) from 3
>>> => enter commit state
>>> => send commit_token(token_seq:3) to 5
>>> => commit token retransmit timeout occurs!!
>>> => retransmit commit_token(token_seq:3) to 5 ...
>>> => retransmit commit_token(token_seq:3) to 5 ...
>>> => ...
>>>
>>> 5 receives commit_token(token_seq:3) from 4
>>> => enter commit state
>>> => send commit_token(token_seq:4) to 1
>>> => commit token retransmit timeout occurs!!
>>> => retransmit commit_token(token_seq:4) to 1 ...
>>> => retransmit commit_token(token_seq:4) to 1 ...
>>> => ...
>>>
>>> The key point in my test case is that one of the processors (3 in this
>>> case) runs into a commit token timeout, 1 receives the commit token
>>> retransmitted by 5, and 1 enters the OPERATIONAL state *prematurely*.
>>
>> Yunkai,
>>
>> I took a look at the code with a fresh set of eyes this morning for 3
>> hours and don't see how it is possible for processor 2 to crash in
>> orf_token_rtr with the above scenario. The orf token will be discarded
>> in all cases in GATHER and COMMIT before executing orf_token_rtr.
>> orf_token_rtr should *never* run in the GATHER or COMMIT states of the
>> processor.
>>
>> (flatiron-1.4 line numbers):
>>
>> On startup:
>> totemsrp.c:657  - memb_state = OPERATIONAL
>> totemsrp.c:4467 - when the network interface is detected, call gather_enter:
>>                   state = gather, orf_token_discard = 1
>> totemsrp.c:1870 - gather_enter sets orf_token_discard to 1
>>
>> We can proceed to enter OPERATIONAL here or go directly to COMMIT.
>>
>> COMMIT path:
>> totemsrp.c:4331 - COMMIT token received in GATHER state
>>                   (orf_token_discard still equals 1)
>> totemsrp.c:4332 - join message received in COMMIT state - enter GATHER
>>                   (orf_token_discard still equals 1)
>> totemsrp.c:3428 - orf token discarded (because orf_token_discard still
>>                   equals 1)
>>
>> The only code that resets orf_token_discard to 0 is:
>> totemsrp.c:2008 - entered recovery state
>>
>> Analyzing the path where totem transitions from RECOVERY -> GATHER ->
>> COMMIT:
>>
>> totemsrp.c:4276 - memb_state_gather_enter called
>> totemsrp.c:1870 - gather_enter sets orf_token_discard to 1
>> totemsrp.c:4332 - join message received in COMMIT state - enter GATHER
>>                   (orf_token_discard still equals 1)
>> totemsrp.c:3428 - orf token discarded (because orf_token_discard still
>>                   equals 1)
>>
>> Given that the orf token would always be discarded by processor 2 in the
>> test case you gave above, I don't see a scenario in the code where
>> orf_token_rtr could crash. If you can show me how this happens, we could
>> fix it up. One possibility is some sort of thread race. Could you
>> determine whether orf_token_discard is equal to 0 when your failure
>> occurs?
>>
>> As for entering operational prematurely: if this is a problem, it is a
>> separate problem from the above analysis. Let's get the above analysis
>> fixed if it is a problem.
>>
>> Really looking forward to hearing back from you so we can fix the
>> problem you see.
>>
>> Regards
>> -steve
>>
>>>> Thanks
>>>> -steve
>>>>
>>>>> range = orf_token->seq - instance->my_aru;
>>>>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>>>>> ...
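
Before the quoted diff, a short note on the intent of the new condition: the patch description above says the representative should only act on the commit_token it receives on the second rotation, and the patch expresses that as token_seq == 2*addr_entries. Here is that check isolated in a minimal sketch (hypothetical struct and function names, not the real totemsrp types):

=code=
#include <stdbool.h>

/* Hypothetical, simplified view of the two fields the check relies on. */
struct commit_token_view {
	unsigned int token_seq;		/* how far the commit token has travelled around the ring */
	unsigned int addr_entries;	/* number of members in the new ring */
};

/*
 * The representative originates the initial ORF token only when the
 * commit token has completed its second rotation; a commit_token
 * retransmitted from the first rotation carries a smaller token_seq
 * and is ignored.
 */
bool commit_token_second_rotation_done (const struct commit_token_view *tok)
{
	return tok->token_seq == 2 * tok->addr_entries;
}

In the 5-node test case above, the commit_token(token_seq:4) retransmitted by 5 fails this check, so the representative keeps waiting for the real second-rotation commit_token instead of sending the initial ORF token prematurely.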
>>>>>
>>>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>>>> ---
>>>>>  exec/totemsrp.c |    3 ++-
>>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>>>> index 5a78962..22717ac 100644
>>>>> --- a/exec/totemsrp.c
>>>>> +++ b/exec/totemsrp.c
>>>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>>>>  		break;
>>>>>
>>>>>  	case MEMB_STATE_RECOVERY:
>>>>> -		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>>>>> +		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>>> +			memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>>>>  			log_printf (instance->totemsrp_log_level_debug,
>>>>>  				"Sending initial ORF token\n");
>>>>>
>>>
>>
>
> --
> Yunkai Zhang
> Work at Taobao

--
Yunkai Zhang
Work at Taobao
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss