On 12/10/2011 10:43 AM, Yunkai Zhang wrote: > On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote: >> On 12/10/2011 02:27 AM, Yunkai Zhang wrote: >>> After the representative of the new ring entered into RECOVERY state, >>> it was waiting for the commit_token on the second rotation. But when >>> it got a ncommit_token, it didn't check the status of this token >>> which could be retransmitted by upstream node on the first rotation, >>> and then the representative node would send the initial ORF token >>> prematurely. >>> >>> If other nodes which have not entered into RECOVERY state received this >>> incorrect initial orf_token(which seq is 0), these nodes may crash >>> caused by assert instraction in *orf_token_rtr* function: >>> ... >> >> Yunkai, >> >> First off thanks for digging into Totem. I know the protocol is hard to >> understand and its nice to have more people to bounce ideas off of. >> >> Do you have a test case for this? >> >> Patch is a clever solution but the root cause seems incorrect. I'd like >> to get to the RCA of your bug. Just thinking about some of the problems >> you have reported in the past (13 seconds to execute a recovery copy per >> node), it could be possible a retransmission of the commit token from 5 >> to 1 (kicking off a new orf token) in the below example could occur >> while in recovery. That should be ok though, since retransmitted orf >> tokens are discarded (totemsrp:3564). I don't see any immediate way to >> have a premature token transmission. >> >> Lets examine 5 processors: >> >> When in gather, a membership is built. Once this membership is agreed >> upon, a commit token is originated by the ring rep creating the first >> transmission around the ring. >> >> totemsrp.c:4044 >> >> Example with 5 processors: >> 1,2,3,4,5 in gather >> 1 originates commit token >> >> >> Once a commit token is received in the gather state, the processor >> enters commit (first token rotation) >> >> totemsrp.c:4348 >> >> Example: >> 1 goes to commit >> 2 goes to commit >> 3 goes to commit >> 4 goes to commit >> 5 goes to commit >> >> Once a commit token is received in the commit state, the processor >> enters the recovery state only if this is a new commit token >> >> totemsrp.c:4360 >> >> 1 goes to recovery >> 2 goes to recovery >> 3 goes to recovery >> 4 goes to recovery >> 5 goes to recovery >> 1 originates orf token (this is the start of the third rotation) >> >> totemsrp.c:4374 >> >> >> In the above failure free scenario, we see that we will never get a >> premature orf token. >> >> Perhaps the problem your running into is your mixing partition/merging >> while forming a new ring. This could result in my_id.addr[0] and ring >> rep matching, but *old* commit tokens not being discarded by the ring >> rep in the recovery state. >> >> I believe the code would be better served by something like: >> >> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) && >> memb_commit_token->ring_id.seq == instance->my_ring_id_seq) { >> ... >> } >> >> Note it may be safer to compare the entire ring_id rather then only the seq. >> >> This will cause any commit token not from the current agreed processors >> to be rejected. The ring seq increases by 4 for each new ring and it is >> not possible to have a matching seq + rep in two separate rings (this is >> why we store the ring seq to stable storage). >> >> Could you test that or tell me your test case or both? >> > > Let me discuss my test case based on your example mentioned above: > > 1,2,3,4,5 in gather > > 1 originates commit token > => enter commit state > => send commit_token(token_seq:0) to 2 > => receive commit_token(token_seq:4) from 5 > => enter RECOVERY state > => receive commit_token(token_seq:4) retransmitted by 5 > => enter OPERATIONAL state *prematurely*!! operational is triggered by totemsrp.c:4540 atomically once all nodes have recovered all of the messages. Are you sure the commit token is what is received here which is triggering operational? > => send initial ORF toke to 2 *prematurely*!! agree an extra ORF token will be sent here each time the commit token is retransmitted > > 2 receive commit_token(token_seq:0) from 1 > =>enter commit state > => send commit_token(token_seq:1) to 3 > => commit token retransmit timeout occur!! > => retransmit commit_token(token_seq:1) ... > => retransmit commit_token(token_seq:1) ... > => ... > => receive JoinMSG from 3 > => enter GATHER state, ... > => receive initial ORF token from 1 > => crash in *orf_token_rtr* funcation!! This points out the issue is that the orf token should be rejected at totemsrp.c:3565. After entering commit: totemsrp.c:4356 totemsrp.c:2003 sets the new ring id (which was incremented) receive join message from 3 enter gather state In this case, totem goes from commit->gather, On commit->gather, in some way the ring id should be invalidated. At one point totem used to set and restore the ring id, and for some reason this was removed. I don't recall the rationale but it should be in the changelogs. Bit tired now, may have a look tomorrow. As I recall it only did this in the recovery state, but given your scenario it should also be done for the commit state. see totemsrp.c:1609 Do you have a test case that consistently reproduces this issue? Regards -steve > > 3 receive commit_token(token_seq:1) from 2 > => enter commit state > => send commit_token(token_seq:2) to 4 > => commit token retransmit timeout occur!! > => retransmit commit_token(token_seq:2) ... > => retransmit commit_token(token_seq:2) ... > => ... > => commit token lost timeout occur!! > => enter GATHER state, multicast JoinMSG ... > > 4 receive commit_token(token_seq:2) from 3 > => enter commit state > => send commit_token(token_seq:3) to 5 > => commit token retransmit timeout occur!! > => retransmit commit_token(token_seq:3) to 5... > => retransmit commit_token(token_seq:3) to 5... > => ... > > 5 receive commit_token(token_seq:3) from 4 > => enter commit state > => send commit_token(token_seq:4) to 1 > => commit token retransmit timeout occur!! > => retransmit commit_token(token_seq:4) to 1... > => retransmit commit_token(token_seq:4) to 1... > => ... > > The key point in my test cast is that one of processors(3 in this > case) runs into commit token timeout, and 1 receive commit token > retransmitted by 5, and enter OPERATIONAL state *prematurely*. > >> Thanks >> -steve >> >> >>> range = orf_token->seq - instance->my_aru; >>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX); >>> ... >>> >>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx> >>> --- >>> exec/totemsrp.c | 3 ++- >>> 1 files changed, 2 insertions(+), 1 deletions(-) >>> >>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c >>> index 5a78962..22717ac 100644 >>> --- a/exec/totemsrp.c >>> +++ b/exec/totemsrp.c >>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token ( >>> break; >>> >>> case MEMB_STATE_RECOVERY: >>> - if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) { >>> + if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) && >>> + memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) { >>> log_printf (instance->totemsrp_log_level_debug, >>> "Sending initial ORF token\n"); >>> >> >> _______________________________________________ >> discuss mailing list >> discuss@xxxxxxxxxxxx >> http://lists.corosync.org/mailman/listinfo/discuss > > > _______________________________________________ discuss mailing list discuss@xxxxxxxxxxxx http://lists.corosync.org/mailman/listinfo/discuss