On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>> After the representative of the new ring entered the RECOVERY state,
>> it was waiting for the commit_token on the second rotation. But when
>> it got a commit_token, it didn't check the status of this token,
>> which could be a retransmission by the upstream node from the first
>> rotation, and so the representative node would send the initial ORF
>> token prematurely.
>>
>> If other nodes which had not yet entered the RECOVERY state received
>> this incorrect initial orf_token (whose seq is 0), those nodes could
>> crash on the assert instruction in the *orf_token_rtr* function:
>> ...
>
> Yunkai,
>
> First off, thanks for digging into Totem. I know the protocol is hard
> to understand and it's nice to have more people to bounce ideas off of.
>
> Do you have a test case for this?
>
> The patch is a clever solution, but the stated root cause seems
> incorrect. I'd like to get to the RCA of your bug. Just thinking about
> some of the problems you have reported in the past (13 seconds to
> execute a recovery copy per node), it could be possible that a
> retransmission of the commit token from 5 to 1 (kicking off a new orf
> token) in the example below could occur while in recovery. That should
> be ok, though, since retransmitted orf tokens are discarded
> (totemsrp.c:3564). I don't see any immediate way to have a premature
> token transmission.
>
> Let's examine 5 processors:
>
> When in gather, a membership is built. Once this membership is agreed
> upon, a commit token is originated by the ring rep, creating the first
> transmission around the ring.
>
> totemsrp.c:4044
>
> Example with 5 processors:
> 1,2,3,4,5 in gather
> 1 originates commit token
>
> Once a commit token is received in the gather state, the processor
> enters commit (first token rotation):
>
> totemsrp.c:4348
>
> Example:
> 1 goes to commit
> 2 goes to commit
> 3 goes to commit
> 4 goes to commit
> 5 goes to commit
>
> Once a commit token is received in the commit state, the processor
> enters the recovery state, but only if this is a new commit token:
>
> totemsrp.c:4360
>
> 1 goes to recovery
> 2 goes to recovery
> 3 goes to recovery
> 4 goes to recovery
> 5 goes to recovery
> 1 originates orf token (this is the start of the third rotation)
>
> totemsrp.c:4374
>
> In the above failure-free scenario, we see that we will never get a
> premature orf token.
>
> Perhaps the problem you're running into is that you're mixing
> partition/merging while forming a new ring. This could result in
> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
> being discarded by the ring rep in the recovery state.
>
> I believe the code would be better served by something like:
>
> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>     memb_commit_token->ring_id.seq == instance->my_ring_id.seq) {
>     ...
> }
>
> Note it may be safer to compare the entire ring_id rather than only
> the seq.
>
> This will cause any commit token not from the current agreed
> processors to be rejected. The ring seq increases by 4 for each new
> ring, and it is not possible to have a matching seq + rep in two
> separate rings (this is why we store the ring seq to stable storage).
>
> Could you test that, or tell me your test case, or both?
Let me describe my test case based on the example you mentioned above:

1,2,3,4,5 in gather

1 originates commit token
  => enters commit state
  => sends commit_token(token_seq:0) to 2
  => receives commit_token(token_seq:4) from 5
  => enters RECOVERY state
  => receives commit_token(token_seq:4) retransmitted by 5
  => enters OPERATIONAL state *prematurely*!!
  => sends initial ORF token to 2 *prematurely*!!

2 receives commit_token(token_seq:0) from 1
  => enters commit state
  => sends commit_token(token_seq:1) to 3
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:1) ...
  => retransmits commit_token(token_seq:1) ...
  => ...
  => receives JoinMSG from 3
  => enters GATHER state, ...
  => receives initial ORF token from 1
  => crashes in the *orf_token_rtr* function!!

3 receives commit_token(token_seq:1) from 2
  => enters commit state
  => sends commit_token(token_seq:2) to 4
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:2) ...
  => retransmits commit_token(token_seq:2) ...
  => ...
  => commit token lost timeout occurs!!
  => enters GATHER state, multicasts JoinMSG ...

4 receives commit_token(token_seq:2) from 3
  => enters commit state
  => sends commit_token(token_seq:3) to 5
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:3) to 5 ...
  => retransmits commit_token(token_seq:3) to 5 ...
  => ...

5 receives commit_token(token_seq:3) from 4
  => enters commit state
  => sends commit_token(token_seq:4) to 1
  => commit token retransmit timeout occurs!!
  => retransmits commit_token(token_seq:4) to 1 ...
  => retransmits commit_token(token_seq:4) to 1 ...
  => ...

The key point in my test case is that one of the processors (3 in this
case) runs into the commit token lost timeout, while 1 receives the
commit token retransmitted by 5 and enters the OPERATIONAL state
*prematurely*.

> Thanks
> -steve
>
>> range = orf_token->seq - instance->my_aru;
>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>> ...
>>
>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>> ---
>>  exec/totemsrp.c |    3 ++-
>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>> index 5a78962..22717ac 100644
>> --- a/exec/totemsrp.c
>> +++ b/exec/totemsrp.c
>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>  		break;
>>
>>  	case MEMB_STATE_RECOVERY:
>> -		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>> +		if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>> +		    memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>  			log_printf (instance->totemsrp_log_level_debug,
>>  				"Sending initial ORF token\n");
>>
>
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss

-- 
Yunkai Zhang
Work at Taobao