On 12/11/2011 12:29 AM, Steven Dake wrote:
> On 12/10/2011 10:43 AM, Yunkai Zhang wrote:
>> On Sun, Dec 11, 2011 at 12:11 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>> On 12/10/2011 02:27 AM, Yunkai Zhang wrote:
>>>> After the representative of the new ring entered the RECOVERY state,
>>>> it waited for the commit_token on the second rotation. But when it
>>>> got a commit_token, it did not check the status of this token, which
>>>> could have been retransmitted by the upstream node on the first
>>>> rotation, and so the representative node would send the initial ORF
>>>> token prematurely.
>>>>
>>>> If other nodes that have not yet entered the RECOVERY state receive
>>>> this incorrect initial orf_token (whose seq is 0), these nodes may
>>>> crash on the assert instruction in the *orf_token_rtr* function:
>>>> ...
>>>
>>> Yunkai,
>>>
>>> First off, thanks for digging into Totem. I know the protocol is hard
>>> to understand and it's nice to have more people to bounce ideas off of.
>>>
>>> Do you have a test case for this?
>>>
>>> The patch is a clever solution, but the root cause seems incorrect. I'd
>>> like to get to the RCA of your bug. Thinking about some of the problems
>>> you have reported in the past (13 seconds to execute a recovery copy
>>> per node), it could be possible for a retransmission of the commit
>>> token from 5 to 1 (kicking off a new orf token) in the example below
>>> to occur while in recovery. That should be ok, though, since
>>> retransmitted orf tokens are discarded (totemsrp.c:3564). I don't see
>>> any immediate way to have a premature token transmission.
>>>
>>> Let's examine 5 processors:
>>>
>>> When in gather, a membership is built. Once this membership is agreed
>>> upon, a commit token is originated by the ring rep, creating the first
>>> transmission around the ring.
>>>
>>> totemsrp.c:4044
>>>
>>> Example with 5 processors:
>>> 1,2,3,4,5 in gather
>>> 1 originates commit token
>>>
>>> Once a commit token is received in the gather state, the processor
>>> enters commit (first token rotation).
>>>
>>> totemsrp.c:4348
>>>
>>> Example:
>>> 1 goes to commit
>>> 2 goes to commit
>>> 3 goes to commit
>>> 4 goes to commit
>>> 5 goes to commit
>>>
>>> Once a commit token is received in the commit state, the processor
>>> enters the recovery state, but only if this is a new commit token.
>>>
>>> totemsrp.c:4360
>>>
>>> 1 goes to recovery
>>> 2 goes to recovery
>>> 3 goes to recovery
>>> 4 goes to recovery
>>> 5 goes to recovery
>>> 1 originates orf token (this is the start of the third rotation)
>>>
>>> totemsrp.c:4374
>>>
>>> In the above failure-free scenario, we see that we will never get a
>>> premature orf token.
>>>
>>> Perhaps the problem you're running into is that you're mixing
>>> partition/merging while forming a new ring. This could result in
>>> my_id.addr[0] and the ring rep matching, but *old* commit tokens not
>>> being discarded by the ring rep in the recovery state.
>>>
>>> I believe the code would be better served by something like:
>>>
>>> if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>     memb_commit_token->ring_id.seq == instance->my_ring_id_seq) {
>>> ...
>>> }
>>>
>>> Note it may be safer to compare the entire ring_id rather than only
>>> the seq.
>>>
>>> This will cause any commit token not from the currently agreed set of
>>> processors to be rejected. The ring seq increases by 4 for each new
>>> ring, and it is not possible to have a matching seq + rep in two
>>> separate rings (this is why we store the ring seq to stable storage).
>>>
>>> Could you test that, or tell me your test case, or both?
>>>
>>
>> Let me discuss my test case based on your example mentioned above:
>>
>> 1,2,3,4,5 in gather
>>
>> 1 originates commit token
>> => enter commit state
>> => send commit_token(token_seq:0) to 2
>> => receive commit_token(token_seq:4) from 5
>> => enter RECOVERY state
>> => receive commit_token(token_seq:4) retransmitted by 5
>> => enter OPERATIONAL state *prematurely*!!
>
> operational is triggered by totemsrp.c:4540 atomically once all nodes
> have recovered all of the messages. Are you sure the commit token is
> what is received here which is triggering operational?
>
>> => send initial ORF token to 2 *prematurely*!!
>
> Agreed, an extra ORF token will be sent here each time the commit token
> is retransmitted.
>
>>
>> 2 receives commit_token(token_seq:0) from 1
>> => enter commit state
>> => send commit_token(token_seq:1) to 3
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:1) ...
>> => retransmit commit_token(token_seq:1) ...
>> => ...
>> => receive JoinMSG from 3
>> => enter GATHER state, ...
>> => receive initial ORF token from 1
>> => crash in the *orf_token_rtr* function!!
>
> This points out that the issue is that the orf token should be rejected
> at totemsrp.c:3565.
>
> After entering commit:
> totemsrp.c:4356
> totemsrp.c:2003 sets the new ring id (which was incremented)
> receive join message from 3
> enter gather state
>
> Note I tracked down a bugzilla in March related to a change which
> removed the ring restore. See the rationale and reproducer here:
> https://bugzilla.redhat.com/show_bug.cgi?id=623176 Comment #6
>
> In this case, totem goes from commit->gather. On commit->gather, the
> ring id should be invalidated in some way. At one point totem used to
> set and restore the ring id, and for some reason this was removed. I
> don't recall the rationale, but it should be in the changelogs. Bit
> tired now, may have a look tomorrow.
>
> As I recall, it only did this in the recovery state, but given your
> scenario it should also be done for the commit state.
>
> See totemsrp.c:1609.
>
> Do you have a test case that consistently reproduces this issue?
>
> Regards
> -steve
>
>
>>
>> 3 receives commit_token(token_seq:1) from 2
>> => enter commit state
>> => send commit_token(token_seq:2) to 4
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:2) ...
>> => retransmit commit_token(token_seq:2) ...
>> => ...
>> => commit token lost timeout occurs!!
>> => enter GATHER state, multicast JoinMSG ...
>>
>> 4 receives commit_token(token_seq:2) from 3
>> => enter commit state
>> => send commit_token(token_seq:3) to 5
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:3) to 5 ...
>> => retransmit commit_token(token_seq:3) to 5 ...
>> => ...
>>
>> 5 receives commit_token(token_seq:3) from 4
>> => enter commit state
>> => send commit_token(token_seq:4) to 1
>> => commit token retransmit timeout occurs!!
>> => retransmit commit_token(token_seq:4) to 1 ...
>> => retransmit commit_token(token_seq:4) to 1 ...
>> => ...
>>
>> The key point in my test case is that one of the processors (3 in this
>> case) runs into a commit token timeout, while 1 receives the commit
>> token retransmitted by 5 and enters the OPERATIONAL state
>> *prematurely*.
>>
>>> Thanks
>>> -steve
>>>
>>>
>>>> range = orf_token->seq - instance->my_aru;
>>>> assert (range < QUEUE_RTR_ITEMS_SIZE_MAX);
>>>> ...
>>>>
>>>> Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>>>> ---
>>>>  exec/totemsrp.c |    3 ++-
>>>>  1 files changed, 2 insertions(+), 1 deletions(-)
>>>>
>>>> diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>>>> index 5a78962..22717ac 100644
>>>> --- a/exec/totemsrp.c
>>>> +++ b/exec/totemsrp.c
>>>> @@ -4372,7 +4372,8 @@ static int message_handler_memb_commit_token (
>>>>  break;
>>>>
>>>>  case MEMB_STATE_RECOVERY:
>>>> - if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep)) {
>>>> + if (totemip_equal (&instance->my_id.addr[0], &instance->my_ring_id.rep) &&
>>>> +     memb_commit_token->token_seq == 2*memb_commit_token->addr_entries) {
>>>>  log_printf (instance->totemsrp_log_level_debug,
>>>>  "Sending initial ORF token\n");
>>>
>>> _______________________________________________
>>> discuss mailing list
>>> discuss@xxxxxxxxxxxx
>>> http://lists.corosync.org/mailman/listinfo/discuss