Perfect. So the patch is in master as
4ee84c51fa73c4ec7cbee922111a140a3aaf75df

Thanks for the patch and the reviews.

Regards,
  Honza

> lgtm2 ;-)
>
> Chrissie
>
> On 15/01/15 12:21, Steven Dake wrote:
>> forgot to copy list.
>>
>> Honza, lgtm.
>>
>> Regards
>> -steve
>>
>> On Thu, Jan 15, 2015 at 5:20 AM, Steven Dake <steven.dake@xxxxxxxxx> wrote:
>>
>>     Honza,
>>
>>     lgtm.
>>
>>     regards
>>     -steve
>>
>>     On Wed, Jan 14, 2015 at 10:19 AM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
>>
>>         Jason,
>>         patch looks good. This touches a very delicate part of the
>>         protocol, so I would really like to see another reviewer's
>>         comment as well. Chrissie, Steve?
>>
>>         Regards,
>>           Honza
>>
>>         jason wrote:
>>         > In active rrp mode, commit tokens are treated as mcast data
>>         > messages, so rrp delivers them directly to the srp layer via
>>         > active_mcast_recv(). As a result, srp receives duplicated
>>         > commit tokens from the different heartbeat links. If the node
>>         > is in recovery state and has already sent out the initial orf
>>         > token, those duplicated commit tokens will cause
>>         > message_handler_memb_commit_token() to send the initial orf
>>         > token again! This is wrong because it resets the orf token
>>         > content in instance->orf_token_retransmit, which breaks the
>>         > token retransmission state.
>>         >
>>         > Furthermore, sending those initial orf tokens again and again
>>         > may lead active_token_recv() to drop some subsequent orf
>>         > tokens. That is OK for rrp because srp will do token
>>         > retransmission, but as said above, the srp retransmission
>>         > state has already been broken, so we finally hit a "token lost
>>         > in recovery state" condition caused by software. If the token
>>         > timeout value is large, it will then take a long time to
>>         > create a new ring.
>>         >
>>         > This can be reproduced by having two nodes set to active rrp
>>         > mode, with two heartbeat links. Then, with one node always on,
>>         > let the other one do stop/start again and again. It has a low
>>         > probability of reproducing. In theory, I think, the more
>>         > heartbeat links are used, the more easily it can be reproduced.
>>         >
>>         > This problem can be resolved by letting
>>         > message_handler_memb_commit_token() ignore duplicated commit
>>         > tokens in recovery state if the node (the ring representative)
>>         > has already sent out the initial orf token.
>>         >
>>         > Unlike the previous take, this version does not depend on
>>         > stored token data but uses originated_orf_token in
>>         > totemsrp_instance to remember whether the initial orf token
>>         > has already been originated for the current membership.

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
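
For readers following the thread, the guard described in the patch message can be sketched roughly as below. This is a simplified model, not the committed corosync code: only originated_orf_token and totemsrp_instance are names taken from the message above, while the reduced struct layout, the enum, the is_ring_rep and orf_tokens_sent fields, and the model_* functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced membership states (not corosync's real enum). */
enum model_memb_state {
	MODEL_STATE_OTHER,
	MODEL_STATE_RECOVERY
};

/* Reduced stand-in for totemsrp_instance. Only originated_orf_token is
 * named in the patch description; the other fields are assumptions. */
struct totemsrp_instance {
	enum model_memb_state memb_state;
	int is_ring_rep;          /* assumed: node is the ring representative */
	int originated_orf_token; /* set once the initial orf token was sent */
	int orf_tokens_sent;      /* assumed counter, for demonstration only */
};

/* Entering recovery starts a new membership, so the flag is cleared. */
void model_enter_recovery(struct totemsrp_instance *instance)
{
	assert(instance != NULL);
	instance->memb_state = MODEL_STATE_RECOVERY;
	instance->originated_orf_token = 0;
}

/* Models the commit-token handler on the ring representative.
 * Returns 1 if the initial orf token was originated, 0 if the commit
 * token was ignored (wrong state, not the representative, or duplicate). */
int model_handle_commit_token(struct totemsrp_instance *instance)
{
	assert(instance != NULL);

	if (instance->memb_state != MODEL_STATE_RECOVERY ||
	    !instance->is_ring_rep) {
		return 0;
	}

	/* Duplicate commit token from another heartbeat link: do not
	 * originate the orf token again, so the retransmission state
	 * stays untouched. */
	if (instance->originated_orf_token) {
		return 0;
	}

	instance->originated_orf_token = 1;
	instance->orf_tokens_sent++;
	return 1;
}
```

In this model, calling model_handle_commit_token() several times after a single model_enter_recovery() originates the token only once, and a later model_enter_recovery() re-arms it for the next membership. Keeping a per-membership flag avoids having to compare against stored token data, which is the difference from the previous take that the message points out.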