Jason,
patch looks good. This touches a very delicate part of the protocol, so I
would really like to see a comment from another reviewer as well.
Chrissie, Steve?

Regards,
  Honza

jason wrote:
> In active rrp mode, commit tokens are treated as mcast data messages,
> thus, rrp directly delivers them to the srp layer by active_mcast_recv().
> This results in duplicated commit tokens being received by srp
> from different heartbeat links. If a node is in recovery state and has
> already sent out the initial orf token, those duplicated commit tokens
> will cause message_handler_memb_commit_token() to send the initial orf
> token again! This is wrong because it resets the orf token content in
> instance->orf_token_retransmit, which breaks the token retransmission
> state.
>
> Furthermore, sending those initial orf tokens again and again
> may lead active_token_recv() to drop some subsequent orf tokens. That is
> OK for rrp because srp will do token retransmission, but as said
> above, the srp retransmission state has already been broken, so finally we
> meet a "token lost in recovery state" condition caused by software. If
> the token timeout value is large, it will take a long time to create a
> new ring.
>
> This can be reproduced by having two nodes set to active rrp mode,
> with two heartbeat links. Then, with one node always on, let the other
> one do stop/start again and again. It has a low probability to
> reproduce. In theory, I think, the more heartbeat links used, the more
> easily it can be reproduced.
>
> This problem can be resolved by letting
> message_handler_memb_commit_token() ignore duplicated commit tokens
> in recovery state if the node (the ring representative) has already sent
> out the initial orf token.
>
> Unlike the previous take, this version does not depend on stored token
> data but uses originated_orf_token in totemsrp_instance to remember whether
> the initial orf token has already been originated for the current membership.
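
For other reviewers, the guard described above can be sketched roughly as
follows. This is only a minimal model under stated assumptions, not the patch
itself: struct srp_model and both functions are hypothetical names, the real
state lives in totemsrp_instance, and only the originated_orf_token flag name
is taken from the description. Where exactly the flag gets cleared in the real
code is also an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of one node. This is NOT the real
 * totemsrp_instance; only the originated_orf_token name comes from
 * the patch description. */
struct srp_model {
    bool in_recovery;            /* node is in recovery state */
    bool is_ring_rep;            /* node is the ring representative */
    bool originated_orf_token;   /* initial orf token already sent for
                                    the current membership */
    int initial_orf_tokens_sent; /* counts simulated initial sends */
};

/* Start a new membership round. Assumption: the flag is cleared once
 * per membership so that the next ring can originate its own token. */
static void srp_model_enter_recovery(struct srp_model *node, bool is_ring_rep)
{
    assert(node != NULL);
    node->in_recovery = true;
    node->is_ring_rep = is_ring_rep;
    node->originated_orf_token = false;
}

/* Handle one commit token, possibly a duplicate that arrived over a
 * second heartbeat link. Returns true only when the initial orf token
 * is originated by this call. */
static bool srp_model_on_commit_token(struct srp_model *node)
{
    assert(node != NULL);

    /* Only the ring representative in recovery originates the token. */
    if (!node->in_recovery || !node->is_ring_rep) {
        return false;
    }

    /* Duplicate commit token: ignore it, so the stored retransmission
     * state of the first orf token is left untouched. */
    if (node->originated_orf_token) {
        return false;
    }

    node->originated_orf_token = true;
    node->initial_orf_tokens_sent++; /* stands in for the real send */
    return true;
}
```

In this model, calling srp_model_on_commit_token() twice after
srp_model_enter_recovery(&node, true) originates the token only on the first
call; the second call, standing in for the duplicate from the other link, is
ignored until the next membership round clears the flag.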
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss