Re: [PATCH v2] [TotemSRP] Ignore duplicated commit tokens in recovery mode

Jason,
the patch looks good. It touches a very delicate part of the protocol, so
I would really like to see a comment from another reviewer as well.
Chrissie, Steve?

Regards,
  Honza


jason wrote:
> In active rrp mode, commit tokens are treated as mcast data messages,
> so rrp delivers them directly to the srp layer via active_mcast_recv().
> As a result, srp receives duplicated commit tokens from the different
> heartbeat links. If the node is in recovery state and has already sent
> out the initial orf token, those duplicated commit tokens cause
> message_handler_memb_commit_token() to send the initial orf token
> again. This is wrong because it resets the orf token content in
> instance->orf_token_retransmit, which breaks the token retransmission
> state.
> 
> Furthermore, sending the initial orf token again and again may lead
> active_token_recv() to drop some subsequent orf tokens. That is
> normally OK for rrp because srp will retransmit the token, but as said
> above, the srp retransmission state has already been broken, so we
> finally hit a "token lost in recovery state" condition caused by
> software. If the token timeout value is large, it then takes a long
> time to create a new ring.
> 
> This can be reproduced with two nodes set to active rrp mode and
> connected by two heartbeat links. Keep one node always on and stop and
> start the other one again and again. The problem reproduces with low
> probability. In theory, I think, the more heartbeat links are used,
> the more easily it can be reproduced.
> 
> This problem can be resolved by letting
> message_handler_memb_commit_token() ignore duplicated commit tokens
> in recovery state if the node (the ring representative) has already
> sent out the initial orf token.
> 
> Unlike the previous take, this version does not depend on stored token
> data but uses originated_orf_token in totemsrp_instance to remember
> whether the initial orf token has already been originated for the
> current membership.
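
For other reviewers, the idea described above can be sketched roughly as
follows. This is only a simplified model, not the code from the patch:
the struct, the enum and the sketch_* function names are hypothetical,
only the flag name originated_orf_token comes from the description. It
assumes the node is the ring representative and that the flag is cleared
whenever recovery is entered for a new membership.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model; not the real totemsrp types. */
enum sketch_state { SKETCH_COMMIT, SKETCH_RECOVERY };

struct sketch_instance {
	enum sketch_state state;
	/* Flag name taken from the patch description. */
	bool originated_orf_token;
	/* Stands in for sending the initial orf token and
	 * (re)filling orf_token_retransmit. */
	int initial_tokens_sent;
};

/* Assumption: the flag is reset once per new membership. */
void sketch_enter_recovery(struct sketch_instance *inst)
{
	inst->state = SKETCH_RECOVERY;
	inst->originated_orf_token = false;
}

/*
 * Called for every commit token handed up by rrp, including the
 * duplicates that arrive over additional heartbeat links.
 * Returns true only when the initial orf token is originated.
 */
bool sketch_on_commit_token(struct sketch_instance *inst)
{
	if (inst->state != SKETCH_RECOVERY) {
		return false;
	}
	if (inst->originated_orf_token) {
		/* Duplicate: leave the retransmission state untouched. */
		return false;
	}
	inst->originated_orf_token = true;
	inst->initial_tokens_sent++;
	return true;
}
```

With this guard, a second commit token for the same membership is
ignored, so the saved token used for retransmission is not overwritten.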
> 
> 
> 
> 
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
> 
