On 11/30/2011 02:35 PM, John Thompson wrote:
> On Thu, Dec 1, 2011 at 7:19 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>
> > The analysis looks good but the solution looks wrong.
> >
> > If the token is lost (the previous ring id fails for some reason) on the
> > node that cancels the token timeout, the protocol will freeze.
> >
> > This part of your logs doesn't look correct:
> >
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 13 seconds to copy all old ring messages
> >
> > At most 3969 messages would be originated, not 16240 messages. Perhaps
> > the loop is operating incorrectly? Can you print out range and
> > messages_originated at line 2111 with your test case? One of these
> > values may be getting corrupted by an errant memcpy.
>
> The message numbers printed here are in hex, and so the difference
> between them is 16240.

Thanks for pointing that out. :)

Regards
-steve

> Regards,
> John
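Spelled out, the two readings of those sequence numbers (a standalone check
of the arithmetic, not corosync code):

#include <stdio.h>

int main(void)
{
        unsigned int low  = 0x42726;   /* "42726" from the log, read as hex */
        unsigned int high = 0x46695;   /* "46695" from the log, read as hex */

        /* read as decimal, the range looks small ... */
        printf("read as decimal: %u\n", 46695u - 42726u);   /* 3969 */
        /* ... read as hex, as John notes, it is in the 16240 ballpark */
        printf("read as hex:     %u\n", high - low + 1u);   /* 16240 counting both
                                                                endpoints, 16239 as a
                                                                plain difference */
        return 0;
}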
> > Inspecting the code, the memset and memcpy operations look correct in
> > size in the range loop (although the previous code could have an errant
> > memcpy).
> >
> > Regards
> > -steve
> >
> > On 11/30/2011 08:20 AM, Yunkai Zhang wrote:
> > I found that corosync may trigger the token lost timeout in RECOVERY state.
> > The reason is that different nodes may spend different amounts of time
> > copying old ring messages from its regular_sort_queue to its
> > retrans_message_queue in the *memb_state_recovery_enter* function.
> >
> > Let's have a look at the following logs (don't mind the line numbers,
> > I have modified the source code for debugging):
> >
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2034 my_memb:5, new_memb:5, trans_memb:5
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [0] member 192.168.0.92:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [1] member 192.168.0.124:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [2] member 192.168.0.142:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [3] member 192.168.0.160:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [4] member 192.168.0.164:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [0] member 192.168.0.92:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [1] member 192.168.0.124:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [2] member 192.168.0.142:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 42725 high delivered 42725 received flag 0
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [3] member 192.168.0.160:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [4] member 192.168.0.164:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> >
> > We can see that five nodes were forming a new ring at 00:51:53; they had
> > different aru values, and node 192.168.0.142 had the smallest one.
> >
> > Here are some logs from these five nodes, showing how much time each of
> > them spent copying old ring messages:
> >
> > [logs from 192.168.0.92]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 13 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.124]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:09 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 16 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.142]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46693.
> > Nov 30 00:51:56 corosync [TOTEM ] totemsrp.c:2145 Originated 2321 messages in RECOVERY.
> > ==> Took 3 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.160]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:51:58 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 5 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.164]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:05 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 12 seconds to copy all old ring messages
> >
> > After node 192.168.0.142 had spent 3 seconds copying all messages, it
> > started waiting for the commit_token from its upstream node
> > (192.168.0.124), but it could not receive it because that upstream node
> > was still busy copying messages between 00:51:56 and 00:52:09. So node
> > 192.168.0.142 finally triggered the token lost timeout in RECOVERY state.
> >
> > Worse, after it triggered the token lost timeout, node 192.168.0.142
> > entered GATHER state, but it could not receive a JoinMSG from the other
> > nodes because they were also still busy at that time, so it eventually
> > hit the consensus expired timeout as well.
> >
> > To solve this issue, we can cancel token_timeout (but keep
> > token_retransmit_timeout) in RECOVERY state.
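A rough numeric check of the scenario described above (illustrative only, not
corosync code): the ring order, the assumption that every node starts copying
at 00:51:53, and the 1 second default token timeout are all assumptions for
the sketch, and it ignores the time the commit token itself spends travelling
the ring.

#include <stdio.h>

int main(void)
{
        /* seconds each node spent copying in memb_state_recovery_enter(),
         * taken from the logs above */
        const char *node[]   = { "192.168.0.92", "192.168.0.124",
                                 "192.168.0.142", "192.168.0.160",
                                 "192.168.0.164" };
        double copy_time[]   = { 13.0, 16.0, 3.0, 5.0, 12.0 };
        double token_timeout = 1.0;   /* assumed totem.token default, 1000 ms */
        int i;

        for (i = 0; i < 5; i++) {
                /* assume the upstream neighbour of node i is node i-1 in the
                 * list, which matches 192.168.0.124 being upstream of
                 * 192.168.0.142 in the description above */
                int up = (i + 4) % 5;
                double wait = copy_time[up] - copy_time[i];

                printf("%-13s waits %5.1fs for %-13s -> %s\n",
                       node[i], wait > 0.0 ? wait : 0.0, node[up],
                       wait > token_timeout ? "token lost timeout fires" : "ok");
        }
        return 0;
}

Only 192.168.0.142 ends up waiting longer than the token timeout, and that is
exactly the node that fell back to GATHER in the logs.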
> > Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
> > ---
> >  exec/totemsrp.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/exec/totemsrp.c b/exec/totemsrp.c
> > index 5a78962..e0299e7 100644
> > --- a/exec/totemsrp.c
> > +++ b/exec/totemsrp.c
> > @@ -2206,7 +2206,7 @@ originated:
> >  	instance->my_install_seq = SEQNO_START_MSG;
> >  	instance->last_released = SEQNO_START_MSG;
> >
> > -	reset_token_timeout (instance); // REVIEWED
> > +	cancel_token_timeout (instance);
> >  	reset_token_retransmit_timeout (instance); // REVIEWED
> >
> >  	instance->memb_state = MEMB_STATE_RECOVERY;
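As a footnote on Steven's objection at the top of the thread, here is a toy
model of the two calls this patch swaps. It is a sketch only, not the
totemsrp implementation or its timer API; the toy_ names and the 1 second
timeout are invented for illustration.

#include <stdio.h>

/* toy state, not struct totemsrp_instance */
struct toy_instance {
        double now;                  /* current time in seconds          */
        double token_loss_deadline;  /* < 0 means no loss timer is armed */
        double token_timeout;        /* e.g. the 1000 ms totem.token     */
};

/* pre-patch behaviour on entering RECOVERY: re-arm the loss detector */
static void toy_reset_token_timeout(struct toy_instance *in)
{
        in->token_loss_deadline = in->now + in->token_timeout;
}

/* patched behaviour: disarm it completely */
static void toy_cancel_token_timeout(struct toy_instance *in)
{
        in->token_loss_deadline = -1.0;
}

static int toy_token_declared_lost(const struct toy_instance *in)
{
        return in->token_loss_deadline >= 0.0 &&
               in->now > in->token_loss_deadline;
}

int main(void)
{
        struct toy_instance in = { .now = 0.0, .token_timeout = 1.0 };

        /* patched: enter RECOVERY with the timer cancelled, then assume the
         * commit token is genuinely dropped and a minute passes */
        toy_cancel_token_timeout(&in);
        in.now = 60.0;
        printf("cancel: loss detected after 60s? %d\n", toy_token_declared_lost(&in));

        /* pre-patch: the timer was reset instead, so the loss is noticed */
        in.now = 0.0;
        toy_reset_token_timeout(&in);
        in.now = 60.0;
        printf("reset:  loss detected after 60s? %d\n", toy_token_declared_lost(&in));

        return 0;
}

Whether the token_retransmit_timeout that remains armed in RECOVERY is enough
to recover a commit token that is genuinely lost in that window is exactly
the question Steven raises.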