On 11/30/2011 02:35 PM, John Thompson wrote:
> On Thu, Dec 1, 2011 at 7:19 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>
> > The analysis looks good but the solution looks wrong.
> >
> > If the token is lost (the previous ring id fails for some reason) on the
> > node that cancels the token timeout, the protocol will freeze.
> >
> > This part of your logs doesn't look correct:
> >
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 13 seconds to copy all old ring messages
> >
> > At most 3969 messages would be originated, not 16240 messages. Perhaps
> > the loop is operating incorrectly? Can you print out range and
> > messages_originated at line 2111 with your test case? One of these
> > values may be getting corrupted by an errant memcpy.
>
> The message numbers printed here are in hex, and so the difference
> between them is 16240.

Thanks for pointing that out. :)

Regards
-steve

> Regards,
> John
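Spelled out, the two readings of those sequence numbers (a standalone check
of the arithmetic, not corosync code):

#include <stdio.h>

int main(void)
{
        unsigned int low  = 0x42726;   /* "42726" from the log, read as hex */
        unsigned int high = 0x46695;   /* "46695" from the log, read as hex */

        /* read as decimal, the range looks small ... */
        printf("read as decimal: %u\n", 46695u - 42726u);   /* 3969 */
        /* ... read as hex, as John notes, it is in the 16240 ballpark */
        printf("read as hex:     %u\n", high - low + 1u);   /* 16240 counting both
                                                                endpoints, 16239 as a
                                                                plain difference */
        return 0;
}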
> > Inspecting the code, the memset and memcpy operations look correct in
> > size in the range loop (although the previous code could have an errant
> > memcpy).
> >
> > Regards
> > -steve
> >
> > On 11/30/2011 08:20 AM, Yunkai Zhang wrote:
> > I found that corosync may trigger the token lost timeout in RECOVERY state.
> > The reason is that different nodes may spend different amounts of time
> > copying old ring messages from its regular_sort_queue to its
> > retrans_message_queue in the *memb_state_recovery_enter* function.
> >
> > Let's have a look at the following logs (don't mind the line numbers,
> > I have modified the source code for debugging):
> >
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2034 my_memb:5, new_memb:5, trans_memb:5
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [0] member 192.168.0.92:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [1] member 192.168.0.124:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [2] member 192.168.0.142:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [3] member 192.168.0.160:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [4] member 192.168.0.164:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [0] member 192.168.0.92:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [1] member 192.168.0.124:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [2] member 192.168.0.142:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 42725 high delivered 42725 received flag 0
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [3] member 192.168.0.160:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [4] member 192.168.0.164:
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
> >
> > We can see that five nodes were forming a new ring at 00:51:53; they had
> > different aru values, and node 192.168.0.142 had the smallest one.
> >
> > Here are some logs from these five nodes, showing how much time each of
> > them spent copying old ring messages:
> >
> > [logs from 192.168.0.92]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 13 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.124]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:09 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 16 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.142]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46693.
> > Nov 30 00:51:56 corosync [TOTEM ] totemsrp.c:2145 Originated 2321 messages in RECOVERY.
> > ==> Took 3 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.160]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:51:58 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 5 seconds to copy all old ring messages
> >
> > [logs from 192.168.0.164]
> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
> > Nov 30 00:52:05 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
> > ==> Took 12 seconds to copy all old ring messages
> >
> > After node 192.168.0.142 had spent 3 seconds copying all messages, it
> > started waiting for the commit_token from its upstream node
> > (192.168.0.124), but it could not receive it because that upstream node
> > was still busy copying messages between 00:51:56 and 00:52:09. So node
> > 192.168.0.142 finally triggered the token lost timeout in RECOVERY state.
> >
> > Worse, after it triggered the token lost timeout, node 192.168.0.142
> > entered GATHER state, but it could not receive a JoinMSG from the other
> > nodes because they were also still busy at that time, so it eventually
> > hit the consensus expired timeout as well.
> >
> > To solve this issue, we can cancel token_timeout (but keep
> > token_retransmit_timeout) in RECOVERY state.
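A rough numeric check of the scenario described above (illustrative only, not
corosync code): the ring order, the assumption that every node starts copying
at 00:51:53, and the 1 second default token timeout are all assumptions for
the sketch, and it ignores the time the commit token itself spends travelling
the ring.

#include <stdio.h>

int main(void)
{
        /* seconds each node spent copying in memb_state_recovery_enter(),
         * taken from the logs above */
        const char *node[]   = { "192.168.0.92", "192.168.0.124",
                                 "192.168.0.142", "192.168.0.160",
                                 "192.168.0.164" };
        double copy_time[]   = { 13.0, 16.0, 3.0, 5.0, 12.0 };
        double token_timeout = 1.0;   /* assumed totem.token default, 1000 ms */
        int i;

        for (i = 0; i < 5; i++) {
                /* assume the upstream neighbour of node i is node i-1 in the
                 * list, which matches 192.168.0.124 being upstream of
                 * 192.168.0.142 in the description above */
                int up = (i + 4) % 5;
                double wait = copy_time[up] - copy_time[i];

                printf("%-13s waits %5.1fs for %-13s -> %s\n",
                       node[i], wait > 0.0 ? wait : 0.0, node[up],
                       wait > token_timeout ? "token lost timeout fires" : "ok");
        }
        return 0;
}

Only 192.168.0.142 ends up waiting longer than the token timeout, and that is
exactly the node that fell back to GATHER in the logs.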
> > Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
> > ---
> >  exec/totemsrp.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/exec/totemsrp.c b/exec/totemsrp.c
> > index 5a78962..e0299e7 100644
> > --- a/exec/totemsrp.c
> > +++ b/exec/totemsrp.c
> > @@ -2206,7 +2206,7 @@ originated:
> >  	instance->my_install_seq = SEQNO_START_MSG;
> >  	instance->last_released = SEQNO_START_MSG;
> >
> > -	reset_token_timeout (instance); // REVIEWED
> > +	cancel_token_timeout (instance);
> >  	reset_token_retransmit_timeout (instance); // REVIEWED
> >
> >  	instance->memb_state = MEMB_STATE_RECOVERY;
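As a footnote on Steven's objection at the top of the thread, here is a toy
model of the two calls this patch swaps. It is a sketch only, not the
totemsrp implementation or its timer API; the toy_ names and the 1 second
timeout are invented for illustration.

#include <stdio.h>

/* toy state, not struct totemsrp_instance */
struct toy_instance {
        double now;                  /* current time in seconds          */
        double token_loss_deadline;  /* < 0 means no loss timer is armed */
        double token_timeout;        /* e.g. the 1000 ms totem.token     */
};

/* pre-patch behaviour on entering RECOVERY: re-arm the loss detector */
static void toy_reset_token_timeout(struct toy_instance *in)
{
        in->token_loss_deadline = in->now + in->token_timeout;
}

/* patched behaviour: disarm it completely */
static void toy_cancel_token_timeout(struct toy_instance *in)
{
        in->token_loss_deadline = -1.0;
}

static int toy_token_declared_lost(const struct toy_instance *in)
{
        return in->token_loss_deadline >= 0.0 &&
               in->now > in->token_loss_deadline;
}

int main(void)
{
        struct toy_instance in = { .now = 0.0, .token_timeout = 1.0 };

        /* patched: enter RECOVERY with the timer cancelled, then assume the
         * commit token is genuinely dropped and a minute passes */
        toy_cancel_token_timeout(&in);
        in.now = 60.0;
        printf("cancel: loss detected after 60s? %d\n", toy_token_declared_lost(&in));

        /* pre-patch: the timer was reset instead, so the loss is noticed */
        in.now = 0.0;
        toy_reset_token_timeout(&in);
        in.now = 60.0;
        printf("reset:  loss detected after 60s? %d\n", toy_token_declared_lost(&in));

        return 0;
}

Whether the token_retransmit_timeout that remains armed in RECOVERY is enough
to recover a commit token that is genuinely lost in that window is exactly
the question Steven raises.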