On Thu, Dec 1, 2011 at 5:39 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
> On 11/30/2011 02:35 PM, John Thompson wrote:
>> On Thu, Dec 1, 2011 at 7:19 AM, Steven Dake <sdake@xxxxxxxxxx> wrote:
>>
>> The analysis looks good, but the solution looks wrong.
>>
>> If the token is lost (the previous ring id fails for some reason) on the
>> node that cancels the token timeout, the protocol will freeze.
>>
I know that this solution is not perfect. I'll consider more.

>> This part of your logs doesn't look correct:
>>
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
>> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
>> > ==> It took 13 seconds to copy all old ring messages
>>
>> At most 3969 messages would be originated, not 16240 messages. Perhaps
>> the loop is operating incorrectly? Can you print out range and
>> messages_originated at line 2111 with your test case? One of these
>> values may be getting corrupted by an errant memcpy.
>>
>> The message numbers printed here are in hex, and so the difference is
>> 16240 between them.
>>
> Thanks for pointing that out. :)
> Yes, the message numbers are in hex :)
>
> Regards
> -steve
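
To spell the arithmetic out (just a stand-alone check, not corosync code): the sequence numbers in those log lines are printed in hexadecimal, so the copied range 42726-46695 really means 0x42726-0x46695, which is 16240 messages inclusive, while reading the same digits as decimal gives only 46695 - 42726 = 3969, which is where the smaller estimate came from.

#include <stdio.h>

int main(void)
{
	/* Sequence numbers as they appear in the totemsrp logs, read as hex. */
	unsigned int low  = 0x42726;	/* first old ring message copied */
	unsigned int high = 0x46695;	/* last old ring message copied  */

	/* Prints 16240, matching "Originated 16240 messages in RECOVERY". */
	printf("messages in range (hex):    %u\n", high - low + 1);

	/* Reading the same digits as decimal gives only 3969. */
	printf("difference read as decimal: %u\n", 46695u - 42726u);

	return 0;
}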

>> Regards,
>> John
>>
>> Inspecting the code, the memset and memcpy operations look correct in
>> size in the range loop (although the previous code could have an errant
>> memcpy).
>>
>> Regards
>> -steve
>>
>> On 11/30/2011 08:20 AM, Yunkai Zhang wrote:
>> > I found that corosync may trigger the token loss timeout in the RECOVERY state.
>> > The reason is that different nodes may spend different amounts of time copying
>> > old ring messages from regular_sort_queue to retrans_message_queue in the
>> > *memb_state_recovery_enter* function.
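
To give a rough idea of what that copying involves, here is a small stand-alone sketch, not the actual totemsrp.c code (the message size and counts are only illustrative), of the per-message copy work that runs between the "copying all old ring messages" and "Originated N messages in RECOVERY" log lines shown below:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_SIZE 1400	/* assumed message size, for illustration only */

int main(void)
{
	/* Number of old ring messages this node still holds in the range the
	 * new ring agreed on.  Most nodes below held 16240 of them;
	 * 192.168.0.142, with the smallest aru, held only 2321. */
	unsigned int held = 16240;
	unsigned int i, originated = 0;
	char src[MSG_SIZE];

	memset(src, 0xab, sizeof (src));

	/* One allocation + memcpy per held message; the real code does more
	 * per-message bookkeeping than shown here, which is where the seconds
	 * between the two log lines go. */
	for (i = 0; i < held; i++) {
		char *copy = malloc(MSG_SIZE);
		if (copy == NULL)
			break;
		memcpy(copy, src, MSG_SIZE);
		free(copy);	/* the real code keeps the copy queued for the new ring */
		originated++;
	}

	printf("Originated %u messages in RECOVERY (sketch)\n", originated);
	return 0;
}

The point is only that the cost grows with the number of old ring messages the node still holds, which is why 192.168.0.142, with the smallest aru, finished so much earlier than its upstream node.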

>> > Let's have a look at the following logs (don't worry about the line numbers;
>> > I have modified the source code for debugging):
>> >
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2034 my_memb:5, new_memb:5, trans_memb:5
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [0] member 192.168.0.92:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [1] member 192.168.0.124:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [2] member 192.168.0.142:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [3] member 192.168.0.160:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2037 TRANS [4] member 192.168.0.164:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [0] member 192.168.0.92:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [1] member 192.168.0.124:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [2] member 192.168.0.142:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 42725 high delivered 42725 received flag 0
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [3] member 192.168.0.160:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2041 position [4] member 192.168.0.164:
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2045 previous ring seq 1161924 rep 192.168.0.92
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2051 aru 46695 high delivered 46695 received flag 1
>> >
>> > We can see that five nodes were forming a new ring at 00:51:53; they had
>> > different aru values, and node 192.168.0.142 had the smallest one.
>> >
>> > Here are some logs from these five nodes, showing how much time each of
>> > them spent copying old ring messages:
>> >
>> > [logs from 192.168.0.92]
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
>> > Nov 30 00:52:06 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
>> > ==> It took 13 seconds to copy all old ring messages
>> >
>> > [logs from 192.168.0.124]
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
>> > Nov 30 00:52:09 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
>> > ==> It took 16 seconds to copy all old ring messages
>> >
>> > [logs from 192.168.0.142]
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46693.
>> > Nov 30 00:51:56 corosync [TOTEM ] totemsrp.c:2145 Originated 2321 messages in RECOVERY.
>> > ==> It took 3 seconds to copy all old ring messages
>> >
>> > [logs from 192.168.0.160]
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
>> > Nov 30 00:51:58 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
>> > ==> It took 5 seconds to copy all old ring messages
>> >
>> > [logs from 192.168.0.164]
>> > Nov 30 00:51:53 corosync [TOTEM ] totemsrp.c:2111 copying all old ring messages from 42726-46695.
>> > Nov 30 00:52:05 corosync [TOTEM ] totemsrp.c:2145 Originated 16240 messages in RECOVERY.
>> > ==> It took 12 seconds to copy all old ring messages
>> >
>> > After node 192.168.0.142 had spent 3 seconds copying all of its messages, it
>> > started waiting for the commit_token from its upstream node (192.168.0.124),
>> > but it could not receive it because that upstream node was still busy copying
>> > messages between 00:51:56 and 00:52:09. So node 192.168.0.142 eventually
>> > triggered the token loss timeout in the RECOVERY state.
>> >
>> > Worse, after it triggered the token loss timeout, node 192.168.0.142 entered
>> > the GATHER state, but it could not receive JoinMSGs from the other nodes
>> > because they were also busy at that time, and so this node finally ran into
>> > the consensus expired timeout as well.
>> >
>> > To solve this issue, we can cancel token_timeout (but keep
>> > token_retransmit_timeout) in the RECOVERY state.
>> >
>> > Signed-off-by: Yunkai Zhang <qiushu.zyk@xxxxxxxxxx>
>> > ---
>> >  exec/totemsrp.c |    2 +-
>> >  1 files changed, 1 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/exec/totemsrp.c b/exec/totemsrp.c
>> > index 5a78962..e0299e7 100644
>> > --- a/exec/totemsrp.c
>> > +++ b/exec/totemsrp.c
>> > @@ -2206,7 +2206,7 @@ originated:
>> >  	instance->my_install_seq = SEQNO_START_MSG;
>> >  	instance->last_released = SEQNO_START_MSG;
>> >
>> > -	reset_token_timeout (instance); // REVIEWED
>> > +	cancel_token_timeout (instance);
>> >  	reset_token_retransmit_timeout (instance); // REVIEWED
>> >
>> >  	instance->memb_state = MEMB_STATE_RECOVERY;

--
Yunkai Zhang
Work at Taobao

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss