Re: [PATCH v2] [TotemSRP] Ignore duplicated commit tokens in recovery mode


 



Perfect. So the patch is in master as 4ee84c51fa73c4ec7cbee922111a140a3aaf75df

Thanks for the patch and the reviews.

Regards,
  Honza

> lgtm2 ;-)
> 
> 
> Chrissie
> 
> 
> On 15/01/15 12:21, Steven Dake wrote:
>> Forgot to copy the list.
>>
>> Honza, lgtm.
>>
>> Regards
>> -steve
>>
>> On Thu, Jan 15, 2015 at 5:20 AM, Steven Dake <steven.dake@xxxxxxxxx
>> <mailto:steven.dake@xxxxxxxxx>> wrote:
>>
>>     Honza,
>>
>>     lgtm.
>>
>>     regards
>>     -steve
>>
>>     On Wed, Jan 14, 2015 at 10:19 AM, Jan Friesse <jfriesse@xxxxxxxxxx
>>     <mailto:jfriesse@xxxxxxxxxx>> wrote:
>>
>>         Jason,
>>         patch looks good. This touches a very delicate part of the protocol,
>>         so I would really like to see another reviewer comment as well.
>>         Chrissie, Steve?
>>
>>         Regards,
>>            Honza
>>
>>
>>         jason wrote:
>>          > In active rrp mode, commit tokens are treated as mcast data
>>          > messages, so rrp delivers them directly to the srp layer via
>>          > active_mcast_recv(). As a result, srp receives duplicate commit
>>          > tokens, one from each heartbeat link. If a node is in recovery
>>          > state and has already sent out the initial orf token, those
>>          > duplicate commit tokens cause message_handler_memb_commit_token()
>>          > to send the initial orf token again! This is wrong because it
>>          > resets the orf token content in instance->orf_token_retransmit,
>>          > which breaks the token retransmission state.
>>          >
>>          > Furthermore, sending the initial orf token again and again may
>>          > cause active_token_recv() to drop some subsequent orf tokens. That
>>          > is OK for rrp because srp performs token retransmission, but, as
>>          > described above, the srp retransmission state has already been
>>          > broken, so we eventually hit a "token lost in recovery state"
>>          > condition caused by software. If the token timeout value is
>>          > large, it then takes a long time to create a new ring.
>>          >
>>          > This can be reproduced with two nodes set to active rrp mode and
>>          > two heartbeat links: keep one node running and let the other one
>>          > stop/start again and again. It reproduces only with low
>>          > probability. In theory, I think, the more heartbeat links are
>>          > used, the more easily it can be reproduced.
>>          >
>>          > This problem can be resolved by letting
>>          > message_handler_memb_commit_token() ignore duplicate commit
>>          > tokens in recovery state if the node (the ring representative)
>>          > has already sent out the initial orf token.
>>          >
>>          > Unlike the previous take, this version does not depend on stored
>>          > token data; instead it uses originated_orf_token in
>>          > totemsrp_instance to remember whether the initial orf token has
>>          > already been originated for the current membership.
>>          >
>>          >
>>          >
>>          >
>>          > _______________________________________________
>>          > discuss mailing list
>>          > discuss@xxxxxxxxxxxx <mailto:discuss@xxxxxxxxxxxx>
>>          > http://lists.corosync.org/mailman/listinfo/discuss
>>          >
>>
>>
> 




