Re: [PATCH v2] [TotemSRP] Ignore duplicated commit tokens in recovery mode

lgtm2 ;-)


Chrissie


On 15/01/15 12:21, Steven Dake wrote:
forgot to copy list.

Honza, lgtm.

Regards
-steve

On Thu, Jan 15, 2015 at 5:20 AM, Steven Dake <steven.dake@xxxxxxxxx> wrote:

    Honza,

    lgtm.

    regards
    -steve

    On Wed, Jan 14, 2015 at 10:19 AM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:

        Jason,
        the patch looks good. This touches a very delicate part of the
        protocol, so I would really like to see another reviewer's comments
        as well.
        Chrissie, Steve?

        Regards,
           Honza


        jason wrote:
         > In active rrp mode, commit tokens are treated as mcast data messages,
         > so rrp delivers them directly to the srp layer via active_mcast_recv().
         > As a result, srp receives duplicated commit tokens from different
         > heartbeat links. If a node is in recovery state and has already sent
         > out the initial orf token, those duplicated commit tokens cause
         > message_handler_memb_commit_token() to send the initial orf token
         > again. This is wrong because it resets the orf token content in
         > instance->orf_token_retransmit, which breaks the token retransmission
         > state.
         >
         > Furthermore, sending those initial orf tokens again and again may
         > lead active_token_recv() to drop some subsequent orf tokens. That is
         > normally OK for rrp because srp will do token retransmission, but as
         > said above, the srp retransmission state has already been broken, so
         > we finally hit a "token lost in recovery state" condition caused by
         > software. If the token timeout value is large, it will take a long
         > time to create a new ring.
         >
         > This can be reproduced by having two nodes set to active rrp mode
         > with two heartbeat links. Then, with one node always on, let the
         > other one stop/start again and again. The probability of reproducing
         > it is low. In theory, I think, the more heartbeat links are used,
         > the more easily it can be reproduced.
         >
         > This problem can be resolved by letting
         > message_handler_memb_commit_token() ignore duplicated commit tokens
         > in recovery state if the node (the ring representative) has already
         > sent out the initial orf token.
         >
         > Unlike the previous take, this version does not depend on stored
         > token data but uses originated_orf_token in totemsrp_instance to
         > remember whether the initial orf token has already been originated
         > for the current membership.
         >
         >
         >
         >
         > _______________________________________________
         > discuss mailing list
         > discuss@xxxxxxxxxxxx
         > http://lists.corosync.org/mailman/listinfo/discuss
         >
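
For readers skimming the thread, the guard described in the patch text above
can be sketched roughly as follows. This is a simplified model and not the
actual patch: only originated_orf_token comes from the description. The
struct, the sketch_* function names, is_ring_rep, retransmit_seq (a stand-in
for instance->orf_token_retransmit) and originate_count are hypothetical, and
resetting the flag on entry to recovery is an assumption about where "current
membership" begins.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins; the real totemsrp_instance has many more fields. */
enum memb_state {
    MEMB_STATE_OPERATIONAL,
    MEMB_STATE_GATHER,
    MEMB_STATE_COMMIT,
    MEMB_STATE_RECOVERY
};

struct sketch_instance {
    enum memb_state memb_state;
    bool is_ring_rep;             /* hypothetical: this node originates the token */
    bool originated_orf_token;    /* flag named in the patch description */
    unsigned int retransmit_seq;  /* hypothetical stand-in for orf_token_retransmit */
    unsigned int originate_count; /* hypothetical counter, for illustration only */
};

/* Assumption: the flag is cleared whenever a new membership enters recovery. */
void sketch_enter_recovery(struct sketch_instance *inst)
{
    inst->memb_state = MEMB_STATE_RECOVERY;
    inst->originated_orf_token = false;
}

/* Originating the initial orf token overwrites the retransmit state. */
void sketch_originate_initial_token(struct sketch_instance *inst)
{
    assert(inst->memb_state == MEMB_STATE_RECOVERY);
    inst->retransmit_seq = 0;
    inst->originate_count++;
    inst->originated_orf_token = true;
}

/* Models the token making progress around the ring after origination. */
void sketch_token_progress(struct sketch_instance *inst, unsigned int seq)
{
    inst->retransmit_seq = seq;
}

/* Commit token handler with the duplicate guard. */
void sketch_handle_commit_token(struct sketch_instance *inst)
{
    if (inst->memb_state != MEMB_STATE_RECOVERY || !inst->is_ring_rep) {
        return; /* other states and non-representatives are not modelled here */
    }
    if (inst->originated_orf_token) {
        return; /* duplicate from another heartbeat link: ignore it */
    }
    sketch_originate_initial_token(inst);
}
```

In this model, a second call to sketch_handle_commit_token() for the same
membership leaves retransmit_seq untouched, which is the property the patch
description is after. Without the originated_orf_token check, the second call
would reset it to 0.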



