Re: Message mis-delivered to a configuration?

I have not been able to reproduce the problem so far. I even introduced a delay in sending an internal message from one of the nodes in the Totem config. change callback to increase the window of opportunity for the message misdelivery to occur, but even with that it doesn't seem to happen.

However, I did run into another problem. I noticed that there are times when you get a CPG config. change callback without a Totem config. change callback. When multiple nodes join or leave a cluster, you get one CPG callback per member that left or joined. The way we determine whether all CPG callbacks are in is to compare the membership list from CPG to the one returned by the Totem callback; only when they match do we conclude that there has been a change in cluster membership. If we miss a Totem config. change here, the CPG callbacks are essentially ignored, which makes a subsequent config. change appear spurious. And since this happens on one set of nodes but not the other when a cluster splits in two, the two halves get out of sync.
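
For reference, the comparison logic is roughly as follows (a minimal sketch
against the cpg.h callback signatures, assuming both lists arrive sorted by
nodeid; saved_totem_members, MAX_NODES and recompute_application_state() are
placeholder names, not corosync's):

#include <string.h>
#include <corosync/cpg.h>

#define MAX_NODES 16    /* illustrative bound */

static void recompute_application_state(void);   /* placeholder */

static uint32_t saved_totem_members[MAX_NODES];
static size_t saved_totem_count;

static void totem_confchg_cb(cpg_handle_t handle,
        struct cpg_ring_id ring_id,
        uint32_t member_list_entries,
        const uint32_t *member_list)
{
        /* remember the Totem view of the membership
         * (bounds check against MAX_NODES omitted for brevity) */
        memcpy(saved_totem_members, member_list,
               member_list_entries * sizeof *member_list);
        saved_totem_count = member_list_entries;
}

static void cpg_confchg_cb(cpg_handle_t handle,
        const struct cpg_name *group_name,
        const struct cpg_address *member_list, size_t member_list_entries,
        const struct cpg_address *left_list, size_t left_list_entries,
        const struct cpg_address *joined_list, size_t joined_list_entries)
{
        size_t i;

        /* more per-member CPG callbacks still expected */
        if (member_list_entries != saved_totem_count)
                return;
        for (i = 0; i < member_list_entries; i++)
                if (member_list[i].nodeid != saved_totem_members[i])
                        return;

        /* lists match: treat this as the real membership change.
         * If the Totem callback never arrives, saved_totem_* stays stale,
         * the match keeps failing, and these CPG callbacks are effectively
         * ignored -- the failure described above. */
        recompute_application_state();
}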

For this one, I do have the fplay records from all nodes in the cluster. If someone wants to look at them, I can upload them.

Thanks,

      Sathya



On Thursday, October 24, 2013 9:11 PM, sathya bettadapura <s_bettadapura@xxxxxxxxx> wrote:
I'll try to reproduce it again tomorrow. It happened somewhat accidentally when I first noticed it. But now that I know what to look for in the logs when it happens, hopefully, I'll have better luck reproducing it.

Thanks,

    Sathya


On Thursday, October 24, 2013 8:55 PM, Steven Dake <sdake@xxxxxxxxxx> wrote:
On 10/24/2013 06:57 PM, sathya bettadapura wrote:
Any state change in the application happens only upon receipt of a message (or a config. change), not merely upon queueing one via cpg_mcast(). Upon receipt of a config. change, internal messages are broadcast via cpg_mcast() as the last thing we do in handling the callback.
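
In outline (a sketch, not our exact code; error handling omitted, and
local_nodeid would come from cpg_local_get()):

#include <stdint.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

struct app_msg {
        uint32_t origin_nodeid;
        uint32_t cluster_size;
};

static uint32_t local_nodeid;   /* filled in from cpg_local_get() */

static void confchg_cb(cpg_handle_t handle,
        const struct cpg_name *group_name,
        const struct cpg_address *member_list, size_t member_list_entries,
        const struct cpg_address *left_list, size_t left_list_entries,
        const struct cpg_address *joined_list, size_t joined_list_entries)
{
        struct app_msg m = { local_nodeid, (uint32_t)member_list_entries };
        struct iovec iov = { .iov_base = &m, .iov_len = sizeof m };

        /* ... note the new membership, but mutate no application state ... */

        /* broadcasting is the last thing done in the handler; the state
         * machine advances only when this message is delivered back */
        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
}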

Thanks,

    Sathya



Well that blows my theory.  Do you have a test case you can share?

Regards
-steve


On Thursday, October 24, 2013 5:51 PM, Steven Dake <sdake@xxxxxxxxxx> wrote:
On 10/24/2013 03:31 PM, sathya bettadapura wrote:
Hi All,

I think I am noticing what appears to be an anomaly, so I am posting here for a sanity check.

We have a stress test that does frequent network partitioning/reunification to exercise code related to node fail-back. We're based on version 1.4.6. We were based on 2.x.x until libqb made its way into the core of corosync; as our company policy precludes us from using anything but BSD-style licensed third-party source code, we had to either rewrite libqb or go back.

Let's say we have four nodes A, B, C and D. A and B are on one side of a network segment and C and D are on the other. The network can be partitioned by pulling a cable connecting the two segments.

When there is a configuration change, we need to re-compute application state by sending messages to the new members. Such a message identifies the originating node and the size of the cluster at the time, and it is logged in the application log.
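
A minimal sketch of such a message and of how it is logged on delivery
(the struct and field names are illustrative, not our actual code):

#include <stdio.h>
#include <corosync/cpg.h>

struct resync_msg {
        uint32_t origin_nodeid;   /* node that originated the message */
        uint32_t cluster_size;    /* membership size at origination time */
};

static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group_name,
        uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
{
        const struct resync_msg *m = msg;

        /* produces the "From A, cluster size is 2" lines quoted below,
         * with the node name rendered from the nodeid */
        printf("From %u, cluster size is %u\n",
               m->origin_nodeid, m->cluster_size);
}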

When the cluster goes from (A, B, C, D) to (A, B) and (C, D), on the A-B side we see a message from A that says "From A, cluster size is 2". Immediately thereafter there is another config. change that takes the cluster back to (A, B, C, D). Now we see messages from A, C and D saying that the cluster size is 4, but we see two messages from B: the first says the cluster size is 2 and the second says it is 4. It appears that the message from B sent when the cluster size was 2 could not be delivered because a config. change came right on its heels, and it is instead being delivered to a configuration different from the one in which it originated. Is this expected behaviour?


Messages are originated by the totem protocol and ordered according to EVS when they are taken off the new message queue and transmitted onto the network. This is different from queueing a message (via cpg), which is not origination. Are you sure you're not confusing origination with cpg_mcast?

Generally the correct way for an application to behave according to EVS is to originate all state change messages via the protocol and act on them when they are received. Some devs tend to change state when they call cpg_mcast rather than when a message is delivered. This would result in your example behavior.
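
In sketch form, with apply_state_change() and struct app_msg as
placeholders:

#include <stdint.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

struct app_msg { uint32_t opcode; uint32_t arg; };        /* placeholder */
static void apply_state_change(const struct app_msg *m);  /* placeholder */

/* Anti-pattern: mutating state at cpg_mcast time.  Around a config.
 * change the message can be delivered in a different configuration,
 * so nodes that only see the delivery diverge from the sender. */
void change_state_wrong(cpg_handle_t handle, struct app_msg *m)
{
        struct iovec iov = { .iov_base = m, .iov_len = sizeof *m };

        apply_state_change(m);   /* too early */
        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
}

/* EVS-correct: sending only queues the message ... */
void change_state(cpg_handle_t handle, struct app_msg *m)
{
        struct iovec iov = { .iov_base = m, .iov_len = sizeof *m };

        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
}

/* ... and every node, including the sender, changes state here, in the
 * agreed delivery order. */
static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group_name,
        uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
{
        apply_state_change(msg);
}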

Just to clarify, your application only changes state on delivery of a message to the cpg application (not on queueing via cpg_mcast)?

Regards
-steve



     Sathya

