Re: can the changing node identity returned by local_get be handled reliably?

Hi Jan!

I agree that it is difficult for applications to handle the corosync daemon changing its identity from a bound interface to loopback and then back to the same interface.

I wonder what the consequences of not using the multicast loop to echo back the local message might be, both for message delivery guarantees and for the timing of message delivery.
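
For context, the kernel facility in question is the IP_MULTICAST_LOOP socket option; when it is enabled (the default), the IP stack echoes a host's own multicast datagrams back to local listeners.  A minimal sketch of toggling it:

#include <sys/socket.h>
#include <netinet/in.h>

/* Enable or disable the kernel echoing our own multicast datagrams
 * back to us.  The current design effectively depends on this being on. */
static int set_mcast_loop(int sockfd, int enable)
{
        unsigned char loop = enable ? 1 : 0;

        return setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_LOOP,
                          &loop, sizeof(loop));
}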

First, message delivery considerations.
In the current design, is it the case that a message is delivered to all receivers in the group if and only if the message is reflected back to the sender?  For example, given a pre-condition of a group of size three with a sender on node 1 of nodes 1, 2, 3: if a message is sent from node 1 right when a remote process goes down on node 3, will the message arrival order indicate how the message was delivered (a skeleton for observing this ordering follows the scenarios below):
scenario 1: assume node 3 got the message
  message reflected
  group membership change received
scenario 2: assume node 3 did not get the message but nodes 1 & 2 did
  group membership change received
  message reflected
scenario 3: the message send fails (doesn't happen?)
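
Something like the following skeleton is what I would use to observe which ordering actually occurs (libcpg API; error handling omitted and the group name is a placeholder, so treat it as a sketch rather than a finished test).  The deliver callback marks when our own message comes back, and the confchg callback reports whether the reflection had been seen yet:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

static uint32_t my_nodeid;
static int reflected_seen;

/* Our own message coming back is the "reflection". */
static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group,
                       uint32_t nodeid, uint32_t pid,
                       void *msg, size_t msg_len)
{
        if (nodeid == my_nodeid && pid == (uint32_t)getpid())
                reflected_seen = 1;
}

/* If a member left and reflected_seen is 1, we saw scenario 1
 * ordering (reflection first); if 0, scenario 2 ordering. */
static void confchg_cb(cpg_handle_t handle, const struct cpg_name *group,
                       const struct cpg_address *members, size_t n_members,
                       const struct cpg_address *left, size_t n_left,
                       const struct cpg_address *joined, size_t n_joined)
{
        if (n_left > 0)
                printf("confchg: reflected_seen=%d\n", reflected_seen);
}

int main(void)
{
        cpg_callbacks_t cb = {
                .cpg_deliver_fn = deliver_cb,
                .cpg_confchg_fn = confchg_cb,
        };
        cpg_handle_t handle;
        struct cpg_name group;
        struct iovec iov = { .iov_base = "probe", .iov_len = 6 };

        cpg_initialize(&handle, &cb);
        cpg_local_get(handle, &my_nodeid);

        strcpy(group.value, "GROUP");
        group.length = strlen("GROUP");
        cpg_join(handle, &group);

        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
        cpg_dispatch(handle, CS_DISPATCH_BLOCKING);
        return 0;
}

(Built with something like 'gcc probe.c -lcpg', run on node 1 while killing the remote process on node 3.)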

Second, timing considerations.
Say it would be interesting to avoid the flood of messages which occurs during high-level debugging of message delivery in corosync.  It might instead be replaced with a message delivery histogram which counts which bucket each message gets delivered into (0-64ms, 64-128ms, 128-256ms, 256-1024ms, 1024-4096ms, 4096ms-16sec, > 16sec, not delivered).  Could the sender get a bound on the delivery time of all messages by timing only the reflected message?
Clearly, simply calling straight back with the sent message might reduce the latency of the reflection, but would it accurately reflect the delivery guarantees and message ordering of the current system, or is that already not a safe assumption?
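
To make the histogram idea concrete, here is a sketch of the sender-side bucketing (bucket edges taken from the list above; everything else, including the names, is illustrative):

#include <stdint.h>
#include <time.h>

enum { N_BUCKETS = 8 };         /* histogram[7] is "not delivered" */
static uint64_t histogram[N_BUCKETS];

static const uint64_t bucket_edge_ms[] = {
        64, 128, 256, 1024, 4096, 16000, UINT64_MAX
};

static uint64_t now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Called when the sender sees its own message delivered back;
 * the "not delivered" bucket would be counted by a timeout path. */
static void record_reflection(uint64_t sent_ms)
{
        uint64_t delta = now_ms() - sent_ms;
        int i;

        for (i = 0; delta >= bucket_edge_ms[i]; i++)
                ;       /* walk to the first edge that exceeds delta */
        histogram[i]++;
}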

I understand the idea of not rebinding.  Does not rebinding allow the local applications to continue to use their groups internally?  How would not rebinding impact nodes coming back together?  If an interface changes to a new IP address on a different subnet, how would this be handled?  (Recommend a full restart?)  How is it detected and reported by corosync?

Thank you for discussing the issues.

I realize now that the modified testcpg.c attachment was scrubbed.  Is there a desired location for the upload?

dan


On Fri, Apr 13, 2012 at 1:51 AM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
Dan,
there are two problems I see with the current corosync:
1.) Localhost rebinding
2.) Reliance on the kernel multicast loop facility

My opinion on them is simple. Both must go and must be replaced by:
1.) Don't use the multicast loop. Move the message directly from the send function to the receive function for the local node (sketched below).
2.) Never rebind.
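
Roughly like this (all names hypothetical, just to illustrate the shape of the change, not the real totemnet code):

#include <stddef.h>

struct net_ctx;                         /* opaque transport state */

static void deliver_to_local_node(struct net_ctx *net,
                                  const void *msg, size_t len)
{
        /* hand the message straight to the receive path */
}

static int mcast_to_ring(struct net_ctx *net, const void *msg, size_t len)
{
        /* sendmsg() to the multicast group, IP_MULTICAST_LOOP off */
        return 0;
}

static int net_mcast(struct net_ctx *net, const void *msg, size_t len)
{
        deliver_to_local_node(net, msg, len);   /* self-delivery, no kernel loop */
        return mcast_to_ring(net, msg, len);    /* everyone else as before */
}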

It's really impossible for application authors to handle this "change of identity" behavior.

And solutions for both problems are at the top of my TODO list (so expect them in 2.1 and backported to flatiron).

Regards,
 Honza

dan clark wrote:
Hi Folks!

Thank you Christine for a well-written test application and for leading the way
with the apropos comment "NOTE: in reality we should also check the
nodeid".  Some comments are more easily addressed than others!

During various failure tests, a variety of conditions seem to cause a
client application of the cpg library to change its node identity.  I know
this has been discussed under various guises with respect to the proper way
to fail a network (don't ifdown/ifup an interface).  Oddly enough, however,
a common dynamic reconfiguration step on a node is to do a 'service network
restart', which tends to do ifdown/ifup on interfaces.  Designing
applications to be resilient to common failures, including the restart of a
service (such as corosync), is often desirable, so I have included a
slightly modified version of testcpg.c that provides such resiliency.  I
wonder, however, whether the changing identity of the node information
returned from cpg_local_get can be relied on across versions, or if this is
aberrant or transient behaviour that might change?  Note that once the node
identity has changed, if an application continues to maintain use of a
group, then once the cluster is re-formed that group is isolated from other
groups, despite sharing a common name.  Furthermore, there are impacts on
other applications on the isolated node that might share the use of that
group.
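
The resiliency check boils down to roughly the following (simplified, not
the verbatim attachment): cache the nodeid once at join time and compare it
against cpg_local_get on every configuration change.

#include <stdio.h>
#include <corosync/cpg.h>

static uint32_t start_nodeid;   /* cached at cpg_join() time */

/* NOTE: in reality we should also check the nodeid -- cache it once
 * at join time and compare on every configuration change. */
static void check_identity(cpg_handle_t handle)
{
        uint32_t current_nodeid;

        if (cpg_local_get(handle, &current_nodeid) != CS_OK)
                return;

        if (start_nodeid == 0)
                start_nodeid = current_nodeid;  /* first call: cache it */
        else if (current_nodeid != start_nodeid)
                fprintf(stderr,
                        "switched identity? start nodeid %u current nodeid %u\n",
                        start_nodeid, current_nodeid);
}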

On a separate note, is there a way to change the seemingly fixed 20-30
second delay before the daemons re-join a cluster that was separated by
network isolation (power cycling a switch, for example)?
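
I assume the knobs involved are the totem timers in corosync.conf (token,
consensus, join), though I have not confirmed that they account for the
whole 20-30 seconds; for example:

totem {
        version: 2
        # all values in milliseconds; see corosync.conf(5)
        token: 1000
        consensus: 1200
        join: 50
}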

Note the following output showing the first point at which a node realizes
that the configuration has changed, and how local_get reports that the node
identity is now 127.0.0.1 as opposed to the original value of 4.0.0.8.
Tests were performed on version 1.4.2.

 dcube-8:28506 2012-04-09 14:02:06.803
ConfchgCallback: group 'GROUP'
Local node id is 127.0.0.1/100007f result 1
left node/pid 4.0.0.2/15796 reason: 3
nodes in group now 3
node/pid 4.0.0.4/2655
node/pid 4.0.0.6/4238
node/pid 4.0.0.8/28506
....

Finally, even though the reported identity is loopback, the original id is
matched thanks to the static cache from the time of the join.  Is there a
race condition, however, where a network failure just after the join
changes the identity before the initialization logic is complete, so that
even the modified sample program is open to failure?

dcube-8:28506 2012-04-09 14:02:06.803
ConfchgCallback: group 'GROUP'
Local node id is 127.0.0.1/100007f result 1
left node/pid 4.0.0.8/28506 reason: 3
nodes in group now 0
We might have left the building pid 28506
We probably left the building switched identity? start nodeid 134217732
nodeid 134217732 current nodeid 16777343 pid 28506
We have left the building direct match start nodeid 134217732 nodeid
134217732 local get current nodeid 16777343 pid 28506

Perhaps the test application for the release could be updated to include
appropriate testing for the nodeid?

Dan




_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
