Re: can the changing node identity returned by local_get be handled reliably?

Hi Jan!

I agree that it is difficult for applications to handle the corosync daemon changing its identity from a bound interface to loopback and then back to the same interface.

I wonder what the consequences of not using the multicast loop to echo back the local message might be, both for message delivery guarantees and for the timing of message delivery.
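
For context, the kernel facility in question is the IP_MULTICAST_LOOP socket option; when it is enabled (the default), the IP stack echoes a host's own multicast datagrams back to local listeners.  A minimal sketch of toggling it:

#include <sys/socket.h>
#include <netinet/in.h>

/* Enable or disable the kernel echoing our own multicast datagrams
 * back to us.  The current design effectively depends on this being on. */
static int set_mcast_loop(int sockfd, int enable)
{
        unsigned char loop = enable ? 1 : 0;

        return setsockopt(sockfd, IPPROTO_IP, IP_MULTICAST_LOOP,
                          &loop, sizeof(loop));
}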

First, message delivery considerations.
In the current design, is it the case that a message is delivered to all receivers in the group if and only if the message is reflected back to the sender?  For example, given a pre-condition of a group of size three with a sender on node 1 of nodes 1, 2, 3: if a message is sent from node 1 right when a remote process goes down on node 3, will the message arrival order indicate how the message was delivered (a skeleton for observing this ordering follows the scenarios below):
scenario 1: assume node 3 got the message
  message reflected
  group membership change received
scenario 2: assume node 3 did not get the message but nodes 1 & 2 did
  group membership change received
  message reflected
scenario 3: the message send fails (doesn't happen?)
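
Something like the following skeleton is what I would use to observe which ordering actually occurs (libcpg API; error handling omitted and the group name is a placeholder, so treat it as a sketch rather than a finished test).  The deliver callback marks when our own message comes back, and the confchg callback reports whether the reflection had been seen yet:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/uio.h>
#include <corosync/cpg.h>

static uint32_t my_nodeid;
static int reflected_seen;

/* Our own message coming back is the "reflection". */
static void deliver_cb(cpg_handle_t handle, const struct cpg_name *group,
                       uint32_t nodeid, uint32_t pid,
                       void *msg, size_t msg_len)
{
        if (nodeid == my_nodeid && pid == (uint32_t)getpid())
                reflected_seen = 1;
}

/* If a member left and reflected_seen is 1, we saw scenario 1
 * ordering (reflection first); if 0, scenario 2 ordering. */
static void confchg_cb(cpg_handle_t handle, const struct cpg_name *group,
                       const struct cpg_address *members, size_t n_members,
                       const struct cpg_address *left, size_t n_left,
                       const struct cpg_address *joined, size_t n_joined)
{
        if (n_left > 0)
                printf("confchg: reflected_seen=%d\n", reflected_seen);
}

int main(void)
{
        cpg_callbacks_t cb = {
                .cpg_deliver_fn = deliver_cb,
                .cpg_confchg_fn = confchg_cb,
        };
        cpg_handle_t handle;
        struct cpg_name group;
        struct iovec iov = { .iov_base = "probe", .iov_len = 6 };

        cpg_initialize(&handle, &cb);
        cpg_local_get(handle, &my_nodeid);

        strcpy(group.value, "GROUP");
        group.length = strlen("GROUP");
        cpg_join(handle, &group);

        cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
        cpg_dispatch(handle, CS_DISPATCH_BLOCKING);
        return 0;
}

(Built with something like 'gcc probe.c -lcpg', run on node 1 while killing the remote process on node 3.)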

Second, timing considerations.
Say it would be interesting to avoid the flood of messages which occurs during high-level debugging of message delivery in corosync.  It might instead be replaced with a message delivery histogram which counts which bucket each message gets delivered into (0-64ms, 64-128ms, 128-256ms, 256-1024ms, 1024-4096ms, 4096ms-16sec, > 16sec, not delivered).  Could the sender get a bound on the delivery time of all messages by timing only the reflected message?
Clearly, simply calling straight back with the sent message might reduce the latency of the reflection, but would it accurately reflect the delivery guarantees and message ordering of the current system, or is that already not a safe assumption?
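
To make the histogram idea concrete, here is a sketch of the sender-side bucketing (bucket edges taken from the list above; everything else, including the names, is illustrative):

#include <stdint.h>
#include <time.h>

enum { N_BUCKETS = 8 };         /* histogram[7] is "not delivered" */
static uint64_t histogram[N_BUCKETS];

static const uint64_t bucket_edge_ms[] = {
        64, 128, 256, 1024, 4096, 16000, UINT64_MAX
};

static uint64_t now_ms(void)
{
        struct timespec ts;

        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Called when the sender sees its own message delivered back;
 * the "not delivered" bucket would be counted by a timeout path. */
static void record_reflection(uint64_t sent_ms)
{
        uint64_t delta = now_ms() - sent_ms;
        int i;

        for (i = 0; delta >= bucket_edge_ms[i]; i++)
                ;       /* walk to the first edge that exceeds delta */
        histogram[i]++;
}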

I understand the idea of not rebinding.  Does not rebinding allow the local applications to continue to use their groups internally?  How would not rebinding impact nodes coming back together?  If an interface changes to a new IP address on a different subnet, how would this be handled?  (Recommend a full restart?)  How is it detected and reported by corosync?

Thank you for discussing the issues.

I realize now that the modified testcpg.c attachment was scrubbed.  Is there a desired location for the upload?

dan


On Fri, Apr 13, 2012 at 1:51 AM, Jan Friesse <jfriesse@xxxxxxxxxx> wrote:
Dan,
there are two problems I see with the current corosync:
1.) Localhost rebinding
2.) Reliance on the kernel multicast loop facility

My opinion on them is simple. Both must go and must be replaced by:
1.) Don't use the multicast loop. Move the message directly from the send function to the receive function for the local node (sketched below).
2.) Never rebind.
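
Roughly like this (all names hypothetical, just to illustrate the shape of the change, not the real totemnet code):

#include <stddef.h>

struct net_ctx;                         /* opaque transport state */

static void deliver_to_local_node(struct net_ctx *net,
                                  const void *msg, size_t len)
{
        /* hand the message straight to the receive path */
}

static int mcast_to_ring(struct net_ctx *net, const void *msg, size_t len)
{
        /* sendmsg() to the multicast group, IP_MULTICAST_LOOP off */
        return 0;
}

static int net_mcast(struct net_ctx *net, const void *msg, size_t len)
{
        deliver_to_local_node(net, msg, len);   /* self-delivery, no kernel loop */
        return mcast_to_ring(net, msg, len);    /* everyone else as before */
}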

It's really impossible for application authors to handle this "change of identity" behavior.

And solutions for both problems are at the top of my TODO list (so expect them in 2.1 and backported to flatiron).

Regards,
 Honza

dan clark wrote:
Hi Folks!

Thank you Christine for a well-written test application and for leading the way
with the apropos comment "NOTE: in reality we should also check the
nodeid".  Some comments are more easily addressed than others!

During various failure tests, a variety of conditions seem to cause a
client application of the cpg library to change its node identity.  I know
this has been discussed under various guises with respect to the proper way
to fail a network (don't ifdown/ifup an interface).  Oddly enough, however,
a common dynamic reconfiguration step on a node is to do a 'service network
restart', which tends to do ifdown/ifup on interfaces.  Designing
applications to be resilient to common failures, including the restart of a
service (such as corosync), is often desirable, so I have included a
slightly modified version of testcpg.c that provides such resiliency.  I
wonder, however, whether the changing identity of the node information
returned from cpg_local_get can be relied on across versions, or if this is
aberrant or transient behaviour that might change?  Note that once the node
identity has changed, if an application continues to maintain use of a
group, then once the cluster is re-formed that group is isolated from other
groups, despite sharing a common name.  Furthermore, there are impacts on
other applications on the isolated node that might share the use of that
group.
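
The resiliency check boils down to roughly the following (simplified, not
the verbatim attachment): cache the nodeid once at join time and compare it
against cpg_local_get on every configuration change.

#include <stdio.h>
#include <corosync/cpg.h>

static uint32_t start_nodeid;   /* cached at cpg_join() time */

/* NOTE: in reality we should also check the nodeid -- cache it once
 * at join time and compare on every configuration change. */
static void check_identity(cpg_handle_t handle)
{
        uint32_t current_nodeid;

        if (cpg_local_get(handle, &current_nodeid) != CS_OK)
                return;

        if (start_nodeid == 0)
                start_nodeid = current_nodeid;  /* first call: cache it */
        else if (current_nodeid != start_nodeid)
                fprintf(stderr,
                        "switched identity? start nodeid %u current nodeid %u\n",
                        start_nodeid, current_nodeid);
}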

On a separate note, is there a way to change the seemingly fixed 20-30
second delay before the daemons re-join a cluster that was separated by
network isolation (power cycling a switch, for example)?
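
I assume the knobs involved are the totem timers in corosync.conf (token,
consensus, join), though I have not confirmed that they account for the
whole 20-30 seconds; for example:

totem {
        version: 2
        # all values in milliseconds; see corosync.conf(5)
        token: 1000
        consensus: 1200
        join: 50
}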

Note the following output showing the first point at which a node realizes
that the configuration has changed, and how local_get reports that the node
identity is now 127.0.0.1 as opposed to the original value of 4.0.0.8.
Tests were performed on version 1.4.2.

 dcube-8:28506 2012-04-09 14:02:06.803
ConfchgCallback: group 'GROUP'
Local node id is 127.0.0.1/100007f result 1
left node/pid 4.0.0.2/15796 reason: 3
nodes in group now 3
node/pid 4.0.0.4/2655
node/pid 4.0.0.6/4238
node/pid 4.0.0.8/28506
....

Finally, even though the reported identity is loopback, the original id is
matched thanks to the static cache from the time of the join.  Is there a
race condition, however, where a network failure just after the join
changes the identity before the initialization logic is complete, so that
even the modified sample program is open to failure?

dcube-8:28506 2012-04-09 14:02:06.803
ConfchgCallback: group 'GROUP'
Local node id is 127.0.0.1/100007f result 1
left node/pid 4.0.0.8/28506 reason: 3
nodes in group now 0
We might have left the building pid 28506
We probably left the building switched identity? start nodeid 134217732
nodeid 134217732 current nodeid 16777343 pid 28506
We have left the building direct match start nodeid 134217732 nodeid
134217732 local get current nodeid 16777343 pid 28506

Perhaps the test application for the release could be updated to include
appropriate testing for the nodeid?

Dan




_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
