Secondary IP addresses

Michael Chapman <mike@xxxxxxxxxxxxxxxxx> · Tue, 22 May 2012 14:56:21 +1000 (EST)

Hi,

I've been running several Corosync + Pacemaker clusters on Linux with 
great success for some time now. Recently, however, I encountered a rather 
tricky problem involving secondary IP addresses.

This was the first time I had run Corosync 2.0.0 on a cluster where 
Pacemaker-managed floating IPs needed to be added to the same interfaces 
and subnet that were used by the Corosync traffic. Since a particular 
floating IP was in the same subnet as Corosync's configured bindnetaddr, 
on occasion Corosync would attempt to use this secondary IP rather than 
the interface's primary IP.

Say I was running with this Corosync configuration:

  totem {
    ...
    interface {
      ringnumber:  0
      bindnetaddr: 192.168.0.0
      mcastaddr:   225.0.0.100
      mcastport:   5405
    }
  }

and an interface named "cluster" whose primary IP address is 
192.168.0.1/24. Later on, a Pacemaker-managed floating IP resource would 
be added to this interface, effectively running the command:

  ip addr add 192.168.0.2/24 dev cluster label cluster:foo

Corosync would erroneously start using the IP 192.168.0.2 for cluster 
traffic. This not only broke the cluster -- firewalling meant that 
Corosync traffic wasn't allowed on that IP -- it completely confused 
Pacemaker (suddenly a new node would appear!).

I tracked this problem down as far as totem_getifaddrs in exec/totemip.c. 
It prepends IPs to a list, then (in totem_iface_check) the first matching 
IP in this list (i.e. the *last* matching IP in getifaddrs's order) is 
used.
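
To illustrate the ordering issue, here is a minimal sketch in C. It is not 
the code from exec/totemip.c: struct addr_entry and the helpers 
list_prepend, list_append and first_match are hypothetical stand-ins, and 
addresses are plain IPv4 values in host byte order. It only shows that, 
with a prepended list, "first match" picks the address that was enumerated 
last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build an IPv4 address in host byte order from four octets. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified address entry (not Corosync's real type). */
struct addr_entry {
    uint32_t addr;
    uint32_t netmask;
    struct addr_entry *next;
};

/* Prepend: the most recently enumerated entry becomes the list head. */
static struct addr_entry *list_prepend(struct addr_entry *head,
                                       struct addr_entry *e)
{
    assert(e != NULL);
    e->next = head;
    return e;
}

/* Append: the list keeps the enumeration order. */
static struct addr_entry *list_append(struct addr_entry *head,
                                      struct addr_entry *e)
{
    struct addr_entry **pp = &head;

    assert(e != NULL);
    e->next = NULL;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = e;
    return head;
}

/* Return the first entry in the same network as bindnet, or NULL. */
static const struct addr_entry *first_match(const struct addr_entry *head,
                                            uint32_t bindnet)
{
    for (; head != NULL; head = head->next) {
        if ((head->addr & head->netmask) == (bindnet & head->netmask))
            return head;
    }
    return NULL;
}
```

Assuming 192.168.0.1 is enumerated before 192.168.0.2, first_match on the 
prepended list returns 192.168.0.2 for bindnetaddr 192.168.0.0, while the 
appended list returns 192.168.0.1.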

As a quick, very hacky workaround, I changed totem_getifaddrs to append 
IPs rather than prepend them, and to ignore IPs whose label contains a 
colon. Either change would have been enough in my situation; I implemented 
both for good measure. This worked around the problem on this cluster.
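
The label check can be sketched roughly as follows. This is not the actual 
patch: has_alias_label and first_unlabelled_ipv4 are hypothetical names, 
and the sketch assumes that getifaddrs() on Linux reports an address label 
such as "cluster:foo" in ifa_name, which is what I observed with glibc.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Nonzero if the name looks like a labelled alias, e.g. "cluster:foo". */
static int has_alias_label(const char *ifa_name)
{
    assert(ifa_name != NULL);
    return strchr(ifa_name, ':') != NULL;
}

/*
 * Walk a getifaddrs()-style list in its original order and return the
 * first IPv4 entry whose name carries no alias label, or NULL if there
 * is none.
 */
static const struct ifaddrs *first_unlabelled_ipv4(const struct ifaddrs *list)
{
    const struct ifaddrs *ifa;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (ifa->ifa_name == NULL || has_alias_label(ifa->ifa_name))
            continue;
        return ifa;
    }
    return NULL;
}
```

Note that this only filters addresses that were added with a label. An 
address added with a plain "ip addr add" and no label keeps the bare 
interface name, so it would not be skipped.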

So I have a few questions:

1. Under what conditions does Corosync re-evaluate the interfaces on the 
system to determine what IPs it should be using?

2. Would it be possible to have Corosync ignore "secondary" IP addresses? 
Would this even be a good idea? It looks like getifaddrs(), at least on 
Linux, doesn't expose anything that identifies a secondary address, but 
the underlying netlink protocol does (an IFA_F_SECONDARY flag on each 
interface address).

3. I understand the whole idea of using a bind "network" in corosync.conf 
is that the same config can then be used on all the machines in the 
cluster, but that approach doesn't work when a system can have multiple 
IPs within the same subnet. Could Corosync perhaps bind to a *specific* IP 
on the system if that IP is specified in the config file? Or, since this 
would be slightly incompatible with how people might have used the configs 
before, should there be some other config file directive to do this?
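
For reference on question 2, here is a minimal sketch of how the 
IFA_F_SECONDARY flag could be read from a single rtnetlink address 
message. The function name nlmsg_is_secondary_addr is hypothetical, and 
the sketch assumes the caller has already obtained RTM_NEWADDR messages 
(for example from an RTM_GETADDR dump on an AF_NETLINK socket, which is 
not shown here).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/if_addr.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/*
 * Return 1 if the netlink message describes an address that the kernel
 * has flagged as secondary, 0 otherwise (including for messages that
 * are not RTM_NEWADDR or are too short to hold a struct ifaddrmsg).
 */
static int nlmsg_is_secondary_addr(const struct nlmsghdr *nlh)
{
    const struct ifaddrmsg *ifa;

    assert(nlh != NULL);
    if (nlh->nlmsg_type != RTM_NEWADDR)
        return 0;
    if (nlh->nlmsg_len < NLMSG_LENGTH(sizeof(struct ifaddrmsg)))
        return 0;

    /* The ifaddrmsg payload follows the aligned netlink header. */
    ifa = (const struct ifaddrmsg *)NLMSG_DATA(nlh);
    return (ifa->ifa_flags & IFA_F_SECONDARY) != 0;
}
```

This is Linux-specific, so any real change along these lines would 
presumably need a fallback for platforms that only offer getifaddrs().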

Regards,
Michael