Hi,
I've been running several Corosync + Pacemaker clusters on Linux with
great success for some time now. Recently, however, I encountered a rather
tricky problem involving secondary IP addresses.
This was the first time I had run Corosync 2.0.0 on a cluster where
Pacemaker-managed floating IPs needed to be added to the same interfaces
and subnet that were used by the Corosync traffic. Since a particular
floating IP was in the same subnet as Corosync's configured bindnetaddr,
on occasion Corosync would attempt to use this secondary IP rather than
the interface's primary IP.
Say I was running with this Corosync configuration:

    totem {
        ...
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.0.0
            mcastaddr: 225.0.0.100
            mcastport: 5405
        }
    }

with an interface named "cluster" whose primary IP address is
192.168.0.1/24. Later on, a Pacemaker-managed floating IP resource would
be added to this interface, effectively running the command:

    ip addr add 192.168.0.2/24 dev cluster label cluster:foo

Corosync would erroneously start using the IP 192.168.0.2 for cluster
traffic. This not only broke the cluster -- firewalling meant that
Corosync traffic wasn't allowed on that IP -- it completely confused
Pacemaker (suddenly a new node would appear!).
I tracked this problem down as far as totemip_getifaddrs in
exec/totemip.c. It prepends each IP to a list; totemip_iface_check then
uses the first matching IP in this list (i.e. the *last* matching IP in
getifaddrs's order).
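
To make the ordering issue concrete, here is a minimal Python sketch of the behaviour as I understand it. It is only a model, not Corosync's C code: the helper names (build_list, first_match) are made up for illustration, and it assumes the OS enumerates the primary address before the secondary one.

```python
import ipaddress

# Addresses in the order the OS is ASSUMED to enumerate them
# (primary first, then the Pacemaker-managed secondary).
ENUMERATED = [
    ("cluster", "192.168.0.1"),
    ("cluster:foo", "192.168.0.2"),
]

def build_list(enumerated):
    """Hypothetical model of the list construction: each entry is prepended."""
    addrs = []
    for entry in enumerated:
        addrs.insert(0, entry)  # prepend, so the list ends up reversed
    return addrs

def first_match(addrs, bindnetaddr, prefixlen):
    """Return the first address in the list that lies inside the bind network."""
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}")
    for _label, addr in addrs:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None

chosen = first_match(build_list(ENUMERATED), "192.168.0.0", 24)
print(chosen)  # 192.168.0.2, i.e. the secondary address wins
```

Both addresses fall inside 192.168.0.0/24, so which one is picked depends purely on list order.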
As a quick, very hacky workaround, I changed totemip_getifaddrs to append
IPs rather than prepend them, and to ignore IPs whose label contains a
colon. Either change alone would have been enough in my situation; I
implemented both for good measure. This effectively worked around the
problem on this cluster.
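
The effect of the two workaround changes can be sketched with the same kind of Python model. Again this is not the actual patch: build_list, its two flags, and first_match are hypothetical names, and the enumeration order (primary before secondary) is an assumption.

```python
import ipaddress

# Addresses in the order the OS is ASSUMED to enumerate them.
ENUMERATED = [
    ("cluster", "192.168.0.1"),
    ("cluster:foo", "192.168.0.2"),
]

def build_list(enumerated, append, skip_labelled):
    """Hypothetical model of the list construction with both workarounds optional."""
    addrs = []
    for label, addr in enumerated:
        if skip_labelled and ":" in label:
            continue  # ignore aliased entries such as "cluster:foo"
        if append:
            addrs.append((label, addr))     # keep enumeration order
        else:
            addrs.insert(0, (label, addr))  # original prepend behaviour
    return addrs

def first_match(addrs, bindnetaddr, prefixlen):
    """Return the first address in the list that lies inside the bind network."""
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}")
    for _label, addr in addrs:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None

results = {}
for append in (False, True):
    for skip_labelled in (False, True):
        addrs = build_list(ENUMERATED, append, skip_labelled)
        results[(append, skip_labelled)] = first_match(addrs, "192.168.0.0", 24)
        print(append, skip_labelled, results[(append, skip_labelled)])
```

In this model only the unmodified combination (prepend, no filtering) selects 192.168.0.2; enabling either change selects 192.168.0.1.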
So I have a few questions:
1. Under what conditions does Corosync re-evaluate the interfaces on the
system to determine what IPs it should be using?
2. Would it be possible to have Corosync ignore "secondary" IP addresses?
Would this even be a good idea? It looks like getifaddrs(), at least on
Linux, doesn't expose anything that identifies a secondary address, but
the underlying netlink protocol does (an IFA_F_SECONDARY flag associated
with each interface address).
3. I understand that the point of using a bind "network" in corosync.conf
is that the same config can be used on every machine in the cluster, but
that approach breaks down when a system can have multiple IPs within the
same subnet. Could Corosync perhaps bind to a *specific* IP on the system
if that IP is specified in the config file? Or, since this would be
slightly incompatible with how people might have used the configs before,
should there be a separate config file directive for this?
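
Regarding question 2, here is a rough Python sketch of where the secondary flag sits in the netlink data. It only decodes the fixed 8-byte struct ifaddrmsg header that starts each RTM_NEWADDR payload; it does not talk to the kernel. The function name is_secondary and the sample headers are fabricated for illustration, and the interface index 3 is arbitrary.

```python
import socket
import struct

# From <linux/if_addr.h>: bit set in ifa_flags for secondary addresses.
IFA_F_SECONDARY = 0x01

# struct ifaddrmsg: u8 family, u8 prefixlen, u8 flags, u8 scope, u32 index.
IFADDRMSG = struct.Struct("=BBBBI")

def is_secondary(header):
    """Return True if an ifaddrmsg header carries the IFA_F_SECONDARY flag."""
    _family, _prefixlen, flags, _scope, _index = IFADDRMSG.unpack(
        header[:IFADDRMSG.size]
    )
    return bool(flags & IFA_F_SECONDARY)

# Fabricated sample headers standing in for 192.168.0.1/24 (primary)
# and 192.168.0.2/24 (secondary) on the same interface.
primary = IFADDRMSG.pack(socket.AF_INET, 24, 0, 0, 3)
secondary = IFADDRMSG.pack(socket.AF_INET, 24, IFA_F_SECONDARY, 0, 3)

print(is_secondary(primary), is_secondary(secondary))
```

A real implementation would have to obtain these headers from an RTM_GETADDR dump over a NETLINK_ROUTE socket, which is more involved than calling getifaddrs().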
Regards,
Michael