Secondary IP addresses

Hi,

I've been running several Corosync + Pacemaker clusters on Linux with great success for some time now. Recently, however, I encountered a rather tricky problem involving secondary IP addresses.

This was the first time I had run Corosync 2.0.0 on a cluster where Pacemaker-managed floating IPs needed to be added to the same interfaces and subnet that were used by the Corosync traffic. Since a particular floating IP was in the same subnet as Corosync's configured bindnetaddr, on occasion Corosync would attempt to use this secondary IP rather than the interface's primary IP.

Say I was running with this Corosync configuration:

  totem {
    ...
    interface {
      ringnumber:  0
      bindnetaddr: 192.168.0.0
      mcastaddr:   225.0.0.100
      mcastport:   5405
    }
  }

on a machine whose "cluster" interface has the primary IP address 192.168.0.1/24. Later on, a Pacemaker-managed floating IP resource would be added to this interface, which effectively runs the command:

  ip addr add 192.168.0.2/24 dev cluster label cluster:foo

Corosync would erroneously start using the IP 192.168.0.2 for cluster traffic. This not only broke the cluster -- firewalling meant that Corosync traffic wasn't allowed on that IP -- it completely confused Pacemaker (suddenly a new node would appear!).

I tracked this problem down as far as totem_getifaddrs in exec/totemip.c. It prepends each IP to a list; totem_iface_check then uses the first matching IP in that list, i.e. the *last* matching IP in the order returned by getifaddrs().
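
To illustrate the effect, here is a minimal Python model of that behaviour. It is only a sketch and not Corosync's C code: the function names are made up, the /24 prefix is passed in by hand, and the order of kernel_order (primary address before the floating IP) is an assumption about what getifaddrs() returns.

```python
import ipaddress

def build_list_prepending(addresses):
    """Model a list that is built by pushing every new entry onto the head."""
    result = []
    for addr in addresses:
        result.insert(0, addr)  # prepend, so the final order is reversed
    return result

def first_match(addresses, bindnetaddr, prefixlen):
    """Return the first address that lies inside the bind network, or None."""
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}")
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None

# Assumed getifaddrs() order: loopback, primary IP, then the floating IP.
kernel_order = ["127.0.0.1", "192.168.0.1", "192.168.0.2"]

chosen = first_match(build_list_prepending(kernel_order), "192.168.0.0", 24)
print(chosen)  # prints 192.168.0.2, the floating IP
```

With the list left in its original order, the same first_match call returns 192.168.0.1 instead.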

As a quick, very hacky workaround, I changed totem_getifaddrs to append IPs rather than prepend them, and to ignore any IP whose label contains a colon. Either change alone would have been enough in my situation; I implemented both for good measure. This worked around the problem on this cluster.
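
In rough terms, the workaround behaves like the following Python sketch. Again this is a model rather than the real patch: the helper names and the (label, address) tuples are hypothetical stand-ins for what getifaddrs() reports.

```python
import ipaddress

def usable_addresses(entries):
    """Keep the original order (append) and drop addresses with alias-style labels."""
    kept = []
    for label, addr in entries:
        if ":" in label:   # e.g. "cluster:foo", the label given to the floating IP
            continue
        kept.append(addr)  # append instead of prepend
    return kept

def first_match(addresses, bindnetaddr, prefixlen):
    """Return the first address that lies inside the bind network, or None."""
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}")
    for addr in addresses:
        if ipaddress.ip_address(addr) in network:
            return addr
    return None

# Hypothetical (label, address) pairs for the machine described above.
entries = [
    ("lo", "127.0.0.1"),
    ("cluster", "192.168.0.1"),
    ("cluster:foo", "192.168.0.2"),
]

chosen = first_match(usable_addresses(entries), "192.168.0.0", 24)
print(chosen)  # prints 192.168.0.1, the primary IP
```

The colon filter only helps when the floating IP is added with such a label, which is the case for the resource described above.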

So I have a few questions:

1. Under what conditions does Corosync re-evaluate the interfaces on the system to determine what IPs it should be using?

2. Would it be possible to have Corosync ignore "secondary" IP addresses? Would this even be a good idea? It looks like getifaddrs(), at least on Linux, doesn't expose anything that identifies a secondary address, but the underlying netlink protocol does (the IFA_F_SECONDARY flag carried with each interface address).

3. I understand the whole idea of using a bind "network" in corosync.conf is that the same config can then be used on all the machines in the cluster, but that approach doesn't work when a system can potentially have multiple IPs within the same subnet. Could perhaps Corosync bind to a *specific* IP on the system if that IP is specified in the config file? Or, since this is a slight incompatibility with how people might have used the configs before, should there be some other config file directive to do this?
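
To make questions 2 and 3 concrete, this is roughly the selection logic I have in mind, sketched in Python. It is a proposal, not existing Corosync behaviour: choose_bind_address and the (address, flags) pairs are hypothetical, and decoding the netlink messages that would supply the flags is left out. The only real constant is IFA_F_SECONDARY, whose value comes from <linux/if_addr.h>.

```python
import ipaddress

IFA_F_SECONDARY = 0x01  # value from <linux/if_addr.h>

def choose_bind_address(candidates, bindnetaddr, prefixlen):
    """Pick a bind address from (address, ifa_flags) pairs.

    Hypothetical policy: an exact local IP in the config wins; otherwise
    use the first non-secondary address inside the bind network.
    """
    wanted = ipaddress.ip_address(bindnetaddr)

    # Question 3: bindnetaddr names a specific local IP, so use exactly that.
    for addr, _flags in candidates:
        if ipaddress.ip_address(addr) == wanted:
            return addr

    # Question 2: match by network, but skip secondary addresses.
    network = ipaddress.ip_network(f"{bindnetaddr}/{prefixlen}", strict=False)
    for addr, flags in candidates:
        if flags & IFA_F_SECONDARY:
            continue
        if ipaddress.ip_address(addr) in network:
            return addr
    return None

# The floating IP is listed first on purpose, to show that order no longer matters.
candidates = [("192.168.0.2", IFA_F_SECONDARY), ("192.168.0.1", 0)]

print(choose_bind_address(candidates, "192.168.0.0", 24))  # prints 192.168.0.1
print(choose_bind_address(candidates, "192.168.0.2", 24))  # prints 192.168.0.2
```

The exact-match branch is what would change behaviour for existing configs that happen to put a host address in bindnetaddr, hence my question about a separate directive.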

Regards,
Michael