Re: Secondary IP addresses

Dan Frincu <df.cluster@xxxxxxxxx> · Tue, 22 May 2012 16:56:21 +0300

Hi,

On Tue, May 22, 2012 at 7:56 AM, Michael Chapman <mike@xxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I've been running several Corosync + Pacemaker clusters on Linux with great
> success for some time now. Recently, however, I encountered a rather tricky
> problem involving secondary IP addresses.
>
> This was the first time I had run Corosync 2.0.0 on a cluster where
> Pacemaker-managed floating IPs needed to be added to the same interfaces and
> subnet that were used by the Corosync traffic. Since a particular floating
> IP was in the same subnet as Corosync's configured bindnetaddr, on occasion
> Corosync would attempt to use this secondary IP rather than the interface's
> primary IP.
>
> Say I was running with this Corosync configuration:
>
>  totem {
>    ...
>    interface {
>      ringnumber:  0
>      bindnetaddr: 192.168.0.0
>      mcastaddr:   225.0.0.100
>      mcastport:   5405
>    }
>  }
>
> with a "cluster" interface with primary IP address 192.168.0.1/24. Later on,
> a Pacemaker-managed floating IP resource would be added to this interface,
> effectively running the command:
>
>  ip addr add 192.168.0.2/24 dev cluster label cluster:foo
>
> Corosync would erroneously start using the IP 192.168.0.2 for cluster
> traffic. This not only broke the cluster -- firewalling meant that Corosync
> traffic wasn't allowed on that IP -- it completely confused Pacemaker
> (suddenly a new node would appear!).
>
> I tracked this problem down as far as totem_getifaddrs in exec/totemip.c. It
> prepends IPs to a list, then (in totem_iface_check) the first matching IP in
> this list (i.e. the *last* matching IP in getifaddrs's order) is used.
>
> As a quick, very hacky workaround, I changed totem_getifaddrs to append IPs
> rather than prepend them, and for it to ignore IPs with a label containing a
> colon. Either of these would have been OK in my situation; I implemented
> both for good measure. This effectively worked around the problem on this
> cluster.
>
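
To make the ordering issue concrete, here is a minimal, self-contained C
sketch of the behaviour you describe. It is not the code from
exec/totemip.c: the names addr_node, build_list, pick_first_match and
selected_address are made up for illustration, the addresses are hard-coded
from your example, and it assumes the primary address is enumerated before
the secondary one.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical list node; not the structure Corosync actually uses. */
struct addr_node {
    struct in_addr addr;
    struct addr_node *next;
};

void free_list(struct addr_node *head)
{
    while (head != NULL) {
        struct addr_node *next = head->next;
        free(head);
        head = next;
    }
}

/* Build a list from addrs[], which is given in enumeration order.
 * prepend != 0 mimics the prepending described above; 0 appends instead.
 * Returns NULL if an address fails to parse or memory runs out. */
struct addr_node *build_list(const char *const *addrs, size_t n, int prepend)
{
    struct addr_node *head = NULL, **tail = &head;

    for (size_t i = 0; i < n; i++) {
        struct addr_node *node = calloc(1, sizeof(*node));
        if (node == NULL || inet_pton(AF_INET, addrs[i], &node->addr) != 1) {
            free(node);
            free_list(head);
            return NULL;
        }
        if (prepend) {
            node->next = head;      /* newest entry becomes the head */
            head = node;
        } else {
            *tail = node;           /* newest entry goes to the end */
            tail = &node->next;
        }
    }
    return head;
}

/* Return the first list entry that lies inside network/netmask. */
const struct addr_node *pick_first_match(const struct addr_node *head,
                                         const char *network,
                                         const char *netmask)
{
    struct in_addr net, mask;

    if (inet_pton(AF_INET, network, &net) != 1 ||
        inet_pton(AF_INET, netmask, &mask) != 1)
        return NULL;
    for (const struct addr_node *n = head; n != NULL; n = n->next) {
        if ((n->addr.s_addr & mask.s_addr) == (net.s_addr & mask.s_addr))
            return n;
    }
    return NULL;
}

/* Run the example from this thread: primary 192.168.0.1, secondary
 * 192.168.0.2, bind network 192.168.0.0/24. Writes the chosen address
 * to out as a dotted quad; returns 0 on success, -1 on failure. */
int selected_address(int prepend, char *out, size_t outlen)
{
    const char *const addrs[] = { "192.168.0.1", "192.168.0.2" };
    struct addr_node *list = build_list(addrs, 2, prepend);
    const struct addr_node *hit =
        pick_first_match(list, "192.168.0.0", "255.255.255.0");
    int rc = -1;

    if (hit != NULL &&
        inet_ntop(AF_INET, &hit->addr, out, (socklen_t)outlen) != NULL)
        rc = 0;
    free_list(list);
    return rc;
}
```

With a buffer of INET_ADDRSTRLEN bytes, selected_address(1, buf, sizeof(buf))
leaves 192.168.0.2 in buf (the secondary address wins when entries are
prepended), while selected_address(0, buf, sizeof(buf)) leaves 192.168.0.1,
which matches the effect of your append workaround.
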
> So I have a few questions:
>
> 1. Under what conditions does Corosync re-evaluate the interfaces on the
> system to determine what IPs it should be using?
>
> 2. Would it be possible to have Corosync ignore "secondary" IP addresses?
> Would this even be a good idea? It looks like getifaddrs(), at least on
> Linux, doesn't expose anything that identifies a secondary address, but
> the underlying netlink protocol does (an IFA_F_SECONDARY flag associated
> with the interface address).

The best choice would have been to set bindnetaddr to the node's actual
IP address in the config file, rather than to the network address. This
has the same effect (Corosync binds to the right IP) without having to
modify any code. It is also the recommended approach when subnets
overlap. And, yes, it will always go for the highest numerical IP it
finds.
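
For the example above, that would look something like this on the node
whose primary address is 192.168.0.1 (a sketch only; each node would need
its own address in bindnetaddr, so the file is no longer identical across
the cluster):

 totem {
   ...
   interface {
     ringnumber:  0
     bindnetaddr: 192.168.0.1
     mcastaddr:   225.0.0.100
     mcastport:   5405
   }
 }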

>
> 3. I understand the whole idea of using a bind "network" in corosync.conf is
> that the same config can then be used on all the machines in the cluster,
> but that approach doesn't work when a system can potentially have multiple
> IPs within the same subnet. Could perhaps Corosync bind to a *specific* IP
> on the system if that IP is specified in the config file? Or, since this is
> a slight incompatibility with how people might have used the configs before,
> should there be some other config file directive to do this?

See above.

HTH,
Dan

>
> Regards,
> Michael
> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss

-- 
Dan Frincu
CCNA, RHCE