Hi,

On Wed, Jan 23, 2013 at 12:54 AM, Alan Robertson <alanr@xxxxxxx> wrote:
> Hi,
>
> I have a sudden need to configure corosync for clusters with the
> following characteristics - which I have no control over:
>     unicast
>     3-10 nodes
>     one unbonded 1G interface on one network
>     one unbonded 10G interface on a different network
>     Pacemaker
>     must work smoothly if any single network interface should fail.
>     Failover and failback must be automatic.
>
> From perusing the info from mailing lists via Google, it seems that some
> versions of Corosync might not do this correctly.
>
> What is the oldest version of corosync which is known to support such a
> configuration reliably?

Corosync 1.4.5, but you'd have to build from source, as I haven't seen
any binaries around yet (I may be mistaken on this one). Also,
rrp_mode: passive is the more thoroughly tested mode, so in terms of
reliability that would be the recommended mode (sketch 1 at the end of
this mail shows a minimal configuration along these lines).

With links of different speeds (regardless of active or passive
rrp_mode), Corosync waits for the slower link. Some people have
reported that the slower link occasionally gets marked as faulty and
then auto-recovers, but nobody has reported this behavior on recent
Corosync 1.4.x versions, hence the recommendation for 1.4.5.

On the Pacemaker side, it also depends on whether you have any stateful
resources such as DRBD; stateless resources need less careful
configuration. For interface failures (and not only those) there is
ocf:pacemaker:ping (sketch 2). And if you need services that work on
top of a VIP address, you can add the VIP to a loopback interface
(sketch 3) - this is how I remember it being done in the Cisco world.

"Failover and failback must be automatic" usually means no explicit
resource-stickiness has been defined (sketch 4): Pacemaker fails a
resource over to another available node, and when the original node
returns, the resource fails back to it, based on the returning node's
lower hostname (as determined by strncmp()). With up to 10 nodes,
you'd have to get a little creative with location constraints, node
attributes, or both to make resources fail over and back within a
specific set of nodes (sketch 5).

HTH,
Dan

> --
> Alan Robertson <alanr@xxxxxxx> - @OSSAlanR
>
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce

--
Dan Frincu
CCNA, RHCE
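
Sketch 1 - a minimal, untested corosync.conf for a 3-node version of
this setup: unicast (udpu) transport, rrp_mode: passive, one ring per
network. All addresses are made up; adjust bindnetaddr/memberaddr (and
add member blocks per node) to match your 1G and 10G networks:

    totem {
        version: 2
        secauth: off
        rrp_mode: passive
        transport: udpu

        # ring 0 on the 1G network
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0
            mcastport: 5405
            member {
                memberaddr: 192.168.1.1
            }
            member {
                memberaddr: 192.168.1.2
            }
            member {
                memberaddr: 192.168.1.3
            }
        }

        # ring 1 on the 10G network
        interface {
            ringnumber: 1
            bindnetaddr: 10.10.1.0
            mcastport: 5407
            member {
                memberaddr: 10.10.1.1
            }
            member {
                memberaddr: 10.10.1.2
            }
            member {
                memberaddr: 10.10.1.3
            }
        }
    }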
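
Sketch 2 - monitoring connectivity with ocf:pacemaker:ping via the crm
shell. The gateway addresses and the p_ping/g_services names are made
up; the location rule keeps g_services off any node that can't reach
either gateway:

    crm configure
      primitive p_ping ocf:pacemaker:ping \
          params host_list="192.168.1.254 10.10.1.254" multiplier="100" \
          op monitor interval="15s" timeout="60s"
      clone cl_ping p_ping
      location l_services_on_connected g_services \
          rule -inf: not_defined pingd or pingd lte 0
      commit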
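
Sketch 3 - the mechanics of putting a VIP on the loopback interface
(address made up). This only shows the address assignment itself;
you'd still have to handle ARP/routing so that only the active node
answers for the VIP, typically by wrapping this in or pairing it with
a cluster resource:

    # on the node that should serve the VIP
    ip addr add 10.10.1.100/32 dev lo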
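
Sketch 4 - resource-stickiness controls the failback half of the
behavior. With the default of 0, resources fail back automatically
when the preferred node returns; a positive value makes them stay
where they are after a failover. Via the crm shell:

    # automatic failback (the default)
    crm configure rsc_defaults resource-stickiness=0

    # or: keep resources in place after a failover
    crm configure rsc_defaults resource-stickiness=100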
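
Sketch 5 - confining resources to a subset of nodes with a node
attribute plus a location constraint. The attribute name (group_a),
node names, and g_services are all made up:

    crm node attribute node1 set group_a true
    crm node attribute node2 set group_a true
    crm node attribute node3 set group_a true
    crm configure location l_services_on_group_a g_services \
        rule 200: group_a eq true

Resources then prefer the nodes carrying the attribute, so failover
and failback happen within that set of nodes.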