Re: Starter Cluster / GFS

Gordan Bobic <gordan@xxxxxxxxxx> · Thu, 11 Nov 2010 10:07:31 +0000

Jankowski, Chris wrote:
Gordan,

I do understand the mechanism.  I was trying to gently point out that
this behaviour is unacceptable for my commercial IP customers. The customers
buy clusters for high availability. Losing the whole cluster due to a single
component failure, the heartbeat link, is not acceptable. The heartbeat link is
a huge SPOF. And the cluster design does not support redundant links for
heartbeat.

Also, none of the commercially available UNIX clusters or Linux clusters
(HP ServiceGuard, Veritas, SteelEye) would display this type of behaviour
and they do not clobber cluster filesystems.  So, it is possible to
achieve acceptable reaction to this type of failure.

My point was that you can easily work around the fencing race by
introducing a staggered delay into fencing, so that the nodes do not
all try to fence each other at the same moment.

I have never tried it, but are you sure bonded devices don't work for
heartbeat?
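
To illustrate the staggered delay idea, here is a toy model in Python. It is only a sketch of the reasoning, not how the cluster software implements fencing. It assumes every node tries to fence all of its peers once its own delay expires, and that nodes with equal delays fire at the same instant. The function name fence_race and the node names are made up for the example.

```python
def fence_race(delays):
    """Return the set of nodes left alive after the heartbeat link fails.

    delays maps a (hypothetical) node name to the number of seconds that
    node waits before fencing its peers.
    """
    alive = set(delays)
    # Walk through the distinct delays in ascending order; nodes sharing
    # a delay are treated as firing simultaneously.
    for delay in sorted(set(delays.values())):
        shooters = [n for n in delays if delays[n] == delay and n in alive]
        victims = set()
        for node in shooters:
            # Each shooter fences every other node that is still up.
            victims |= alive - {node}
        alive -= victims
    return alive


if __name__ == "__main__":
    # Same delay on both nodes: they can shoot each other.
    print(fence_race({"node1": 0, "node2": 0}))
    # Staggered delays: the node with the shorter delay wins the race.
    print(fence_race({"node1": 0, "node2": 5}))
```

With equal delays the model leaves no survivor, which is the race in question. With staggered delays exactly one node survives and the cluster stays up.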

Gordan
