Re: Using ping as a heuristic not a great idea?

Same experience.

In addition, best practice is to put the quorum partition on its own physical disk. Admittedly I did not follow best practice: I took a slice of the SAN RAID instead, so as not to waste a whole precious disk (plus one more for mirroring) on what is essentially a 20MB partition, and fibre traffic would then upset qdiskd's vote.

Again, what I was doing was not best practice; if a qdisk partition is needed, it's definitely worth the two physical disks.

In conclusion, no, I wouldn't use ping heuristics.

Richard Rogerson wrote:
I've experienced the same thing. I have a two-node cluster with
DRBD+GFS2, and during very high network activity I've had the node get
killed, which caused GFS2 to lock up. I'd be very interested to see what
solution you come up with.

-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Don Hoover
Sent: Thursday, May 20, 2010 8:48 AM
To: linux-cluster@xxxxxxxxxx
Subject:  Using ping as a heuristic not a great idea?

I know it's in a ton of the example cluster configs out there, but we
have had trouble whenever we try to use ping as a quorum heuristic.

What we have seen is whenever a large file transfer happens, the pings
start to get dropped by Linux, and the heuristic test starts to fail,
and the cluster node gets killed.

We have tried tweaking the interval, the count, the ttl, adding a -w,
etc., and nothing really keeps the cluster from killing nodes when they
see really heavy network traffic.
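The failure mode described above can be modeled roughly as follows. This is a toy sketch, not qdiskd's actual code: it assumes, as a simplification, that the heuristic is declared failed once a hypothetical number `tko` of consecutive checks fail in a row, and the loss pattern in the example is made up for illustration.

```python
from typing import Optional, Sequence


def first_declared_failure(results: Sequence[bool], tko: int) -> Optional[int]:
    """Return the index of the check at which the heuristic would be declared
    failed, or None if `tko` consecutive failures never occur.

    results: one entry per heuristic check, True if the ping got a reply.
    tko: hypothetical tolerance, i.e. consecutive failed checks allowed before
         the heuristic counts as failed (a simplified model, not qdiskd code).
    """
    consecutive = 0
    for i, ok in enumerate(results):
        # A successful check resets the streak; a failure extends it.
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= tko:
            return i
    return None


# Made-up pattern: 5 good checks, a burst of 4 dropped pings during a big
# transfer, then recovery.
checks = [True] * 5 + [False] * 4 + [True] * 3
print(first_declared_failure(checks, tko=3))  # 7
print(first_declared_failure(checks, tko=5))  # None
```

In this model, raising the tolerance only helps while a burst of dropped pings spans fewer consecutive checks than `tko`; a long enough transfer still trips it, which is consistent with the tweaking not helping here.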


I have personally had this heuristic work in small test environments,
but once we put it into production on really busy workloads it is pretty
much useless.


In theory it is a good idea to use this, because it would help ensure
that in a split-cluster situation the box that has network connectivity
wins over the one that does not. But if it causes your cluster to die
periodically, it's not worth it.


Is this a known issue that just never gets mentioned in any of the
cluster setup examples?

Anyone have a similar experience, or any ideas on how to make it
work in a very busy cluster environment?


Also, this makes me wonder: if I have a two-node cluster, with each node
getting 1 vote, the quorum disk getting 1 vote, and the heuristic getting
1 vote, but set 'required' to only 2 votes, why would the heuristic
cause a loss of quorum, since the node with the quorum disk alone would
have the needed two votes?
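The arithmetic in the question can be made concrete with a small sketch. It assumes a purely additive model in which each vote source simply adds to a partition's total and a partition is quorate when that total reaches the required count; the vote values come from the question, while the `Partition` type and helper names are hypothetical. It does not model how qdiskd itself acts on a failing heuristic.

```python
from dataclasses import dataclass

# Vote values taken from the question; the additive model is an assumption.
QDISK_VOTES = 1
HEURISTIC_VOTES = 1
REQUIRED = 2


@dataclass
class Partition:
    node_votes: int     # votes from the cluster nodes in this partition
    has_qdisk: bool     # True if this partition holds the quorum-disk vote
    heuristic_ok: bool  # True if the ping heuristic is currently passing


def votes(p: Partition) -> int:
    """Sum the votes a partition holds under the additive model."""
    total = p.node_votes
    if p.has_qdisk:
        total += QDISK_VOTES
    if p.heuristic_ok:
        total += HEURISTIC_VOTES
    return total


def is_quorate(p: Partition) -> bool:
    return votes(p) >= REQUIRED


# The case in the question: one node, holding the quorum disk, heuristic failing.
lone = Partition(node_votes=1, has_qdisk=True, heuristic_ok=False)
print(votes(lone), is_quorate(lone))  # 2 True
```

Under this additive model the node stays quorate even with the heuristic failing, so if nodes are still being killed in that state, the cause is presumably something other than the raw vote count, such as how qdiskd itself responds to a failing heuristic; the qdisk(5) man page is the place to check that.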


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
