Re: IP-based tie-breaker on a 2-node cluster?

gordan@xxxxxxxxxx · Thu, 17 Apr 2008 16:55:43 +0100 (BST)

On Thu, 17 Apr 2008, Andrew Lacey wrote:

I am doing some testing on a 2-node, active/standby RHEL 4 cluster with
non-GFS shared storage. I am using HP iLO for fencing. I don't have a
quorum disk set up. Both cluster nodes are connected to the same switch,
and that network path is used for cluster communication as well as general
network communication (including access to iLO). I've found that when the
switch goes down and comes back up, the result is not desirable. As soon
as the switch loses power, each node starts trying to fence the other.
Since the iLO is not reachable, this is unsuccessful, but the nodes keep
retrying the fence. When the switch comes back online, the "OK Corral"
scenario takes place -- both nodes fence each other simultaneously and
bring down the cluster.

I had a similar issue. The solution I went for was to doctor the fencing
agent so that it waits before fencing, with the delay based on the node's
priority. That way the nodes don't try to fence each other simultaneously,
but in a staggered fashion, and the higher-priority node gets its fence in
first.
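
To illustrate the idea, here is a minimal Python sketch of a priority-based fence delay. It is not the actual patch: the names fence_delay and delayed_fence, the 10-second step, and the convention that priority 0 is the node that should win are all assumptions made up for this example. The fence argument stands in for whatever the real agent does to power-cycle the peer (e.g. talking to iLO).

```python
import time


def fence_delay(priority, step_seconds=10.0):
    """Return how long this node should wait before fencing.

    Assumption: priority 0 is the preferred survivor and waits 0 seconds;
    each lower-priority node waits one more step.
    """
    if priority < 0:
        raise ValueError("priority must be non-negative")
    return priority * step_seconds


def delayed_fence(priority, fence, step_seconds=10.0, sleep=time.sleep):
    """Wait for this node's delay, then call the (hypothetical) fence action.

    If a higher-priority node fences us during the wait, this process
    simply never reaches the fence() call.
    """
    sleep(fence_delay(priority, step_seconds))
    fence()
```

With two nodes at priorities 0 and 1, node 0 fences immediately while node 1 waits 10 seconds, so node 1 should already be powered off before its own fence attempt fires. The step has to be longer than the time a fence operation takes to complete.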

If you have a spare NIC, and the nodes are next to each other, you could 
make them use a cross-over cable for their cluster communication, so they 
would notice that they are both still up even when the switch dies. That's 
what I do.

Gordan

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster