Re: RH Cluster doesn't pass basic acceptance tests - bug in fenced?

Lon Hohberger <lhh@xxxxxxxxxx> · Thu, 11 Jan 2007 09:30:08 -0500

On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote:

> > Thanks for any advice ...
> > 
> 
> This isn't a bug; it's working as expected.  What you need is qdisk: set it up
> with the proper heuristics and it will force the shutdown of the bad node before
> the bad node has a chance to fence off the working node.

What he said.  With qdisk, you can have the node declare itself unfit
for cluster operation when bond0 or bond1 loses link; something like:

<quorumd min_score="2" votes="2" status_file="/tmp/qdisk_status_info">
   <heuristic program="ping -c1 -t1 <bond0 router>" score="1" interval="2"/>
   <heuristic program="ping -c1 -t1 <bond1 router>" score="1" interval="2"/>
</quorumd>

You could use more complex link monitoring (like the stuff
in /usr/share/cluster/ip.sh) if you wanted, but this gives you the basic
idea.
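
As a rough illustration of link-level monitoring (this is not what ip.sh
does, just a minimal sketch), a heuristic program could read the carrier
flag the kernel exposes under /sys/class/net instead of pinging a router.
The function names and the "bond0" default below are hypothetical; the only
thing assumed about qdiskd is that a heuristic counts as passing when its
program exits 0.

```python
import sys
from pathlib import Path


def link_is_up(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier on the given interface."""
    carrier = Path(sysfs_root) / iface / "carrier"
    try:
        return carrier.read_text().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (reading "carrier" fails in that case): treat both as no link.
        return False


def heuristic_exit_code(argv, sysfs_root="/sys/class/net"):
    """Map link state to an exit status: 0 = pass, 1 = fail."""
    iface = argv[1] if len(argv) > 1 else "bond0"  # hypothetical default
    return 0 if link_is_up(iface, sysfs_root) else 1
```

To use something like this as a heuristic, the script would end with
sys.exit(heuristic_exit_code(sys.argv)) and be named in the program
attribute with the interface as its argument.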

The idea here is that if bond0 *or* bond1 loses link, qdiskd declares
the node unfit (min_score = 2, and each route is 1 point, so loss of
either => fatal).  A feature was added after the initial release of
qdiskd to reboot the node on loss of required score (previously, it
would cause the node to become inquorate and block activity).
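
To make the arithmetic concrete, here is a small sketch of the scoring rule
as described above: add up the scores of the heuristics that currently pass
and compare the total against min_score. The function name and the data
layout are made up for illustration; this is not qdiskd's actual code.

```python
def node_is_fit(heuristics, min_score):
    """heuristics: list of (score, passed) pairs, one per heuristic."""
    total = sum(score for score, passed in heuristics if passed)
    return total >= min_score


# Two heuristics worth 1 point each, min_score="2" as in the config above.
both_up = [(1, True), (1, True)]
bond1_down = [(1, True), (1, False)]

print(node_is_fit(both_up, min_score=2))     # True: score 2 meets min_score
print(node_is_fit(bond1_down, min_score=2))  # False: score 1 is below min_score
```

With min_score lowered to 1, the node would stay fit until both routes were
lost, so the choice of min_score is what turns "either" into "both".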

-- Lon
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster