On Wed, 2007-01-10 at 15:16 -0500, Josef Whiter wrote:
> > Thanks for any advice ...
> >
>
> This isn't a bug, it's working as expected. What you need is qdisk; set it up
> with the proper heuristics and it will force the shutdown of the bad node before
> the bad node has a chance to fence off the working node.

What he said. With qdisk, you can have the node declare itself unfit for
cluster operation when bond0 or bond1 loses link; something like:

    <quorumd min_score="2" votes="2" status_file="/tmp/qdisk_status_info">
      <heuristic program="ping -c1 -t1 <bond0 router>" score="1" interval="2"/>
      <heuristic program="ping -c1 -t1 <bond1 router>" score="1" interval="2"/>
    </quorumd>

You could use more complex link monitoring (like the stuff in
/usr/share/cluster/ip.sh) if you wanted, but this gives you the basic idea.

The idea here is that if bond0 *or* bond1 loses link, qdiskd declares the
node unfit (min_score = 2, and each route is worth 1 point, so losing either
one drops the score below the minimum => fatal).

A feature was added after the initial release of qdiskd to reboot the node
on loss of required score (previously, it would only cause the node to become
inquorate and block activity).

-- Lon
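To make the scoring arithmetic concrete, here is a rough Python sketch. It is not qdiskd's code; the function name `node_is_fit` and the (passed, score) pair layout are made up for illustration, and it assumes a heuristic counts as passed when its program exits 0.

```python
# Illustrative only: models the min_score arithmetic described in the
# message, not qdiskd's actual implementation.

def node_is_fit(heuristics, min_score):
    """Return True if the summed score of passing heuristics meets min_score.

    heuristics: list of (passed, score) pairs, where passed is True when
    the heuristic program (e.g. the ping) succeeded (assumed: exit code 0).
    """
    total = sum(score for passed, score in heuristics if passed)
    return total >= min_score

MIN_SCORE = 2  # matches min_score="2" in the quorumd example

both_links_up = [(True, 1), (True, 1)]   # bond0 and bond1 routers reachable
bond1_down = [(True, 1), (False, 1)]     # bond1 router unreachable

print(node_is_fit(both_links_up, MIN_SCORE))
print(node_is_fit(bond1_down, MIN_SCORE))
```

With two heuristics worth 1 point each and a minimum of 2, there is no slack: either ping failing drops the total to 1, which is below the minimum. Lowering min_score to 1 would instead tolerate the loss of one link.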
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster