Re: [Linux-cluster] manual fencing problem

"Matthew B. Brookover" <mbrookov@xxxxxxxxx> · Tue, 14 Dec 2004 12:48:14 -0700

Try running fence_ack_manual on cl031a.  I believe the fence_manual.fifo is only created on the node that succeeded in fencing the downed member.
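
A quick way to see which node holds the FIFO is to test for it on each node before running fence_ack_manual. This is only a sketch: the path is taken from the error message quoted below, the helper name fifo_present is made up for illustration, and whether the FIFO exists on only one node is my assumption, not something I have confirmed.

```shell
#!/bin/sh
# Path copied from the "can't open" error in the original post;
# override with FIFO=/some/other/path if your setup differs.
FIFO="${FIFO:-/tmp/fence_manual.fifo}"

# Hypothetical helper: succeeds only if the path exists and is a named pipe.
fifo_present() {
    [ -p "$1" ]
}

if fifo_present "$FIFO"; then
    echo "$FIFO exists on $(uname -n)"
else
    echo "$FIFO not found on $(uname -n)"
fi
```

Run it on each remaining node; the one that reports the FIFO as present is where I would try fence_ack_manual.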

Matt

mbrookov@xxxxxxxxx

On Tue, 2004-12-14 at 12:18, Daniel McNeil wrote:

I was running my test last night and I got an I/O error
from the disk subsystem that caused one of the nodes to
panic.

The other 2 nodes removed the dead node from membership, but
the fencing did not work.

cl030 /var/log/messages:

Dec 13 21:54:26 cl030 kernel: CMAN: no HELLO from cl032a, removing from the cluster
Dec 13 21:54:27 cl030 fenced[12121]: fencing node "cl032a"
Dec 13 21:54:27 cl030 fenced[12121]: fence "cl032a" failed
Dec 13 21:54:28 cl030 fenced[12121]: fencing node "cl032a"
Dec 13 21:54:28 cl030 fenced[12121]: fence "cl032a" failed
Dec 13 21:54:29 cl030 fenced[12121]: fencing node "cl032a"

This goes on all night.

cl031 /var/log/messages:
Dec 13 21:54:27 cl031 fenced[11850]: fencing deferred to 1

[root@cl030 root]#  fence_ack_manual -s cl032a

Warning:  If the node "cl032a" has not been manually fenced
(i.e. power cycled or disconnected from shared storage devices)
the GFS file system may become corrupted and all its data
unrecoverable!  Please verify that the node shown above has
been reset or disconnected from storage.

Are you certain you want to continue? [yN] y
can't open /tmp/fence_manual.fifo: No such file or directory

I've attached my cluster.conf file.

Do I have fencing set up correctly?  Any ideas on why
fenced is failing to fence?

Thanks,

Daniel

--

Linux-cluster@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/linux-cluster