Yes, that makes sense :) I changed cluster.conf to use a single fencing mechanism instead of two: just manual (since I do not have any gnbd devices yet) ... and it worked :)

So it might be (WARNING - speculation here) that a temp file used for fencing is shared by both the manual and gnbd fence agents and is opened by one of them in exclusive mode, so the other cannot open it and waits on it ...

You mentioned that gnbd fencing has multiple options: hw, manual ... I tried to change the configuration of my gnbd fence resource via the GUI (system-config-cluster), and the only options are the name and the servers ...

Mike

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx]
Sent: Monday, August 28, 2006 3:47 PM
To: Zelikov, Mikhail
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: DLM locks with 1 node on 2 node cluster

On Mon, Aug 28, 2006 at 03:33:47PM -0400, Zelikov_Mikhail@xxxxxxx wrote:
> Dave, I guess we are confused here by "the failed node is actually
> reset" - does this mean that "the system is down/has been shutdown" or
> does this mean "the system has been rebooted and now is up and
> running"? In the first case I am getting errors in /var/log/messages
> in the second I do not need to do anything since the cluster will
> recover by itself.

The idea behind fence_manual is that you need to go and manually fence the failed machine somehow when you see that message. That means doing yourself what one of the normal fencing agents would otherwise do, e.g. power it off or disable its SAN connection. After you've done this, you run fence_ack_manual to tell the system that the failed node has been properly fenced (by you). If you reset the failed node, you just need to make sure the power is off before doing the ack command; you don't need to wait for it to be up and running again.

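As a rough illustration of the manual-only setup discussed in this thread, a cluster.conf fragment might look something like the sketch below. The node names (node1, node2), the device name "human" and the method name "single" are made up for illustration, and the exact attribute names can differ between cluster releases, so treat this as an assumption to check against your installed version rather than a known-good config.

```xml
<!-- Sketch only: two nodes, each fenced by the same manual device. -->
<!-- node1, node2, "human" and "single" are hypothetical names. -->
<clusternodes>
  <clusternode name="node1" votes="1">
    <fence>
      <method name="single">
        <device name="human" nodename="node1"/>
      </method>
    </fence>
  </clusternode>
  <clusternode name="node2" votes="1">
    <fence>
      <method name="single">
        <device name="human" nodename="node2"/>
      </method>
    </fence>
  </clusternode>
</clusternodes>
<fencedevices>
  <!-- fence_manual waits for an operator to fence the node by hand. -->
  <!-- After powering off node2 yourself, acknowledge on the surviving -->
  <!-- node with something like: fence_ack_manual -n node2 -->
  <fencedevice name="human" agent="fence_manual"/>
</fencedevices>
```

With only one fence device listed per node, there is no second agent (such as gnbd) in the method, which matches the configuration reported as working above.
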
If you reset the failed node and it comes back up and rejoins the cluster before you happen to run the fence_ack_manual command, then the fence_manual agent that's waiting on the non-failed node will recognize this and effectively do the fence_ack_manual step for you, since it knows the failed node has been rebooted if it rejoins the cluster.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster