Dave,

I guess we are confused here by "the failed node is actually reset" - does this mean "the system is down/has been shut down", or does it mean "the system has been rebooted and is now up and running"? In the first case I am getting errors in /var/log/messages; in the second I do not need to do anything, since the cluster will recover by itself.

Mike

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx]
Sent: Monday, August 28, 2006 2:52 PM
To: Zelikov, Mikhail
Subject: Re: DLM locks with 1 node on 2 node cluster

On Mon, Aug 28, 2006 at 02:52:48PM -0400, Zelikov_Mikhail@xxxxxxx wrote:
> I am using manual fencing with gnbd fencing. Here is the tail on
> /var/proc/messages:
>
> Aug 28 14:17:06 bof227 fenced[2497]: bof226 not a cluster member after 0 sec post_fail_delay
> Aug 28 14:17:06 bof227 kernel: CMAN: removing node bof226 from the cluster : Missed too many heartbeats
> Aug 28 14:17:06 bof227 fenced[2497]: fencing node "bof226"
> Aug 28 14:17:06 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede. Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)

Follow what the message says:

- make sure the failed node is actually reset, then
- run "fence_ack_manual -n bof226" on the remaining node

then recovery will continue.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
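
For reference, a minimal sketch of the manual-fencing recovery sequence described above, run on the surviving node and assuming (as in the log) that bof227 survives and bof226 is the failed node:

    # On bof227: first confirm that bof226 has really been reset
    # (power-cycled or rebooted), not merely unreachable over the network.
    # Then acknowledge the fence so fenced can finish and recovery can continue:
    fence_ack_manual -n bof226

    # Recovery progress appears in the system log:
    tail /var/log/messages

If bof226 instead rejoins the cluster after rebooting, no acknowledgement is needed; per the fence_manual message, fenced also stops waiting once the node rejoins.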