While the node (bof226) is down, I run fence_ack_manual -n bof226. I then start getting the following messages in /var/log/messages:

Aug 28 15:08:30 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede. Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)
Aug 28 15:10:33 bof227 ccsd[2433]: process_get: Invalid connection descriptor received.
Aug 28 15:10:33 bof227 ccsd[2433]: Error while processing get: Invalid request descriptor
Aug 28 15:10:33 bof227 fenced[2497]: fence "bof226" failed
Aug 28 15:10:38 bof227 fenced[2497]: fencing node "bof226"
Aug 28 15:10:38 bof227 ccsd[2433]: process_get: Invalid connection descriptor received.
Aug 28 15:10:38 bof227 ccsd[2433]: Error while processing get: Invalid request descriptor
Aug 28 15:10:38 bof227 fenced[2497]: fence "bof226" failed
Aug 28 15:10:43 bof227 fenced[2497]: fencing node "bof226"
Aug 28 15:10:43 bof227 ccsd[2433]: process_get: Invalid connection descriptor received.
Aug 28 15:10:43 bof227 ccsd[2433]: Error while processing get: Invalid request descriptor
Aug 28 15:10:43 bof227 fenced[2497]: fence "bof226" failed

>>> Is there a special reason you're using both gnbd and manual fencing?
>>> I've never seen that done before and can't think of a reason you'd want to.

I was under the impression that if there is no hardware fencing device, then manual fencing is required. It was also my understanding that if I use gnbd devices, then explicit gnbd fencing is required as well.

Mike

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx]
Sent: Monday, August 28, 2006 3:04 PM
To: Zelikov, Mikhail
Cc: linux-cluster@xxxxxxxxxx
Subject: Re: DLM locks with 1 node on 2 node cluster

On Mon, Aug 28, 2006 at 02:58:32PM -0400, Zelikov_Mikhail@xxxxxxx wrote:
> I am using manual fencing with gnbd fencing.

Is there a special reason you're using both gnbd and manual fencing? I've never seen that done before and can't think of a reason you'd want to. (I'd just use gnbd, not manual.) That said, I suspect what you have configured should still work.

> Here is the tail on /var/proc/messages:
>
> Aug 28 14:17:06 bof227 fenced[2497]: bof226 not a cluster member after 0 sec post_fail_delay
> Aug 28 14:17:06 bof227 kernel: CMAN: removing node bof226 from the cluster : Missed too many heartbeats
> Aug 28 14:17:06 bof227 fenced[2497]: fencing node "bof226"
> Aug 28 14:17:06 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede. Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)

Follow what the message says and run "fence_ack_manual -n bof226" on the remaining node after verifying the failed node has been reset or otherwise fenced.

Dave

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
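
For anyone reading the log excerpts in this thread, here is a rough sketch of how the fencing loop could be summarized from /var/log/messages. It is only an illustration, not part of any cluster tool: the function name summarize, the sample constant, and the two regular expressions are assumptions made up for this example, written against the exact message formats quoted in this thread. Other fenced or fence_manual versions may log differently.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpts quoted in this thread.
SAMPLE_LOG = """\
Aug 28 15:08:30 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede. Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)
Aug 28 15:10:33 bof227 ccsd[2433]: process_get: Invalid connection descriptor received.
Aug 28 15:10:33 bof227 ccsd[2433]: Error while processing get: Invalid request descriptor
Aug 28 15:10:33 bof227 fenced[2497]: fence "bof226" failed
Aug 28 15:10:38 bof227 fenced[2497]: fencing node "bof226"
Aug 28 15:10:38 bof227 fenced[2497]: fence "bof226" failed
Aug 28 15:10:43 bof227 fenced[2497]: fencing node "bof226"
Aug 28 15:10:43 bof227 fenced[2497]: fence "bof226" failed
"""

# Hypothetical patterns, matched only against the formats seen above.
FENCE_FAILED = re.compile(r'fenced\[\d+\]: fence "([^"]+)" failed')
WAITING_ACK = re.compile(r'fence_manual: Node (\S+) needs to be reset')


def summarize(log_text):
    """Return (failed fence attempts per node, nodes waiting for a manual ack)."""
    failed = Counter()
    waiting = set()
    for line in log_text.splitlines():
        match = FENCE_FAILED.search(line)
        if match:
            failed[match.group(1)] += 1
            continue
        match = WAITING_ACK.search(line)
        if match:
            waiting.add(match.group(1))
    return dict(failed), sorted(waiting)


if __name__ == "__main__":
    failed, waiting = summarize(SAMPLE_LOG)
    print("failed fence attempts:", failed)
    print("nodes awaiting manual acknowledgement:", waiting)
```

A node that shows up in both results, as bof226 does here, is one where fenced keeps retrying after fence_manual has asked for an acknowledgement. The script only reports what is in the log. It does not check whether the node has actually been reset.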