RE: DLM locks with 1 node on 2 node cluster

Zelikov_Mikhail@xxxxxxx · Mon, 28 Aug 2006 14:58:32 -0400

(Resending, since I forgot to include linux-cluster@xxxxxxxxxx.)
I am using manual fencing together with gnbd fencing. Here is the tail of
/var/log/messages:

Aug 28 14:17:06 bof227 fenced[2497]: bof226 not a cluster member after 0 sec post_fail_delay
Aug 28 14:17:06 bof227 kernel: CMAN: removing node bof226 from the cluster : Missed too many heartbeats
Aug 28 14:17:06 bof227 fenced[2497]: fencing node "bof226"
Aug 28 14:17:06 bof227 fence_manual: Node bof226 needs to be reset before recovery can procede.  Waiting for bof226 to rejoin the cluster or for manual acknowledgement that it has been reset (i.e. fence_ack_manual -n bof226)

************************ cluster.conf
<?xml version="1.0"?>
<cluster config_version="84" name="MZ_CLUSTER">
	<fence_daemon post_fail_delay="0" post_join_delay="3"/>
	<clusternodes>
		<clusternode name="bof227" votes="1">
			<fence>
				<method name="1">
					<device name="device_MF_227" nodename="bof227"/>
					<device name="gnbd_server_bof226" nodename="bof227"/>
				</method>
			</fence>
		</clusternode>
		<clusternode name="bof226" votes="1">
			<fence>
				<method name="1">
					<device name="device_MF_226" nodename="bof226"/>
					<device name="gnbd_server_bof227" nodename="bof226"/>
				</method>
			</fence>
		</clusternode>
	</clusternodes>
	<cman expected_votes="1" two_node="1"/>
	<fencedevices>
		<fencedevice agent="fence_manual" name="device_MF_226"/>
		<fencedevice agent="fence_manual" name="device_MF_227"/>
		<fencedevice agent="fence_gnbd" name="gnbd_server_bof226" servers="bof226"/>
		<fencedevice agent="fence_gnbd" name="gnbd_server_bof227" servers="bof227"/>
	</fencedevices>
	<rm>
		<failoverdomains>
			<failoverdomain name="FD_PREF_BOF226" ordered="1" restricted="1">
				<failoverdomainnode name="bof226" priority="1"/>
				<failoverdomainnode name="bof227" priority="2"/>
			</failoverdomain>
			<failoverdomain name="FD_PREF_BOF_227" ordered="1" restricted="1">
				<failoverdomainnode name="bof227" priority="1"/>
				<failoverdomainnode name="bof226" priority="2"/>
			</failoverdomain>
		</failoverdomains>
		<resources/>
	</rm>
</cluster> 
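
In case it helps to see at a glance which fence agents each node's method
lists, and in what order, here is a rough sketch in Python using only the
standard library (xml.etree.ElementTree). The CLUSTER_CONF string is a trimmed
copy of the file above (fencing parts only), and fence_plan is a hypothetical
helper name of my own, not part of any cluster tool:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the cluster.conf above: only the fencing-related elements.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster config_version="84" name="MZ_CLUSTER">
  <fence_daemon post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
    <clusternode name="bof227" votes="1">
      <fence>
        <method name="1">
          <device name="device_MF_227" nodename="bof227"/>
          <device name="gnbd_server_bof226" nodename="bof227"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bof226" votes="1">
      <fence>
        <method name="1">
          <device name="device_MF_226" nodename="bof226"/>
          <device name="gnbd_server_bof227" nodename="bof226"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_manual" name="device_MF_226"/>
    <fencedevice agent="fence_manual" name="device_MF_227"/>
    <fencedevice agent="fence_gnbd" name="gnbd_server_bof226" servers="bof226"/>
    <fencedevice agent="fence_gnbd" name="gnbd_server_bof227" servers="bof227"/>
  </fencedevices>
</cluster>"""


def fence_plan(conf_text):
    """Map each node name to its fence devices, in file order, as (device, agent) pairs."""
    root = ET.fromstring(conf_text)
    # Look up which agent backs each named fence device.
    agents = {fd.get("name"): fd.get("agent") for fd in root.iter("fencedevice")}
    plan = {}
    for node in root.iter("clusternode"):
        plan[node.get("name")] = [
            (dev.get("name"), agents.get(dev.get("name"), "unknown"))
            for dev in node.findall("./fence/method/device")
        ]
    return plan


if __name__ == "__main__":
    for node, devices in fence_plan(CLUSTER_CONF).items():
        print(node, devices)
```

For bof226 this lists the fence_manual device first, which is consistent with
the fence_manual message in the log above.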

-----Original Message-----
From: David Teigland [mailto:teigland@xxxxxxxxxx] 
Sent: Monday, August 28, 2006 2:36 PM
To: Zelikov, Mikhail
Cc: linux-cluster@xxxxxxxxxx
Subject: Re:  DLM locks with 1 node on 2 node cluster

It's trying to fence the failed node and won't continue with recovery until
that's done.  What fencing method are you using in cluster.conf?
Are there any fencing error messages in /var/log/messages?  What does your
cluster.conf look like?

Dave
