Re: Why does my cluster stop working when one node is down?

Replace:

<cman expected_votes="1">
</cman>

with

<cman two_node="1" expected_votes="1"/>

in cluster.conf.
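
The reason, as I understand CMAN's vote accounting: without the two_node
flag, the expected votes end up being the sum of the node votes (2 here,
one per node), and quorum is expected/2 + 1 = 2, so a single surviving
node can never be quorate. two_node="1" is the special case that lets a
single vote keep the cluster quorate, provided fencing actually works so
the survivor can fence its peer before carrying on. Roughly:

  votes per node            = 1
  expected votes (2 nodes)  = 2
  quorum = 2/2 + 1          = 2   -> a lone node (1 vote) is inquorate
  with two_node="1"         : quorum = 1, one node can keep running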

Gordan

On Wed, 2 Apr 2008, Tiago Cruz wrote:

Hello guys,

I have one cluster with two machines, running RHEL 5.1 x86_64.
The storage device was imported using GNBD and formatted as GFS2, so it
can be mounted on both nodes:

[root@teste-spo-la-v1 ~]# gnbd_import -v -l
Device name : cluster
----------------------
   Minor # : 0
sysfs name : /block/gnbd0
    Server : gnbdserv
      Port : 14567
     State : Open Connected Clear
  Readonly : No
   Sectors : 20971520

# gfs2_mkfs -p lock_dlm -t mycluster:export1 -j 2 /dev/gnbd/cluster
# mount /dev/gnbd/cluster /mnt/
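
(Side note: the filesystem type can also be given explicitly when
mounting by hand, as in the line below; that way a device that is not
actually GFS2 fails loudly instead of relying on auto-detection. This is
only a sketch of the same mount done explicitly.)

 # mount -t gfs2 /dev/gnbd/cluster /mnt/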

Everything works gracefully until one node goes down (shutdown, network
stop, xm destroy...):


teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering GATHER state from 0.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Creating commit token because I am the rep.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Saving state aru 46 high seq received 46
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Storing new sequence id for ring 4c
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering COMMIT state.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering RECOVERY state.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] position [0] member 10.25.0.251:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] previous ring seq 72 rep 10.25.0.251
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] aru 46 high delivered 46 received flag 1
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Did not need to originate any messages in recovery.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] Sending initial ORF token
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  2 12:00:07 teste-spo-la-v1 kernel: dlm: closing connection to node 3
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] New Configuration:
Apr  2 12:00:07 teste-spo-la-v1 clurgmgrd[3557]: <emerg> #1: Quorum Dissolved
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.251)
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Left:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.252)
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Joined:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CMAN ] quorum lost, blocking activity
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] CLM CONFIGURATION CHANGE
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] New Configuration:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] 	r(0) ip(10.25.0.251)
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Left:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] Members Joined:
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [SYNC ] This node is within the primary component and will provide service.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [TOTEM] entering OPERATIONAL state.
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CLM  ] got nodejoin message 10.25.0.251
Apr  2 12:00:07 teste-spo-la-v1 openais[1545]: [CPG  ] got joinlist message from node 2
Apr  2 12:00:12 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.
Apr  2 12:00:12 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
Apr  2 12:00:16 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.
Apr  2 12:00:17 teste-spo-la-v1 ccsd[1539]: Error while processing connect: Connection refused
Apr  2 12:00:22 teste-spo-la-v1 ccsd[1539]: Cluster is not quorate.  Refusing connection.


At that point my GFS mount point is broken: the terminal freezes when I
try to access the "/mnt" directory and only comes back once the second
node has rejoined the cluster.
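
That hang matches the log: once the peer leaves, "quorum lost, blocking
activity" means CMAN suspends DLM and GFS activity, so anything touching
the GFS mount blocks until quorum is regained. On the surviving node the
quorum state can be checked with the stock cluster tools, roughly:

 # cman_tool status    (compare the "Total votes" and "Quorum" lines)
 # cman_tool nodes     (lists the members cman still sees)

With only one of two votes present, the node stays inquorate, which is
what the two_node change above is meant to fix.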


Here is the cluster.conf:

<?xml version="1.0"?>
<cluster name="mycluster" config_version="2">

<cman expected_votes="1">
</cman>

<fence_daemon post_join_delay="60">
</fence_daemon>

<clusternodes>
<clusternode name="node1.mycluster.com" nodeid="2">
	<fence>
		<method name="single">
			<device name="gnbd" ipaddr="10.25.0.251"/>
		</method>
	</fence>
</clusternode>
<clusternode name="node2.mycluster.com" nodeid="3">
	<fence>
		<method name="single">
			<device name="gnbd" ipaddr="10.25.0.252"/>
		</method>
	</fence>
</clusternode>
</clusternodes>

<fencedevices>
	<fencedevice name="gnbd" agent="fence_gnbd"/>
</fencedevices>
</cluster>
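
For reference, a corrected fragment with two_node set and the version
bumped (only a sketch; the clusternodes and fencedevices sections stay
exactly as above):

<?xml version="1.0"?>
<cluster name="mycluster" config_version="3">

<cman two_node="1" expected_votes="1"/>

<fence_daemon post_join_delay="60"/>
...
</cluster>

The updated file has to reach both nodes; if I remember the RHEL 5
tooling right, ccs_tool update /etc/cluster/cluster.conf (or copying the
file by hand and running cman_tool version -r 3) pushes the new version
to a running cluster, though with only two nodes it may be simplest to
restart cman on both.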


Thanks!

--
Tiago Cruz
http://everlinux.com
Linux User #282636


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster

