"Connected to cluster infrastructure via: CMAN/SM Plugin v1.1"
This at least tells you that ccsd is able to connect to the cluster manager, and therefore should know whether the cluster is quorate or not.
brassow
On Dec 10, 2004, at 4:16 PM, Matthew B. Brookover wrote:
On Fri, 2004-12-10 at 00:17, David Teigland wrote:
This sounds similar to a problem I have if I run fence_tool without ccsd running.
Check /proc/cluster/status while it's waiting to see if the cluster actually has quorum or not. Also, I've added some extra checking and debugging to fence_tool that should help narrow down where things are stuck. Please update from cvs and rebuild at least the stuff in cluster/fence; then use "fence_tool join -D".
Usually things get stuck talking to ccs when ccs/magma libraries are out
of sync, but this case sounds different.
Ok, I pulled the updates from CVS and rebuilt the code and the kernel.
On node fiveoften, fence_tool printed out some errors and exited with status 1. The errors are below.
On node fouroften, fence_tool did not print any messages and it did not exit. I am guessing that it did not exit because of the -D flag. fence_tool did start up fenced on fouroften.
Ccsd started up and is running on both nodes.
According to /proc/cluster/status and /proc/cluster/nodes on both fouroften and fiveoften, the cluster is up and has quorum.
fence_tool printed these messages on node fiveoften:
+ fence_tool join -D
fence_tool: cannot connect to ccs -111
fence_tool: wait for quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: get our node name
fence_tool: connect to ccs
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
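The -111 in this output is most likely a negated errno value. On Linux, errno 111 is ECONNREFUSED, which agrees with the "Connection refused" entries that ccsd logs. A minimal Python sketch for decoding such a code is below; the helper name describe_ccs_error is made up for illustration and is not part of fence_tool or ccs, and the "negated errno" reading is an assumption.

```python
import errno
import os


def describe_ccs_error(code):
    """Map a negated errno such as -111 to its symbolic name and message.

    Assumption: the value fence_tool prints is -errno, as is common for
    C libraries on Linux. The helper name is hypothetical.
    """
    number = abs(code)
    # errno.errorcode maps numeric codes to names for the current platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    name, message = describe_ccs_error(-111)
    print(name, "-", message)
```

The mapping is platform specific, so this should be run on a Linux host (ideally one of the cluster nodes) to get a meaningful answer.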
Log entries on node fiveoften:
Dec 10 14:08:39 fiveoften kernel: Lock_Harness <CVS> (built Dec 10 2004
09:14:45) installed
Dec 10 14:08:39 fiveoften kernel: GFS <CVS> (built Dec 10 2004 09:14:04)
installed
Dec 10 14:08:39 fiveoften kernel: CMAN <CVS> (built Dec 10 2004
09:51:59) installed
Dec 10 14:08:39 fiveoften kernel: NET: Registered protocol family 30
Dec 10 14:08:39 fiveoften kernel: DLM <CVS> (built Dec 10 2004 09:52:25)
installed
Dec 10 14:08:39 fiveoften kernel: Lock_DLM (built Dec 10 2004 09:14:25)
installed
Dec 10 14:08:40 fiveoften kernel: CMAN: Waiting to join or form a
Linux-cluster
Dec 10 14:09:11 fiveoften kernel: CMAN: sending membership request
Dec 10 14:09:11 fiveoften kernel: CMAN: got node fouroften
Dec 10 14:09:11 fiveoften kernel: CMAN: quorum regained, resuming
activity
Dec 10 14:09:11 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:11 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:12 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:12 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:13 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:13 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:14 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:14 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:15 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:15 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:16 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:16 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:17 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:17 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:18 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:18 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:19 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:19 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
Dec 10 14:09:20 fiveoften ccsd[3391]: Cluster is not quorate. Refusing
connection.
Dec 10 14:09:20 fiveoften ccsd[3391]: Error while processing connect:
Connection refused
The logs stopped when fence_tool exited.
On node fiveoften, /proc/cluster/status and /proc/cluster/nodes contain:
[mbrookov@fiveoften ~]$ more /proc/cluster/status
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 0
Node addresses: 138.67.4.25
[mbrookov@fiveoften ~]$ more /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    1   M   fouroften
   2    1    1   M   fiveoften
[mbrookov@fiveoften ~]$
On node fouroften, /proc/cluster/status and /proc/cluster/nodes contain:
[mbrookov@fouroften ~]$ more /proc/cluster/status
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 1
Node addresses: 138.67.4.24
[mbrookov@fouroften ~]$ more /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    1   M   fouroften
   2    1    1   M   fiveoften
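For anyone who wants to script the quorum check rather than read the file by eye, here is a rough Python sketch that parses the status text shown above. It assumes the cluster counts as quorate when Total_votes is at least Quorum; the function names are illustrative only, and the sample text is copied from the fiveoften output.

```python
def parse_cluster_status(text):
    """Parse 'Key: value' lines from /proc/cluster/status into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values keep any later colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_quorate(fields):
    """Assumption: the cluster is quorate when Total_votes >= Quorum."""
    return int(fields["Total_votes"]) >= int(fields["Quorum"])


# Sample copied from the fiveoften output; on a live node the text
# would be read from /proc/cluster/status instead.
SAMPLE = """\
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 0
Node addresses: 138.67.4.25
"""

if __name__ == "__main__":
    status = parse_cluster_status(SAMPLE)
    print("quorate:", is_quorate(status))
```

By this check both nodes report quorum, which is what makes the "Cluster is not quorate" messages from ccsd on fiveoften surprising.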
Log entries on node fouroften:
Dec 10 14:08:36 fouroften kernel: Lock_Harness <CVS> (built Dec 10 2004
09:14:45) installed
Dec 10 14:08:36 fouroften kernel: GFS <CVS> (built Dec 10 2004 09:14:04)
installed
Dec 10 14:08:36 fouroften kernel: CMAN <CVS> (built Dec 10 2004
09:51:59) installed
Dec 10 14:08:36 fouroften kernel: NET: Registered protocol family 30
Dec 10 14:08:36 fouroften kernel: DLM <CVS> (built Dec 10 2004 09:52:25)
installed
Dec 10 14:08:36 fouroften kernel: Lock_DLM (built Dec 10 2004 09:14:25)
installed
Dec 10 14:08:37 fouroften kernel: CMAN: Waiting to join or form a
Linux-cluster
Dec 10 14:09:09 fouroften kernel: CMAN: forming a new cluster
Dec 10 14:09:09 fouroften kernel: CMAN: quorum regained, resuming
activity
Dec 10 14:09:09 fouroften kernel: CMAN: got node fiveoften
/etc/cluster/cluster.conf:
<?xml version="1.0"?>
<cluster name="CSMTEST" config_version="6">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="fouroften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fouroften"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fiveoften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fiveoften"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
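As a sanity check on the configuration itself, a short Python sketch like the one below can confirm that every fence device referenced by a node is defined under fencedevices, and that two_node="1" is paired with expected_votes="1". The pairing rule is my understanding of CMAN two-node mode and should be treated as an assumption; check_cluster_conf is a hypothetical helper, not a tool shipped with the cluster suite.

```python
import xml.etree.ElementTree as ET

# The configuration from this post, embedded so the sketch is self-contained.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster name="CSMTEST" config_version="6">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="fouroften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fouroften"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fiveoften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fiveoften"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
"""


def check_cluster_conf(xml_text):
    """Return a list of problems found; an empty list means none were spotted."""
    root = ET.fromstring(xml_text)
    problems = []

    # Assumption: two-node mode expects expected_votes to be 1.
    cman = root.find("cman")
    if cman is not None and cman.get("two_node") == "1":
        if cman.get("expected_votes") != "1":
            problems.append("two_node=1 should be paired with expected_votes=1")

    # Every device a node references should be declared under <fencedevices>.
    defined = {dev.get("name") for dev in root.findall("fencedevices/fencedevice")}
    for node in root.findall("clusternodes/clusternode"):
        for dev in node.findall("fence/method/device"):
            if dev.get("name") not in defined:
                problems.append(
                    "node %s references undefined fence device %s"
                    % (node.get("name"), dev.get("name"))
                )
    return problems


if __name__ == "__main__":
    print(check_cluster_conf(CLUSTER_CONF) or "no problems spotted")
```

The configuration above passes both checks, so the file itself does not look like the cause of the ccsd refusals.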
Both nodes are running Fedora Core 3 with the 2.6.9 kernel from kernel.org.
Thanks for your time!
Matt mbrookov@xxxxxxxxx