On Fri, 2004-12-10 at 00:17, David Teigland wrote:
> This sounds similar to a problem I have if I run fence_tool without ccsd
> running.
>
> Check /proc/cluster/status while it's waiting to see if the cluster
> actually has quorum or not. Also, I've added some extra checking and
> debugging to fence_tool that should help narrow down where things are
> stuck. Please update from cvs and rebuild at least the stuff in
> cluster/fence; then use "fence_tool join -D".
>
> Usually things get stuck talking to ccs when ccs/magma libraries are out
> of sync, but this case sounds different.

Ok, I pulled the updates from CVS and rebuilt the code and the kernel.

On node fiveoften, fence_tool printed out some errors and exited with
status 1. The errors are below. On node fouroften, fence_tool did not
print any messages and it did not exit. I am guessing that fence_tool did
not exit because of a feature of the -D flag. fence_tool did start up
fenced on fouroften.

ccsd started up and is running on both nodes. According to
/proc/cluster/status and /proc/cluster/nodes on both fouroften and
fiveoften, the cluster is up and has quorum.
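The quorum check described above can be scripted roughly as follows. This is only a sketch: the helper names `parse_cluster_status` and `is_quorate` are hypothetical, and the rule "member state plus Total_votes >= Quorum" is an assumption about how to read the file, not something taken from the CMAN source. The sample text is the status output reported for fiveoften in this message.

```python
def parse_cluster_status(text):
    """Parse the 'Key: value' lines of /proc/cluster/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no 'Key: value' shape
            fields[key.strip()] = value.strip()
    return fields


def is_quorate(fields):
    # Assumption: the node is a cluster member and the votes currently
    # present meet or exceed the quorum threshold.
    return (
        fields.get("Membership state") == "Cluster-Member"
        and int(fields["Total_votes"]) >= int(fields["Quorum"])
    )


# Status output as reported for node fiveoften in this message.
SAMPLE = """\
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 0
Node addresses: 138.67.4.25
"""

if __name__ == "__main__":
    print(is_quorate(parse_cluster_status(SAMPLE)))
```

On a live node the same functions could be fed the contents of /proc/cluster/status instead of the embedded sample.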

fence_tool printed these messages on node fiveoften:

+ fence_tool join -D
fence_tool: cannot connect to ccs -111
fence_tool: wait for quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: waiting for cluster quorum
fence_tool: get our node name
fence_tool: connect to ccs
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111

Log entries on node fiveoften:

Dec 10 14:08:39 fiveoften kernel: Lock_Harness <CVS> (built Dec 10 2004 09:14:45) installed
Dec 10 14:08:39 fiveoften kernel: GFS <CVS> (built Dec 10 2004 09:14:04) installed
Dec 10 14:08:39 fiveoften kernel: CMAN <CVS> (built Dec 10 2004 09:51:59) installed
Dec 10 14:08:39 fiveoften kernel: NET: Registered protocol family 30
Dec 10 14:08:39 fiveoften kernel: DLM <CVS> (built Dec 10 2004 09:52:25) installed
Dec 10 14:08:39 fiveoften kernel: Lock_DLM (built Dec 10 2004 09:14:25) installed
Dec 10 14:08:40 fiveoften kernel: CMAN: Waiting to join or form a Linux-cluster
Dec 10 14:09:11 fiveoften kernel: CMAN: sending membership request
Dec 10 14:09:11 fiveoften kernel: CMAN: got node fouroften
Dec 10 14:09:11 fiveoften kernel: CMAN: quorum regained, resuming activity
Dec 10 14:09:11 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:11 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:12 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:12 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:13 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:13 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:14 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:14 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:15 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:15 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:16 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:16 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:17 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:17 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:18 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:18 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:19 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:19 fiveoften ccsd[3391]: Error while processing connect: Connection refused
Dec 10 14:09:20 fiveoften ccsd[3391]: Cluster is not quorate. Refusing connection.
Dec 10 14:09:20 fiveoften ccsd[3391]: Error while processing connect: Connection refused

The logs stopped when fence_tool exited.
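
The -111 in the fence_tool output lines up with the "Connection refused" entries from ccsd if the value is read as a negated errno; on Linux, errno 111 is ECONNREFUSED. That reading is an assumption about how fence_tool reports the failure, not something confirmed from its source. A small sketch for decoding such a value (the function name `describe_status` is hypothetical):

```python
import errno
import os


def describe_status(code):
    """Map a negated errno-style status (e.g. -111) to (name, message)."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    # Errno numbers are platform-specific, so the symbolic constant is
    # used here instead of a hard-coded 111.
    print(describe_status(-errno.ECONNREFUSED))
```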

On node fiveoften, /proc/cluster/status and /proc/cluster/nodes contain:

[mbrookov@fiveoften ~]$ more /proc/cluster/status
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 0
Node addresses: 138.67.4.25

[mbrookov@fiveoften ~]$ more /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    1   M   fouroften
   2    1    1   M   fiveoften
[mbrookov@fiveoften ~]$

On node fouroften, /proc/cluster/status and /proc/cluster/nodes contain:

[mbrookov@fouroften ~]$ more /proc/cluster/status
Protocol version: 4.0.1
Config version: 6
Cluster name: CSMTEST
Cluster ID: 9374
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 1
Node addresses: 138.67.4.24

[mbrookov@fouroften ~]$ more /proc/cluster/nodes
Node  Votes Exp Sts  Name
   1    1    1   M   fouroften
   2    1    1   M   fiveoften

Log entries on node fouroften:

Dec 10 14:08:36 fouroften kernel: Lock_Harness <CVS> (built Dec 10 2004 09:14:45) installed
Dec 10 14:08:36 fouroften kernel: GFS <CVS> (built Dec 10 2004 09:14:04) installed
Dec 10 14:08:36 fouroften kernel: CMAN <CVS> (built Dec 10 2004 09:51:59) installed
Dec 10 14:08:36 fouroften kernel: NET: Registered protocol family 30
Dec 10 14:08:36 fouroften kernel: DLM <CVS> (built Dec 10 2004 09:52:25) installed
Dec 10 14:08:36 fouroften kernel: Lock_DLM (built Dec 10 2004 09:14:25) installed
Dec 10 14:08:37 fouroften kernel: CMAN: Waiting to join or form a Linux-cluster
Dec 10 14:09:09 fouroften kernel: CMAN: forming a new cluster
Dec 10 14:09:09 fouroften kernel: CMAN: quorum regained, resuming activity
Dec 10 14:09:09 fouroften kernel: CMAN: got node fiveoften

/etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster name="CSMTEST" config_version="6">
  <cman two_node="1" expected_votes="1">
  </cman>
  <clusternodes>
    <clusternode name="fouroften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fouroften"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="fiveoften" votes="1">
      <fence>
        <method name="cascade1">
          <device name="human" ipaddr="fiveoften"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>

Both nodes are running Fedora Core 3 with the 2.6.9 kernel from kernel.org.

Thanks for your time!

Matt
mbrookov@xxxxxxxxx