I've upgraded a test cluster to RHEL4U5, but I'm having some problems on boot. ccsd and cman seem to disagree over whether the cluster is quorate, and as a result fenced fails to start up.

First, here is a normal boot sequence from another cluster running 4U4:

May 31 10:19:31 node04 ccsd[2503]: Starting ccsd 1.0.7:
May 31 10:19:31 node04 ccsd[2503]: Built: Aug 25 2006 15:00:06
May 31 10:19:31 node04 ccsd[2503]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
May 31 10:19:31 node04 ccsd: succeeded
May 31 10:19:31 node04 kernel: CMAN 2.6.9-45.15 (built Mar 27 2007 22:56:03) installed
May 31 10:19:31 node04 kernel: NET: Registered protocol family 30
May 31 10:19:31 node04 kernel: DLM 2.6.9-44.9 (built Mar 27 2007 23:00:18) installed
May 31 10:19:31 node04 ccsd[2503]: cluster.conf (cluster name = cluster1, version = 2) found.
May 31 10:19:32 node04 kernel: CMAN: Waiting to join or form a Linux-cluster
May 31 10:19:32 node04 last message repeated 3 times
May 31 10:19:32 node04 ccsd[2503]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.1
May 31 10:19:32 node04 ccsd[2503]: Initial status:: Inquorate
May 31 10:19:34 node04 kernel: CMAN: sending membership request
May 31 10:19:34 node04 last message repeated 3 times
May 31 10:19:34 node04 last message repeated 3 times
May 31 10:19:34 node04 last message repeated 5 times
May 31 10:19:34 node04 last message repeated 5 times
May 31 10:19:35 node04 kernel: CMAN: got node node05
May 31 10:19:35 node04 kernel: CMAN: got node node06
May 31 10:19:35 node04 kernel: CMAN: got node node03
May 31 10:19:35 node04 kernel: CMAN: got node node01
May 31 10:19:35 node04 kernel: CMAN: got node node02
May 31 10:19:35 node04 ccsd[2503]: Cluster is quorate. Allowing connections.
May 31 10:19:35 node04 kernel: CMAN: quorum regained, resuming activity
May 31 10:19:35 node04 cman: startup succeeded
May 31 10:19:38 node04 defuturo: fenced succeeded

Now, some logs for the 4U5 cluster:

Jun 8 12:40:26 tamarillo ccsd[2448]: Starting ccsd 1.0.10:
Jun 8 12:40:26 tamarillo ccsd[2448]: Built: May 31 2007 15:48:09
Jun 8 12:40:26 tamarillo ccsd[2448]: Copyright (C) Red Hat, Inc. 2004 All rights reserved.
Jun 8 12:40:26 tamarillo ccsd: succeeded
Jun 8 12:40:26 tamarillo kernel: CMAN 2.6.9-50.2 (built May 31 2007 15:39:24) installed
Jun 8 12:40:26 tamarillo kernel: NET: Registered protocol family 30
Jun 8 12:40:26 tamarillo kernel: DLM 2.6.9-46.16 (built May 31 2007 15:45:51) installed
Jun 8 12:40:26 tamarillo ccsd[2448]: cluster.conf (cluster name = defuturo_test, version = 2) found.
Jun 8 12:40:27 tamarillo kernel: CMAN: Waiting to join or form a Linux-cluster
Jun 8 12:40:27 tamarillo kernel: CMAN: sending membership request
Jun 8 12:40:28 tamarillo kernel: CMAN: got node guava
Jun 8 12:40:28 tamarillo kernel: CMAN: got node kiwano
Jun 8 12:40:28 tamarillo kernel: CMAN: quorum regained, resuming activity
Jun 8 12:40:28 tamarillo cman: startup succeeded
Jun 8 12:40:30 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:30 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:31 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:31 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:32 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:32 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:33 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:33 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:34 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:34 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:35 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:35 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:36 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:36 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:37 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:37 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:38 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:38 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:39 tamarillo ccsd[2448]: Cluster is not quorate. Refusing connection.
Jun 8 12:40:39 tamarillo ccsd[2448]: Error while processing connect: Connection refused
Jun 8 12:40:40 tamarillo defuturo: fenced failed
Jun 8 12:40:40 tamarillo ccsd[2448]: Connected to cluster infrastruture via: CMAN/SM Plugin v1.1.7.4
Jun 8 12:40:40 tamarillo ccsd[2448]: Initial status:: Quorate

This error happens on most boots, but not all, so I suspect a race condition. By the time I can log into the node, it is quorate and I can start fenced manually. I've added some debugging and verified that /proc/cluster/status lists the cluster as quorate immediately before and after the attempt to start fenced.

There are a few things of note about our setup:

1) We're not using fence from 4U5 because of bz241217, so fence-1.32.25-1 is installed on both clusters.

2) fenced is being started in a chroot jail by our own script, which runs:

/usr/sbin/chroot /mnt/fenced /sbin/fence_tool -t 0 join -w

The output from that command is:

fence_tool: cannot connect to ccs -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111
fence_tool: waiting for ccs connection -111

3) /etc/sysconfig/cluster sets CMAN_CLUSTER_TIMEOUT=0 and CMAN_QUORUM_TIMEOUT=86400.

Does anyone know what might cause ccsd to continue to refuse connections for a lack of quorum after cman has decided the cluster is quorate?

Robert
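
P.S. Since the 4U5 log shows ccsd reporting "Initial status:: Quorate" only about ten seconds after cman's startup succeeded, one possible stopgap (not a fix for the underlying question) would be to have the start script retry the join for a while rather than give up after the first failed attempt. The following is only a minimal sketch under stated assumptions: it assumes fence_tool exits non-zero when it cannot reach ccsd, which the "fenced failed" line suggests but the logs do not confirm. The names retry_until, RETRY_MAX and RETRY_DELAY are hypothetical helpers introduced here and are not part of any cluster package.

```shell
#!/bin/sh
# retry_until CMD [ARGS...]
# Hypothetical helper: run CMD until it exits 0, or give up after
# RETRY_MAX attempts (default 30), sleeping RETRY_DELAY seconds
# (default 1) between attempts. Returns 0 on success, 1 otherwise.
retry_until() {
    max=${RETRY_MAX:-30}
    delay=${RETRY_DELAY:-1}
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # Run the command exactly as given; stop at the first success.
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        # Only sleep if another attempt is still to come.
        if [ "$attempt" -le "$max" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible use in the start script (assumption: a non-zero exit from
# fence_tool means the join did not happen):
#   retry_until /usr/sbin/chroot /mnt/fenced /sbin/fence_tool -t 0 join -w
```

Retrying the whole command keeps the sketch independent of the format of /proc/cluster/status, at the cost of hiding whatever ordering problem makes ccsd lag behind cman in the first place.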