> > Oct 13 04:17:18 ey00-s00017 kernel: CMAN: got WAIT barrier not in phase 1 TRANSITION.96 (2)
>
> That message should be harmless. Does it prevent the cluster reaching quorum?

Hello Patrick / list,

I've been working with Tom on this problem. It doesn't prevent quorum, although after this point the nodes mysteriously can't seem to join the fence domain. I've checked and it doesn't appear that anyone is trying to fence anyone else, so I'm at a bit of a loss to explain what's going on.

The really bizarre thing is that the old nodes don't seem to cooperate with the new ones, even though the new ones have joined the cluster (i.e. the fence domain on the old nodes shows "running", while the fence domain on a new node stays at "joining" indefinitely). If you prod it enough (start enough new nodes), eventually the existing cluster will blow apart (nodes start kicking each other for inconsistency and the like).

Let me explain a few things about our cluster:

We are running Xen. The control VM for each node is in the cluster with 1 vote.

The application VMs are dynamically spawned and are entered into the cluster. The application VMs have 0 votes (so as to prevent one physical machine from accidentally grabbing a quorum of votes if it has too many application VMs running on it).

We are currently using fence_manual for debugging purposes (we have an APC MasterSwitch that we eventually plan to use for fencing).

We are experiencing the following problems:

After a certain size (about 20 cluster members) we start having serious issues with the cluster holding together. Nodes are sometimes kicked for having an inconsistent view. There is often a complaint about the count of members not matching between nodes as well. Right now we have the 1.03 version of everything installed (it was packaged and we are trying to avoid building too much from scratch).

When a node starts up with an old cluster.conf, it never seems to automatically update to the newer version.
If the file is updated while a node is down, must it be manually synched up before resuming?

Finally, a random question. When I'm debugging this stuff, I use "cman_tool services" to keep tabs on some things. What does the stuff in the Code column mean?

-- 
Jayson Vantuyl
Quality Humans, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster