Tried replacing the switch with a crossover cable. The problem goes away.

It looks like there is some odd delay in the switch. The NIC is configured, but it takes 4 seconds for the link to go up. Huh.

We have a dedicated network for all the cluster traffic. Nothing else uses it. In the two-node case, we use a cable. In larger clusters we will use a switch. First delivery is for two-node clusters. But, I worry about that slow switch.

Regards.

"If there are no dogs in Heaven, then when I die I want to go where they went."

From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Vallevand, Mark K

It looks like there is some odd delay in getting a network interface up and ready. So, when cman starts corosync, it can’t get to the cluster. So, for a time, the node is a member of a cluster-of-one. The cluster-of-one begins starting resources.

A few seconds later, when the interface finally is up and ready, it takes about 30 more seconds for the cluster-of-one to finally rejoin the larger cluster. The doubly-started resources are sorted out and all ends up OK.

Now, this is not a good thing to have these particular resources running twice. I’d really like the clustering software to behave better. But, I’m not sure what ‘behave better’ would be.

Is it possible to introduce a delay into cman or corosync startup? Is that even wise? Is there a parameter to get the clustering software to poll more often when it can’t rejoin the cluster?

Any suggestions would be welcome.

Running Ubuntu 12.04 LTS. Pacemaker 1.1.6. Cman 3.1.7. Corosync 1.4.2.

Regards.

"If there are no dogs in Heaven, then when I die I want to go where they went."
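
On the idea of delaying cman startup until the interface is ready: one possible approach, sketched here and not taken from the cman or corosync packages, is to poll the kernel's carrier flag for the cluster NIC and start the cluster stack only once the link is reported up. The helper names, the interface name eth1, and the timeout values below are hypothetical; the sketch assumes Python 3.3 or later and the standard Linux sysfs file /sys/class/net/<iface>/carrier. Carrier only says the physical link is up. It does not prove the switch port is already forwarding traffic, so a short extra margin may still be needed.

```python
import os
import time


def link_has_carrier(iface, sysfs_root="/sys/class/net"):
    """Return True if the kernel reports carrier (link up) for iface."""
    path = os.path.join(sysfs_root, iface, "carrier")
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # The interface does not exist, or it is administratively down
        # (reading carrier fails in that state). Treat both as "no link".
        return False


def wait_for_carrier(iface, timeout=30.0, interval=0.5,
                     sysfs_root="/sys/class/net"):
    """Poll until iface has carrier; return False if timeout expires first.

    timeout and interval are in seconds. sysfs_root is a parameter only so
    the function can be exercised against a fake directory tree.
    """
    deadline = time.monotonic() + timeout
    while True:
        if link_has_carrier(iface, sysfs_root):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical use from a wrapper that runs before the cluster stack starts:
#
#     if not wait_for_carrier("eth1", timeout=30.0):
#         raise SystemExit("cluster link eth1 never came up; not starting cman")
```

Whether a missing link should block cman entirely or just delay it is a policy choice. Refusing to start avoids the cluster-of-one, at the cost of a node that stays out of the cluster until someone intervenes.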
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster