On Tue, Apr 10, 2007 at 12:43:43PM +0100, Patrick Caulfield wrote:
> Under normal circumstances you need (n/2)+1 nodes to keep quorum. So
> if you lose two nodes out of four then the services will stop. To
> prevent this you can use the qdisk program to keep the cluster quorate
> in such circumstances.

OK, that's what I thought... I was just confused by this FAQ entry:
http://sources.redhat.com/cluster/faq.html#cman_oddnodes
But I think you mean floor((n/2) + 1), to account for clusters with an
odd number of nodes, too ;)

The planned setup is as follows:

-two blade racks
-four blade servers to serve as cluster nodes, two in each rack

We'll have two racks anyway, for other projects, so I thought splitting
the cluster nodes evenly between the racks would give us some redundancy
in case we lose one of the racks. But if we lose one of the racks,
that'll cause the cluster to lose quorum...

One solution would be to have one non-blade server in the cluster as
well: if we then lose either rack, we'll still have 3 (= floor((5/2) +
1)) nodes available and keep quorum. But this would mean that we'd have
to keep adding nodes in pairs from then on - otherwise we could end up
with a rack-ful of fully operable nodes that would refuse to keep
quorum (unless, of course, the node outside the racks had enough votes
- but that would create trouble if we lost that one node...). See the
vote-arithmetic sketch at the end of this mail.

Now I wonder: would it be simpler to set up a qdisk? That'd require
some consideration of the heuristics to use, and the documentation
appears sparse. Is this correct:

-whichever partition of a partitioned cluster contains the qdisk master
 has all the qdisk votes, AND
-the qdisk master will always be in the partition that contains the
 lowest-numbered cluster node that considers itself fit to be in the
 cluster (regardless of the opinion of the other nodes in that
 partition)?

If so, then in my scenario (a failing rack), the qdisk master and the
qdisk votes would find their way to the surviving partition, and that
partition would keep quorum. I think. (A rough sketch of such a qdisk
configuration is also appended at the end of this mail.)

If I were to use a simple heuristic (such as pinging a router
upstream), that wouldn't break anything in the failing-rack scenario.
But what if there were a real split in the heartbeat network, with both
halves of the cluster still seeing the upstream router (because there
was no split in the 'outer' network)? If SAN access is kept, and both
halves still see the quorum device, would the cluster halves be able to
agree on the master?

--Janne
--
Janne Peltonen <janne.peltonen@xxxxxxxxxxx>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
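
A back-of-the-envelope sketch of the vote arithmetic discussed above.
The qdisk vote count follows the common recommendation of giving the
quorum disk (number of nodes - 1) votes; the figures are illustrative,
not taken from any real configuration:

  # Quorum in cman: floor(expected_votes / 2) + 1, one vote per node.
  def quorum(expected_votes):
      return expected_votes // 2 + 1

  # Plain 4-node cluster: quorum is 3, so losing a rack (2 nodes)
  # leaves the 2 survivors inquorate.
  print(quorum(4))      # 3

  # 5-node cluster (the extra non-blade node): quorum is still 3,
  # so the 3 nodes surviving a rack failure stay quorate.
  print(quorum(5))      # 3

  # 4 nodes plus a qdisk worth 3 votes: expected_votes = 7, quorum = 4.
  # A surviving rack holds 2 node votes + 3 qdisk votes = 5 >= 4,
  # so it stays quorate without a fifth node.
  print(quorum(4 + 3))  # 4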
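
And a minimal sketch of what the qdisk setup might look like in
cluster.conf, assuming a 4-node cluster and a ping heuristic against an
upstream router. The label, router address, and timing values below are
made-up placeholders; the attribute names are the ones documented in
qdisk(5), but double-check them against your version:

  <!-- 4 node votes + 3 qdisk votes -->
  <cman expected_votes="7"/>

  <!-- The qdisk carries nodes-1 = 3 votes, so one surviving rack plus
       the qdisk outvotes the other half of the cluster. -->
  <quorumd interval="1" tko="10" votes="3" label="cluster_qdisk">
    <!-- Heuristic: ping the upstream router once, with a 1s deadline.
         192.168.1.254 is a made-up address. A node whose heuristics
         fail declares itself unfit and stops claiming qdisk votes. -->
    <heuristic program="ping -c1 -w1 192.168.1.254" score="1"
               interval="2" tko="3"/>
  </quorumd>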