On Tue, Apr 10, 2007 at 12:43:43PM +0100, Patrick Caulfield wrote:
> Under normal circumstances you need (n/2)+1 nodes to keep quorum. So
> if you lose two nodes out of four then the services will stop. To
> prevent this you can use the qdisk program to keep the cluster quorate
> in such circumstances.

OK, that's what I thought... I was just confused by this FAQ entry:
http://sources.redhat.com/cluster/faq.html#cman_oddnodes
But I think you mean floor((n/2) + 1), to account for clusters with an
odd number of nodes, too ;)

The planned setup is as follows:

-two blade racks
-four blade servers to serve as cluster nodes, two in each rack

We'll have two racks anyway, for other projects, so I thought splitting
the cluster nodes evenly between the racks would give us some redundancy
in case we lose one of the racks. But if we lose one of the racks,
that'll cause the cluster to lose quorum...

One solution would be to have one non-blade server in the cluster as
well: if we then lose either rack, we'll still have 3 (= floor((5/2) +
1)) nodes available and keep quorum. But this would mean that we'd have
to keep adding nodes in pairs from then on - otherwise we could end up
with a rack-ful of fully operable nodes that would refuse to keep
quorum (unless, of course, the node outside the racks had enough votes
- but that would create trouble if we lost that one node...). See the
vote-arithmetic sketch at the end of this mail.

Now I wonder: would it be simpler to set up a qdisk? That'd require
some consideration of the heuristics to use, and the documentation
appears sparse. Is this correct:

-whichever partition of a partitioned cluster contains the qdisk master
 has all the qdisk votes, AND
-the qdisk master will always be in the partition that contains the
 lowest-numbered cluster node that considers itself fit to be in the
 cluster (regardless of the opinion of the other nodes in that
 partition)?

If so, then in my scenario (a failing rack), the qdisk master and the
qdisk votes would find their way to the surviving partition, and that
partition would keep quorum. I think. (A rough sketch of such a qdisk
configuration is also appended at the end of this mail.)

If I were to use a simple heuristic (such as pinging a router
upstream), that wouldn't break anything in the failing-rack scenario.
But what if there were a real split in the heartbeat network, with both
halves of the cluster still seeing the upstream router (because there
was no split in the 'outer' network)? If SAN access is kept, and both
halves still see the quorum device, would the cluster halves be able to
agree on the master?

--Janne
--
Janne Peltonen <janne.peltonen@xxxxxxxxxxx>

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
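
A back-of-the-envelope sketch of the vote arithmetic discussed above.
The qdisk vote count follows the common recommendation of giving the
quorum disk (number of nodes - 1) votes; the figures are illustrative,
not taken from any real configuration:

  # Quorum in cman: floor(expected_votes / 2) + 1, one vote per node.
  def quorum(expected_votes):
      return expected_votes // 2 + 1

  # Plain 4-node cluster: quorum is 3, so losing a rack (2 nodes)
  # leaves the 2 survivors inquorate.
  print(quorum(4))      # 3

  # 5-node cluster (the extra non-blade node): quorum is still 3,
  # so the 3 nodes surviving a rack failure stay quorate.
  print(quorum(5))      # 3

  # 4 nodes plus a qdisk worth 3 votes: expected_votes = 7, quorum = 4.
  # A surviving rack holds 2 node votes + 3 qdisk votes = 5 >= 4,
  # so it stays quorate without a fifth node.
  print(quorum(4 + 3))  # 4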
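
And a minimal sketch of what the qdisk setup might look like in
cluster.conf, assuming a 4-node cluster and a ping heuristic against an
upstream router. The label, router address, and timing values below are
made-up placeholders; the attribute names are the ones documented in
qdisk(5), but double-check them against your version:

  <!-- 4 node votes + 3 qdisk votes -->
  <cman expected_votes="7"/>

  <!-- The qdisk carries nodes-1 = 3 votes, so one surviving rack plus
       the qdisk outvotes the other half of the cluster. -->
  <quorumd interval="1" tko="10" votes="3" label="cluster_qdisk">
    <!-- Heuristic: ping the upstream router once, with a 1s deadline.
         192.168.1.254 is a made-up address. A node whose heuristics
         fail declares itself unfit and stops claiming qdisk votes. -->
    <heuristic program="ping -c1 -w1 192.168.1.254" score="1"
               interval="2" tko="3"/>
  </quorumd>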