On Mon, Mar 26, 2007 at 11:25:39PM +0200, Jos Vos wrote:
> On Mon, Mar 26, 2007 at 02:00:04PM -0700, Steven Dake wrote:
>
> > I'm not sure why you would want to select one view of a primary
> > component (aka quorum membership) with one node, and another
> > primary component with another node.  Then, as the nodes make
> > decisions, they would be inconsistent (ie: consider GFS; one node
> > would think writes were ok, while another might not... shouldn't
> > they agree?).
>
> Maybe one step back to the CMAN algorithm:
>
> I have been looking for a comprehensive summary of the algorithm by
> which CMAN determines node failures (should heartbeat *AND* quorum
> disk both show the node as "up"?) and of how votes are calculated,
> but I could not find one.
>
> I'm afraid it is not clear what role the <quorumd> votes play: are
> they only calculated on each node, or are the votes stored on the
> quorum disk and read by the other nodes?

CMAN's quorum device votes are calculated on each node.  However, when
tallied for the total quorum count, they are only counted once.  So, N
instances of qdiskd, each advertising 1 vote to its parent CMAN, add up
to 1 vote cluster-wide - even though every node is keeping track of the
quorum device.

Ex: a 4-node + 4-vote qdisk setup.  Expected votes = 4 + 4 = 8, so
quorum = 8/2 + 1 = 5.  Each node's tally counts itself, the other
members it can see, and the qdisk votes (if its heuristics pass).

When all nodes are online:

  node1: I see 3 others + qdisk = 8
  node2: I see 3 others + qdisk = 8
  node3: I see 3 others + qdisk = 8
  node4: I see 3 others + qdisk = 8

In a 3:1 split, with properly configured heuristics (where the 1 node
must continue operations):

  node1: I see 0 others + qdisk = 5
  node2: I see 2 others + NO qdisk = 3
  node3: I see 2 others + NO qdisk = 3
  node4: I see 2 others + NO qdisk = 3

  = node1 wins (5 >= 5, while 3 < 5)

In a 3:1 split, with properly configured heuristics (where the 1 node
was partitioned off):

  node1: I see 0 others + NO qdisk = 1
  node2: I see 2 others + qdisk = 7
  node3: I see 2 others + qdisk = 7
  node4: I see 2 others + qdisk = 7

  = nodes 2/3/4 win (7 >= 5, while 1 < 5)

When the heuristics fail, qdiskd tells CMAN that the qdisk votes are no
longer available, and it advertises over the disk that it is no longer
fit to participate in the cluster.  Normal fencing rules apply: after a
network partition, the winning partition must fence the dead partition.

Hope this helps.

-- Lon

-- 
Lon Hohberger - Software Engineer - Red Hat, Inc.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
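
To make the vote arithmetic above concrete, here is a minimal Python
sketch of the per-node tally as described (hypothetical helper names,
not the actual CMAN/qdiskd code; it assumes 1 vote per node and a
4-vote qdisk, as in the example):

    def node_tally(others_seen, qdisk_votes, heuristics_ok, own_votes=1):
        # A node counts itself, the members it can currently see, and
        # the qdisk votes (counted only once) if its heuristics pass.
        total = others_seen + own_votes
        if heuristics_ok:
            total += qdisk_votes
        return total

    EXPECTED = 4 * 1 + 4          # four 1-vote nodes + a 4-vote qdisk
    QUORUM = EXPECTED // 2 + 1    # 5 votes needed to stay quorate

    # 3:1 split where the lone node's heuristics keep the qdisk:
    print(node_tally(0, 4, True)  >= QUORUM)   # node1: 5 >= 5 -> True
    print(node_tally(2, 4, False) >= QUORUM)   # node2-4: 3 < 5 -> False

The sketch only mirrors the arithmetic each node performs; the real
qdiskd/CMAN exchange happens over the shared disk and the cluster
protocol, as described in the message above.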