On Thu, 2009-08-13 at 00:45 +0200, brem belguebli wrote:
> My understanding of qdisk is that it is used as a tie-breaker, but it
> looks like it is more a heartbeat vector than a simple tie-breaker.

Right, it's a secondary membership algorithm.

> Until here, no real problem indeed, if the site gets apart from the
> other prod site and also from the third site (hosting the iSCSI target
> qdisk) the 2 nodes from the failing site get evicted from the cluster.
>
> But, what if my third site gets isolated while the 2 prod ones are
> fine?

Qdisk votes will not be presented to CMAN any more, but the two sites
should remain online if they still have a "majority" of votes.

> The real question is what happens in case all the nodes lose access
> to the qdisk while they're still able to see each other?

Qdisk is just a vote like other voting mechanisms. If all nodes lose
access at the same time, it should behave like a node death. However,
the default action if _one_ node loses access is to kill that node
(even if CMAN still sees it).

> The 4 nodes have each 1 vote and the qdisk 1 vote. The expected quorum
> is 3.
> If I lose the qdisk, the number of votes falls to 4, the cluster is
> quorate (4>3) but it looks like everything goes bad, each node
> deactivates itself as it can't write its alive status (--> heartbeat
> vector) to the qdisk even if the network heartbeating is working
> fine.

What happens specifically? Most of the actions qdiskd performs are
configurable. For example, if the nodes are rebooting, you can turn
that behavior off.

I wrote a simple 'ping' tiebreaker based on the behaviors in RHEL3. It
functions in many ways in the same manner as qdiskd with respect to
vote advertisement to CMAN, but without needing a disk - maybe you
would find it useful?

http://people.redhat.com/lhh/qnet.tar.gz

-- Lon

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
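
For reference, the vote arithmetic discussed in this thread (4 nodes with 1 vote each, plus 1 qdisk vote) can be sketched as below. This is only an illustration of a simple-majority rule, not CMAN's actual code: the function names and the scenario labels are made up for the example, and the threshold formula (half the expected votes, rounded down, plus one) is an assumption that happens to match the "expected quorum is 3" figure quoted above. It says nothing about what qdiskd itself does to a node that cannot write to the disk.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Simple majority: strictly more than half of the expected votes.

    Assumed formula for illustration; it gives 3 for the 5 expected
    votes described in the thread.
    """
    return expected_votes // 2 + 1


def is_quorate(present_votes: int, expected_votes: int) -> bool:
    """A partition is quorate if it holds at least the threshold."""
    return present_votes >= quorum_threshold(expected_votes)


# Layout from the thread: 4 nodes with 1 vote each, qdisk with 1 vote.
NODE_VOTES = [1, 1, 1, 1]
QDISK_VOTES = 1
EXPECTED = sum(NODE_VOTES) + QDISK_VOTES  # 5 expected votes

# Hypothetical scenarios: votes visible to one partition of the cluster.
SCENARIOS = {
    "all nodes and qdisk reachable": sum(NODE_VOTES) + QDISK_VOTES,
    "qdisk site isolated, all 4 nodes see each other": sum(NODE_VOTES),
    "one prod site (2 nodes) that still sees the qdisk": 2 + QDISK_VOTES,
    "one prod site (2 nodes) cut off from everything": 2,
}

if __name__ == "__main__":
    print(f"expected votes = {EXPECTED}, "
          f"threshold = {quorum_threshold(EXPECTED)}")
    for label, votes in SCENARIOS.items():
        state = "quorate" if is_quorate(votes, EXPECTED) else "inquorate"
        print(f"{label}: {votes} votes -> {state}")
```

By pure vote counting, losing only the qdisk leaves 4 of 5 votes and the cluster stays quorate, so node shutdowns in that case would come from qdiskd's own (configurable) actions rather than from loss of quorum.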