I have a question about the qdisk concept.
My understanding of qdisk is that it is used as a tie-breaker, but it looks like it is more of a heartbeat vector than a simple tie-breaker.
My setup consists of 4 nodes located on 2 different production sites (2+2) using SAN shared storage (2 disk frames, 1 per site).
The qdisk is an iSCSI shared LUN from a third site, which I expected to use as a tie-breaker in case 1 of my 2 prod sites experiences network problems and gets completely isolated.
Up to this point there is no real problem: if a site is cut off from the other prod site and also from the third site (hosting the iSCSI target qdisk), the 2 nodes from the failing site get evicted from the cluster.
But what if my third site gets isolated while the 2 prod ones are fine?
The real question is: what happens if all the nodes lose access to the qdisk while they are still able to see each other?
The 4 nodes have each 1 vote and the qdisk 1 vote. The expected quorum is 3.
When the cluster is running with all of its nodes and the qdisk, the number of votes is 5.
If I lose the qdisk, the number of votes falls to 4 and the cluster is still quorate (4>3), but it looks like everything goes bad: each node deactivates itself because it can't write its alive status to the qdisk (which is what makes it look like a heartbeat vector), even though the network heartbeating is working fine.
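To make the vote arithmetic above concrete, here is a minimal Python sketch. It assumes quorum is a simple majority of the expected votes (expected // 2 + 1), which matches the expected quorum of 3 for 5 votes in my setup; the function names are mine for illustration and are not part of cman or qdiskd.

```python
# Sketch of the vote arithmetic for the 4-node + qdisk setup described above.
# Assumption: quorum is a simple majority of expected votes.
# The helper names below are hypothetical, not a cman/qdiskd API.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that is a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes currently present reach the quorum threshold."""
    return votes_present >= quorum_threshold(expected_votes)


NODES = 4          # 2 nodes per production site
NODE_VOTES = 1     # each node has 1 vote
QDISK_VOTES = 1    # the qdisk has 1 vote

EXPECTED = NODES * NODE_VOTES + QDISK_VOTES  # 5 votes in total

# Votes present in each situation discussed in this post.
SCENARIOS = {
    "all nodes + qdisk": NODES * NODE_VOTES + QDISK_VOTES,
    "all nodes, qdisk lost": NODES * NODE_VOTES,
    "one site (2 nodes) + qdisk": 2 * NODE_VOTES + QDISK_VOTES,
    "one site (2 nodes), no qdisk": 2 * NODE_VOTES,
}

if __name__ == "__main__":
    print("quorum threshold:", quorum_threshold(EXPECTED))
    for name, votes in SCENARIOS.items():
        print(f"{name}: votes={votes} quorate={is_quorate(votes, EXPECTED)}")
```

By this arithmetic alone, losing only the qdisk should leave the 4 nodes quorate, which is why the behaviour I observe surprises me.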
I have tried to configure heuristics (pinging a node on the third site) without a qdisk device, but they seem to be ignored.
Any comments or tips?
Regards
PS: added the flag :-)