I have seen this behavior. It is not strange to me. This is only strange to people who do not understand how quorum systems work.
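For anyone following along, the vote arithmetic behind quorum can be sketched as below. This is a minimal illustration and not the cluster manager's actual code: the function names, the one-vote-per-node weighting, and the one-vote quorum disk are all assumptions made for the example.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Smallest vote count that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the votes still visible reach the majority threshold."""
    return votes_present >= quorum_threshold(expected_votes)


# Two nodes, one vote each (assumed): a lone survivor holds 1 of 2 votes.
print(is_quorate(1, 2))      # False

# Three nodes: two survivors hold 2 of 3 votes.
print(is_quorate(2, 3))      # True

# Two nodes plus a hypothetical one-vote quorum disk: the node whose
# heuristic passes holds its own vote plus the disk's vote, 2 of 3.
print(is_quorate(1 + 1, 3))  # True
```

The point of the sketch is only that a strict majority of two votes is two, so a split two-node cluster can never be quorate on node votes alone.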
The problem is that you have a two-node cluster. If you had three nodes, this would not be an issue. In a two-node cluster, both nodes are capable of fencing each other even though neither has quorum any longer. Mathematically, a majority of two nodes requires both of them.

The Quorum Disk allows the running nodes to use a heuristic, such as the ethernet link check you speak of (a ping to the network gateway would also be helpful). This heuristic lets you artificially reach quorum by giving extra votes to the node that can still determine that it is okay.

The moment a node fails for any reason other than an ethernet disconnection, your workaround falls apart. If some "Central Bank" is truly your customer, then you should be able to obtain a third node with no problems. Otherwise, the Quorum Disk provides better behavior than your "workaround" by actually solving the problem in a generally applicable and sophisticated way.

This is a configuration problem. If you desire not to be laughed at, learn how to configure your software.

Also, for what it's worth, I don't use bonding on my machines due to the switches I utilize (I use bridging instead), but I would recommend that you keep bonding for the reliability of the ethernet link, as its loss is an important failure case.

--
Jayson Vantuyl
Systems Architect
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster