On 1/26/2012 1:15 PM, Vladislav Bogdanov wrote:
>>>> Probably even not lower than number of votes from nodes which are now
>>>> either active or inactive but joined at least once (I suppose that
>>>> nodelist is fully editable at runtime, so admin may some-how reset join
>>>> count of node and only than reduce expected_votes).
>>
>> I have been thinking about this some more, but I am not sure I grasp the
>> use case or what kind of protection you try to suggest.
>>
>> Reducing the number of expected_votes is an admin action, it's not very
>> different from removing a node from the "seen" list manually and
>> recalculating expected_votes.
>>
>> Can you clarify it for me?
>
> Imagine (this case is a little bit hypothetical, but anyways):
> * You have cluster with 8 active nodes, and you (for some historical
> reasons or due to admin fault/laziness) have expected_votes set to 3
> (ok, you had 3-node cluster not so long ago, but added more nodes
> because of growing load).
> * Cluster splits 5+3 due to loss of communication between switches (or
> switch-stacks).
> * 3 nodes are fenced.
> * Partition with majority continues operation.
> * 3 fenced nodes boot back, and form *quorate* partition because they
> have expected_votes set to 3
> * Data is corrupted
>
> If fenced nodes know right after boot that cluster consists of 8 active
> nodes, they would not override expected_votes obtained from the
> persistent "seen" list with the lower value from the config, and the
> data is safe.

Oh great... yes, I see where you are going here. It sounds like an
interesting approach, but it clearly requires a file in which to store
that information. There is still a window in which the file containing
the expected_votes from the "seen" list could be corrupted, though. At
that point it is difficult to tell which of the two values is correct,
and the approach doesn't prevent the issue at all if the file is removed
entirely (even by accident). Still, at first glance I would say that it
is better than nothing.
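To make the 5+3 scenario concrete, here is a minimal Python sketch. It assumes one vote per node and the common majority rule quorum = expected_votes // 2 + 1; the function names are hypothetical and this is not corosync's votequorum code. It shows why the three rebooted nodes become quorate with a stale expected_votes of 3, and how refusing to let the lower configured value override a persisted "seen" value keeps them inquorate.

```python
from typing import Optional

# Hypothetical sketch, not corosync code. Assumes one vote per node and the
# majority rule quorum = expected_votes // 2 + 1.


def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def effective_expected_votes(configured: int, persisted: Optional[int]) -> int:
    """Proposed boot-time rule: a lower configured value never overrides the
    expected_votes persisted from the "seen" list. If the persisted file is
    missing, only the configured value is left to use."""
    if persisted is None:
        return configured
    return max(configured, persisted)


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition holding partition_votes votes reaches quorum."""
    return partition_votes >= quorum_threshold(expected_votes)


if __name__ == "__main__":
    configured = 3  # stale value left over from the old 3-node cluster
    persisted = 8   # hypothetical value stored while the 8 nodes were quorate
    rebooted = 3    # the three fenced nodes that boot back

    # Stale config only: 3 votes >= 3 // 2 + 1 = 2, so the partition is quorate.
    print(is_quorate(rebooted, configured))
    # With the persisted value: 3 votes < 8 // 2 + 1 = 5, so it is not quorate.
    print(is_quorate(rebooted, effective_expected_votes(configured, persisted)))
```

Run as a script, this prints True and then False. The majority side of the split (5 votes) still satisfies 5 >= 5 under expected_votes = 8, so it keeps running while the rebooted minority stays blocked.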
I'll test it and see how it pans out, but at first glance I think we
should store only the last known expected_votes while the cluster is
quorate. A booting node would then use the higher of the two values
(configured vs. stored). If the cluster has shrunk in the meantime, the
joining node would be informed about it (I just sent a patch to the list
about that 10 minutes ago ;)).

>
> Is it possible to prevent such behavior currently, assuming that you
> have dynamically-growing cluster (nodes are powered-on on demand)? (I've
> seen your amazing documentation commits, but still did not dive deep
> enough.)

Not this specific use case, no.

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss