On 1/30/2012 1:51 PM, Andrew Beekhof wrote:
> On Mon, Jan 30, 2012 at 11:31 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>> On 1/27/2012 10:46 PM, Vladislav Bogdanov wrote:
>>> 26.01.2012 15:41, Fabio M. Di Nitto wrote:
>>>> On 1/26/2012 1:15 PM, Vladislav Bogdanov wrote:
>>>>
>>>>>>>> Probably even not lower than number of votes from nodes which are now
>>>>>>>> either active or inactive but joined at least once (I suppose that
>>>>>>>> nodelist is fully editable at runtime, so admin may somehow reset join
>>>>>>>> count of node and only then reduce expected_votes).
>>>>>>
>>>>>> I have been thinking about this some more, but I am not sure I grasp the
>>>>>> use case or what kind of protection you try to suggest.
>>>>>>
>>>>>> Reducing the number of expected_votes is an admin action, it's not very
>>>>>> different from removing a node from the "seen" list manually and
>>>>>> recalculating expected_votes.
>>>>>>
>>>>>> Can you clarify it for me?
>>>>>
>>>>> Imagine (this case is a little bit hypothetical, but anyways):
>>>>> * You have cluster with 8 active nodes, and you (for some historical
>>>>> reasons or due to admin fault/laziness) have expected_votes set to 3
>>>>> (ok, you had 3-node cluster not so long ago, but added more nodes
>>>>> because of growing load).
>>>>> * Cluster splits 5+3 due to loss of communication between switches (or
>>>>> switch-stacks).
>>>>> * 3 nodes are fenced.
>>>>> * Partition with majority continues operation.
>>>>> * 3 fenced nodes boot back, and form *quorate* partition because they
>>>>> have expected_votes set to 3
>>>>> * Data is corrupted
>>>>>
>>>>> If fenced nodes know right after boot that cluster consists of 8 active
>>>>> nodes, they would not override expected_votes obtained from the
>>>>> persistent "seen" list with the lower value from the config, and the
>>>>> data is safe.
>>>>
>>>> Oh great.. yes I see where you are going here.
>>>> It sounds like an interesting approach but that clearly requires a file
>>>> where to store that information.
>>>
>>> I do not see a big problem here...
>>> Corosync saves its ring persistently anyways.
>>>
>>>>
>>>> There is still a window where the file containing the expected_votes
>>>> from "seen" list is corrupted tho. At that point it's difficult to
>>>> detect which of the two values is correct and it doesn't prevent
>>>> the issue at all if the file is removed entirely (even by accident), but
>>>> at a first shot i would say that it is better than nothing.
>>>
>>> Hopefully at least not all nodes from a fenced partition will have it
>>> corrupted/deleted. They should honor the maximal ev value from them all.
>>
>> Right, I am just a bit conservative and maybe I apply extreme caution :)
>>
>>>
>>>>
>>>> I'll have a test and see how it pans out but at a first glance I think
>>>> we should only store the last known expected_votes while quorate.
>>>> The node booting would use the higher of the two values. If the cluster
>>>> has decreased in size in the meantime, the node joining would be
>>>> informed about it (just sent a patch to the list about it 10 minutes ago ;))
>>>
>>> I'd argue that you do not know who is the last known (or ever known)
>>> active then.
>>>
>>> Dynamically handled persistent list is much better from this point of
>>> view. And it resembles what pacemaker does right now. This is probably
>>> the major value for me.
>>
>> Ok hold on a sec here, i think there is a basic misunderstanding :)...
>> you won't be forced to use votequorum. And votequorum only provides
>> simple majority quorum with some extra features.
>>
>> Dynamic quorum is not part of it. votequorum has some features that
>> allow you to upscale (dynamically) or downscale (manually) the cluster.
>>
>> You can decide to opt out from using votequorum and retain current
>> pacemaker behavior as is now so in fact, there would be no regression at
>> all for you.
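
The 5+3 scenario and the "use the higher of the two values" rule quoted above can be sketched roughly as follows. This is only an illustration, assuming one vote per node and a simple-majority threshold of expected_votes // 2 + 1; the function names are hypothetical and are not votequorum's API.

```python
from typing import Iterable


def quorum_threshold(expected_votes: int) -> int:
    """Simple majority (assumed): strictly more than half of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """A partition is quorate when its votes reach the majority threshold."""
    return partition_votes >= quorum_threshold(expected_votes)


def effective_expected_votes(configured: int, persisted: Iterable[int]) -> int:
    """Hypothetical rule from the thread: honor the highest value among the
    configured expected_votes and whatever persisted values survived."""
    return max([configured, *persisted])


# Scenario from the thread: 8 one-vote nodes, stale config says 3, split 5+3.
configured_ev = 3
rebooted_partition_votes = 3

# With only the stale config, the 3 rebooted nodes consider themselves quorate.
stale_result = is_quorate(rebooted_partition_votes, configured_ev)

# Assume one node lost its persisted file, while two still remember 8.
surviving_persisted = [8, 8]
ev = effective_expected_votes(configured_ev, surviving_persisted)
safe_result = is_quorate(rebooted_partition_votes, ev)

print(stale_result, safe_result)
```

With the stale value of 3 the threshold is 2, so the three rebooted nodes count as quorate; with the remembered value of 8 the threshold is 5 and they do not, while the surviving 5-node partition still does.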
>
> As per IRC, this isn't an option.
> The part of pacemaker that did this was loaded inside corosync as a
> plugin, which isn't allowed anymore.
>

Yes, thanks for the clarification.

I still believe that linear/dynamic is part of the ykd module and out of
scope for votequorum (simple majority).

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss