On Wed, Jan 11, 2012 at 6:49 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
> On 1/11/2012 7:41 AM, Andrew Beekhof wrote:
>> On Wed, Jan 11, 2012 at 4:50 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>>> On 01/10/2012 11:47 PM, Andrew Beekhof wrote:
>>>> On Tue, Jan 10, 2012 at 9:08 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>>>>> Hi all,
>>>>>
>>>>> in some recent discussions, the issue came up of how to configure the
>>>>> quorum module. As I don't really have a complete solution yet, I need to
>>>>> seek advice in the community :)
>>>>>
>>>>> Problem:
>>>>>
>>>>> it would be very nice if corosync.conf could simply be scp'ed/copied
>>>>> between nodes and everything worked as expected on all nodes.
>>>>> The issue is that some quorum bits are, at this point in time, node
>>>>> specific. That means that to alter some values, it is necessary to edit
>>>>> corosync.conf on the specific node.
>>>>> On top of that, it would be nice if expected_votes could be
>>>>> automatically calculated based on the votes: values.
>>>>>
>>>>> The current quorum configuration (based on the topic-quorum patches):
>>>>>
>>>>> quorum {
>>>>>     provider: corosync_votequorum
>>>>>     expected_votes: 8
>>>>>     votes: 1
>>>>>     two_node: 0
>>>>>     wait_for_all: 0
>>>>>     last_man_standing: 0
>>>>>     auto_tie_breaker: 0
>>>>> }
>>>>>
>>>>> totem {
>>>>>     nodeid: xxx
>>>>> }
>>>>>
>>>>> The 2 values that cannot be copied around are quorum.votes and totem.nodeid.
>>>>>
>>>>> In the current votequorum/totem incarnation, votes/expected_votes/nodeid
>>>>> are all broadcast to all nodes, so each node that joins the cluster
>>>>> becomes aware of the other peers' values.
>>>>>
>>>>> As a consequence of the current config format, the auto_tie_breaker
>>>>> feature requires wait_for_all in order to work (it needs the complete
>>>>> list of nodeids; see the auto_tie_breaker implementation in the
>>>>> topic-quorum branch for details).
>>>>>
>>>>> Honza and I quickly explored options to add those values to the node
>>>>> list of udpu, but that's limiting because it doesn't work well with
>>>>> multicast and/or broadcast and it has integration issues with RRP.
>>>>>
>>>>> Also, adding lists to quorum {} involves a certain level of duplicated
>>>>> information. For example:
>>>>>
>>>>> quorum {
>>>>>     nodeid_list: x y z...
>>>>>     node.x.votes: ..
>>>>>     node.y.votes: ..
>>>>> }
>>>>>
>>>>> which IMHO is anything but nice to look at.
>>>>>
>>>>> So the question of changing the config format also raises the following
>>>>> questions:
>>>>>
>>>>> 1) do we really need to support an auto_tie_breaker feature without
>>>>> wait_for_all? If NO, then we don't need the list of nodeids upfront.
>>>>>
>>>>> 2) do we really care about votes other than 1?
>>>>
>>>> That was also my question when reading the above.
>>>> It always struck me as troublesome to get right; just giving one of 4
>>>> nodes an extra vote (for example) will still give you a tie under the
>>>> wrong conditions.
>>>>
>>>> Seems (to me) like a habit people got into when clusters went to pieces
>>>> without quorum, and we have "better" solutions for that today (like the
>>>> token registry). So my vote is to drop it.
>>>
>>> That was my take too in the beginning, but apparently there are some use
>>> cases that require votes != 1.
>>
>> Can someone enumerate a couple? Maybe they're valid, maybe they're not.
>
> Lon/David need to pitch in here. Lon gave me an example with some magic
> numbers but I keep forgetting to write it down.
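(To make the "extra vote" concern above concrete, here is a toy Python model
of the arithmetic being discussed. It is not votequorum code; the node names
and helpers are made up purely for illustration.)

    # Majority threshold derived from the total expected votes.
    def quorum_threshold(expected_votes):
        return expected_votes // 2 + 1

    # 4 nodes, one of them given an extra vote "to break ties".
    votes = {"node1": 2, "node2": 1, "node3": 1, "node4": 1}
    expected = sum(votes.values())           # 5
    threshold = quorum_threshold(expected)   # 3

    def partition_has_quorum(members):
        return sum(votes[n] for n in members) >= threshold

    # If the 2-vote node survives a 2/2 split, the tie is broken:
    print(partition_has_quorum(["node1", "node2"]))   # True  (3 >= 3)
    # ...but if node1 is the node that failed, the remaining nodes can
    # still split with no quorate side at all:
    print(partition_has_quorum(["node2", "node3"]))   # False (2 < 3)
    print(partition_has_quorum(["node4"]))            # False (1 < 3)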
>
>>
>>>>> If NO, then votes: can
>>>>> simply be dropped from the corosync.conf defaults, and in case an
>>>>> override is necessary, it can be done specific to the node. This
>>>>> solution poses the problem that expected_votes needs to be set in
>>>>> corosync.conf (one line in the config file vs. different lines on each
>>>>> node), but it might be slightly more tricky to calculate if votes are
>>>>> not balanced.
>>>>
>>>> Any chance the value could be incremented based on the number of nodes
>>>> ever seen?
>>>> Ie. if count(active peers) > expected votes, update the config file.
>>>
>>> expected_votes is already calculated that way. If you configure 8 but
>>> all of a sudden you see 9 nodes, then expected_votes is incremented.
>>> The same is true if one node starts voting differently (1 -> X): then
>>> expected_votes is updated across the cluster automagically.
>>> Writing to file is an unnecessary operation with votequorum's current
>>> incarnation.
>>
>> I'm not sure about that.
>> If it was 3 and gets runtime bumped to 5, then two of the original 3
>> could come back up thinking they have quorum (at the same time the
>> remaining 3 legitimately retain quorum).
>>
>> Or am I missing something?
>
> I would expect admins to update corosync.conf as node counts increase,
> but the automatic increase is there as a fail safe.
>
> At the same time, when a node joins a running cluster, even if it has
> expected_votes set to 1, it would receive the highest expected_votes in
> the cluster from the other nodes.
>
> Yes, it doesn't protect against stupid user errors that will not
> increase the expected votes, and that partition case. That would make
> "write to config file" a good thing, but I doubt corosync has that
> option right now.
>
>>
>>>
>>>
>>>>
>>>> That way most people could simply ignore the setting until they wanted
>>>> to remove a node.
>>>
>>> Not that simple, no.
>>>
>>> There are several cases where expected_votes is required to be known
>>> upfront, especially when handling partitions and startups.
>>>
>>> Let's say you have an 8 node cluster; quorum is expected to be 5.
>>
>> Err. Why would you ever do that? And wouldn't the above logic bump it
>> to 8 at runtime?
>
> Uh? 8 / 2 + 1 = 5

For those playing along at home: I thought Fabio was saying expected_votes=5,
not that quorum was reached at 5.

>
> If I expect 8 nodes, 1 vote each, quorum is 5. expected_votes != quorum.
>
> expected_votes is the highest total number of votes in the cluster.
>
>>
>>> The switch between one set of 4 nodes and the other 4 is dead or
>>> malfunctioning. By using an incremental expected_votes, you can
>>> effectively start 2 clusters.
>>
>> You can, but you'd probably stop after the 5th node didn't join the first
>> four. Because if you're writing the highest value back to corosync.conf,
>> then the only time you could hit this situation is on first cluster boot
>
> Right, assuming you write that value back to corosync.conf, I agree, but
> that also implies that you have seen all cluster nodes up at once at
> least one time.
>
> In the end, I think it's a lot safer to just know expected_votes upfront
> and a lot less complicated for the user to bring the cluster up.
>
>> (and you don't bring up all members of a brand new cluster all at
>> once).
>
> ehhh we can't assume that. Customers do that and we have seen bugs
> related to this condition.

Again for those at home: I was talking about a cluster that had just been
installed and had not been previously started. Ever. Fabio was talking about
a cluster that was fully stopped but had been started at some point in the
past.
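(Purely as an illustration of the 4/4 split scenario above -- again a toy
Python sketch, not corosync code -- this is the difference between knowing
expected_votes upfront and growing it incrementally from whatever members
each partition happens to have seen:)

    def quorum_threshold(expected_votes):
        return expected_votes // 2 + 1

    partition_size = 4   # a dead switch splits the 8 node cluster into 4 + 4

    # expected_votes configured upfront on every node:
    upfront = 8
    print(partition_size >= quorum_threshold(upfront))       # False: 4 < 5

    # expected_votes learned incrementally; worst case, each partition has
    # only ever seen its own 4 members:
    incremental = partition_size
    print(partition_size >= quorum_threshold(incremental))   # True: 4 >= 3
    # -> both halves consider themselves quorate: two independent clusters,
    #    which is exactly the situation described above.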
>
>>
>>> Both clusters would be quorate, with expected_votes set to 4 and quorum
>>> set to 3. There is no guarantee those will merge. I doubt we want this
>>> situation to ever exist.
>>>
>>> Also, it would break the wait_for_all feature (or WFA would need to
>>> require expected_votes... either way).
>>
>> Again, it only affects the first time you bring up the cluster.
>> After that, expected_votes would have been (auto) set correctly and
>> wait_for_all would work as expected.
>>
>
> wait_for_all is only useful when you bring the cluster up for the very
> first time... the two options conflict.

Same as before: I was talking about a cluster that had just been installed
and had not been previously started; Fabio was talking about a cluster that
was fully stopped.
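(To close, a simplified Python model of the wait_for_all idea being debated
here. This is not the votequorum implementation; the function and its flags
are invented for illustration only. The point is that quorum is withheld
after startup until full membership has been seen at least once, which is
why the option matters mostly on first boot and why it sits awkwardly next
to an expected_votes that is itself only learned incrementally.)

    def quorum_threshold(expected_votes):
        return expected_votes // 2 + 1

    def has_quorum(members, expected_votes, wait_for_all, all_seen_once):
        # With wait_for_all, quorum is withheld until every expected member
        # has been seen together at least once since startup.
        if wait_for_all and not all_seen_once:
            return False
        return members >= quorum_threshold(expected_votes)

    # First boot of an 8 node cluster with only 5 nodes up so far:
    print(has_quorum(5, 8, wait_for_all=False, all_seen_once=False))  # True: 5 >= 5
    print(has_quorum(5, 8, wait_for_all=True,  all_seen_once=False))  # False: held back
    # Once all 8 members have been seen together, normal majority rules apply:
    print(has_quorum(5, 8, wait_for_all=True,  all_seen_once=True))   # True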