On 1/11/2012 7:41 AM, Andrew Beekhof wrote:
> On Wed, Jan 11, 2012 at 4:50 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>> On 01/10/2012 11:47 PM, Andrew Beekhof wrote:
>>> On Tue, Jan 10, 2012 at 9:08 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>>>> Hi all,
>>>>
>>>> in some recent discussions, the issue came up of how to configure the quorum module. As I don't really have a complete solution yet, I need to seek advice from the community :)
>>>>
>>>> Problem:
>>>>
>>>> it would be very nice if corosync.conf could simply be scp'ed/copied between nodes and everything worked as expected on all nodes. The issue is that some quorum bits are, at this point in time, node specific, which means that to alter some values it is necessary to edit corosync.conf on the specific node. On top of that, it would be nice if expected_votes could be calculated automatically from the votes: values.
>>>>
>>>> The current quorum configuration (based on the topic-quorum patches):
>>>>
>>>> quorum {
>>>>     provider: corosync_votequorum
>>>>     expected_votes: 8
>>>>     votes: 1
>>>>     two_node: 0
>>>>     wait_for_all: 0
>>>>     last_man_standing: 0
>>>>     auto_tie_breaker: 0
>>>> }
>>>>
>>>> totem {
>>>>     nodeid: xxx
>>>> }
>>>>
>>>> The two values that cannot be copied around are quorum.votes and totem.nodeid.
>>>>
>>>> In the current votequorum/totem incarnation, votes/expected_votes/nodeid are all broadcast to all nodes, so each node that joins the cluster becomes aware of the other peers' values.
>>>>
>>>> As a consequence of the current config format, the auto_tie_breaker feature requires wait_for_all in order to work (it needs the complete list of nodeids; see the auto_tie_breaker implementation in the topic-quorum branch for details).
>>>>
>>>> Honza and I quickly explored options to add those values to the udpu node list, but that's limiting because it doesn't work well with multicast and/or broadcast, and it has integration issues with RRP.
>>>>
>>>> Also, adding lists to quorum {} involves a certain amount of duplicated information. For example:
>>>>
>>>> quorum {
>>>>     nodeid_list: x y z...
>>>>     node.x.votes: ..
>>>>     node.y.votes: ..
>>>> }
>>>>
>>>> which IMHO is all but nice to look at.
>>>>
>>>> So the question of changing the config format also raises the following questions:
>>>>
>>>> 1) do we really need to support the auto_tie_breaker feature without wait_for_all? If NO, then we don't need the list of nodeids upfront.
>>>>
>>>> 2) do we really care about votes other than 1?
>>>
>>> That was also my question when reading the above. It always struck me as troublesome to get right; just giving one of 4 nodes an extra vote (for example) will still give you a tie under the wrong conditions.
>>>
>>> Seems (to me) like a habit people got into when clusters went to pieces without quorum, and we have "better" solutions today (like the token registry). So my vote is drop it.
>>
>> That was my take too in the beginning, but apparently there are some use cases that require votes != 1.
>
> Can someone enumerate a couple? Maybe they're valid, maybe they're not.

Lon/David need to pitch in here. Lon gave me an example with some magic numbers but I keep forgetting to write it down.

>>>> If NO, then votes: can simply be dropped from the corosync.conf defaults, and in case an override is necessary, it can be done on the specific node. This solution poses the problem that expected_votes needs to be set in corosync.conf (one line in the config file vs. several), but it might be slightly more tricky to calculate if the votes are not balanced.
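(Purely to make that "one liner" point concrete, a minimal sketch reusing the keys from the example above -- not a proposed final format: the shared file stays identical on every node, and only a node that needs a different weight edits its local copy.)

# corosync.conf copied unchanged to every node:
quorum {
    provider: corosync_votequorum
    expected_votes: 8
    # votes: omitted, defaults to 1
}

# local edit on the one node that needs extra weight:
quorum {
    provider: corosync_votequorum
    expected_votes: 8
    votes: 2    # node-specific override; this copy can no longer be blindly scp'ed around
}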
>>>
>>> Any chance the value could be incremented based on the number of nodes ever seen? I.e. if count(active peers) > expected_votes, update the config file.
>>
>> expected_votes is already calculated that way. If you configure 8 but all of a sudden you see 9 nodes, then expected_votes is incremented. The same is true if one node starts voting differently (1 -> X); expected_votes is then updated across the cluster automagically. Writing to file is an unnecessary operation with votequorum's current incarnation.
>
> I'm not sure about that. If it was 3 and gets bumped to 5 at runtime, then two of the original 3 could come back up thinking they have quorum (at the same time the remaining 3 legitimately retain quorum).
>
> Or am I missing something?

I would expect admins to update corosync.conf as the node count increases, but the automatic increase is there as a fail-safe. At the same time, when a node joins a running cluster, even if it has expected_votes set to 1, it will receive the highest expected_votes in the cluster from the other nodes. Yes, it doesn't protect against stupid user errors where expected_votes is never increased, nor against that partition case. That would make "write to config file" a good thing, but I doubt corosync has that option right now.

>>> That way most people could simply ignore the setting until they wanted to remove a node.
>>
>> Not that simple, no.
>>
>> There are several cases where expected_votes is required to be known upfront, especially when handling partitions and startups.
>>
>> Let's say you have an 8 node cluster. Quorum is expected to be 5.
>
> Err. Why would you ever do that? And wouldn't the above logic bump it to 8 at runtime?

Uh? 8 / 2 + 1 = 5. If I expect 8 nodes, 1 vote each, quorum is 5. expected_votes != quorum. expected_votes is the highest number of votes in the cluster.

>> The switch between one set of 4 nodes and the other 4 is dead or malfunctioning. By using an incremental expected_votes, you can effectively start 2 clusters.
>
> You can, but you'd probably stop after the 5th node didn't join the first four. Because if you're writing the highest value back to corosync.conf, then the only time you could hit this situation is on first cluster boot

Right, assuming you write that value back to corosync.conf, I agree, but that also implies that you have seen all cluster nodes up at once at least one time. In the end, I think it's a lot safer to just know expected_votes upfront, and a lot less complicated for the user to bring the cluster up.

> (and you don't bring up all members of a brand new cluster all at once).

ehhh, we can't assume that. Customers do that, and we have seen bugs related to this condition.

>> Both clusters would be quorate, with expected_votes set to 4 and quorum to 3. There is no guarantee those will merge. I doubt we want this situation to ever exist.
>>
>> Also, it would break the wait_for_all feature (or WFA would need to require expected_votes .. either way).
>
> Again, it only affects the first time you bring up the cluster. After that, expected_votes would have been (auto) set correctly and wait_for_all would work as expected.

wait_for_all is only useful when you bring the cluster up for the very first time... the two options conflict.
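(Again purely as an illustration of the "known upfront" approach, reusing the keys from the example at the top -- a sketch, not a final syntax: because expected_votes is already in the shared file, wait_for_all has a meaningful target on the very first boot.)

# identical corosync.conf on all 8 nodes:
quorum {
    provider: corosync_votequorum
    expected_votes: 8    # known upfront, so quorum is 8 / 2 + 1 = 5 from the start
    wait_for_all: 1      # first boot: quorum is not granted until all 8 expected votes have been seen at once
}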
Fabio