Re: [RFC] quorum module configuration bits

On Wed, Jan 11, 2012 at 4:50 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
> On 01/10/2012 11:47 PM, Andrew Beekhof wrote:
>> On Tue, Jan 10, 2012 at 9:08 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>>> Hi all,
>>>
>>> in some recent discussions, the issue came up of how to configure the
>>> quorum module. As I don't really have a complete solution yet, I need to
>>> seek advice from the community :)
>>>
>>> Problem:
>>>
>>> it would be very nice if corosync.conf could simply be scp'ed/copied
>>> between nodes and everything would work as expected on all nodes.
>>> The issue is that some quorum bits are, at this point in time, node
>>> specific. That means that to alter some values, it is necessary to
>>> edit corosync.conf on the specific node.
>>> On top of that, it would be nice if expected_votes could be
>>> automatically calculated based on the votes: values.
>>>
>>> The current quorum configuration (based on topic-quorum patches):
>>>
>>> quorum {
>>>    provider: corosync_votequorum
>>>    expected_votes: 8
>>>    votes: 1
>>>    two_node: 0
>>>    wait_for_all: 0
>>>    last_man_standing: 0
>>>    auto_tie_breaker: 0
>>> }
>>>
>>> totem {
>>>    nodeid: xxx
>>> }
>>>
>>> The 2 values that cannot be copied around are quorum.votes and totem.nodeid.
>>>
>>> In the current votequorum/totem incarnation, votes/expected_votes/nodeid
>>> are all broadcast to all nodes, so each node that joins the cluster
>>> becomes aware of the other peers' values.
>>>
>>> As a consequence of the current config format, the auto_tie_breaker
>>> feature requires wait_for_all in order to work (it needs the complete
>>> list of nodeids; see the auto_tie_breaker implementation in the
>>> topic-quorum branch for details).
>>>
>>> Honza and I quickly explored options to add those values to the node
>>> list of udpu, but that's limiting because it doesn't work well with
>>> multicast and/or broadcast, and it has integration issues with RRP.
>>>
>>> Also, adding lists to quorum {} involves a certain amount of duplicated
>>> information.
>>>
>>> For example:
>>>
>>> quorum {
>>>   nodeid_list: x y z...
>>>   node.x.votes: ..
>>>   node.y.votes: ..
>>> }
>>>
>>> which IMHO is anything but nice to look at.
>>>
>>> So the question of changing the config format also raises the following
>>> questions:
>>>
>>> 1) do we really need to support an auto_tie_breaker feature without
>>> wait_for_all? If NO, then we don't need the list of nodeids upfront.
>>>
>>> 2) do we really care about votes other than 1?
>>
>> That was also my question when reading the above.
>> It always struck me as troublesome to get right; just giving one of 4
>> nodes an extra vote (for example) will still give you a tie under the
>> wrong conditions.
>>
>> Seems (to me) like a habit people got into when clusters went to
>> pieces without quorum and that we have "better" solutions today (like
>> the token registry).
>> So my vote is drop it.
>
> That was my take too in the beginning but apparently there are some use
> cases that require votes != 1.

Can someone enumerate a couple?  Maybe they're valid, maybe they're not.

>>> If NO, then votes: can simply be dropped from the corosync.conf
>>> defaults, and when an override is necessary, it can be done on the
>>> specific node. This solution poses the problem that expected_votes
>>> needs to be set in corosync.conf (a one-liner in the config file vs.
>>> several lines), but it might be slightly trickier to calculate if the
>>> votes are not balanced.
>>
>> Any chance the value could be incremented based on the number of nodes
>> ever seen?
>> I.e. if count(active peers) > expected_votes, update the config file.
>
> expected_votes is already calculated that way. If you configure 8 but
> all of a sudden you see 9 nodes, then expected_votes is incremented.
> The above is also true if one node starts voting differently (1 -> X);
> expected_votes is then updated across the cluster automagically.
> Writing it to a file is an unnecessary operation with the current
> votequorum incarnation.

I'm not sure about that.
If it was 3 and got bumped to 5 at runtime, then two of the original 3
could come back up thinking they have quorum (at the same time the
remaining 3 legitimately retain quorum).

Or am I missing something?
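
To put numbers on that concern, here is a minimal sketch (not votequorum
code; it just assumes the usual quorum = expected_votes / 2 + 1 rule and
one vote per node):

def quorum(expected_votes):
    return expected_votes // 2 + 1

# Two of the original three nodes restart and read expected_votes: 3
# from the stale corosync.conf still on disk.
print(2 >= quorum(3))   # True  -> that pair considers itself quorate

# The three nodes that stayed up kept the runtime-bumped value of 5.
print(3 >= quorum(5))   # True  -> so does the surviving partition

# i.e. two partitions believe they are quorate at the same time.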

>
>
>>
>> That way most people could simply ignore the setting until they wanted
>> to remove a node.
>
> Not that simple, no.
>
> There are several cases where expected_votes is required to be known
> upfront, especially when handling partitions and startups.
>
> Let's say you have an 8-node cluster; quorum is expected to be 5.

Err. Why would you ever do that?  And wouldn't the above logic bump it
to 8 at runtime?

> The switch between 4 nodes and the other 4 is dead or malfunctioning. By
> using an incremental expected_votes, you can effectively start 2 clusters.

You can, but you'd probably stop after the 5th node didn't join the first four.
Because if you're writing the highest value back to corosync.conf, then
the only time you could hit this situation is on the first cluster boot
(and you don't bring up all members of a brand new cluster all at
once).
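
Roughly what I have in mind, as a hypothetical sketch (none of this is
existing corosync/votequorum code; persist_expected_votes() and where it
gets called from are assumptions, the comparison is the whole idea):

def persist_expected_votes(value, conf_path="/etc/corosync/corosync.conf"):
    # Placeholder: a real implementation would rewrite the
    # quorum { expected_votes: ... } entry on every node.
    print("would write expected_votes: %d to %s" % (value, conf_path))

def maybe_bump_expected_votes(active_votes, expected_votes):
    # If we ever see more votes than expected_votes, raise it and write
    # it back, so a later cold start cannot come up with a stale value.
    if active_votes > expected_votes:
        expected_votes = active_votes
        persist_expected_votes(expected_votes)
    return expected_votes

# A 3-node cluster that grows to 5 nodes at runtime ends up with 5 on
# disk as well, which closes the stale-config window above.
print(maybe_bump_expected_votes(active_votes=5, expected_votes=3))   # 5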

> Both clusters would be quorate, with expected_votes set to 4 and quorum
> to 3. There is no guarantee those would merge. I doubt we want this
> situation to ever exist.
>
> Also, it would break the wait_for_all feature (or WFA would need to
> require expected_votes... either way).

Again, it only affects the first time you bring up the cluster.
After that, expected_votes would have been (auto) set correctly and
wait_for_all would work as expected.
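
For reference, the arithmetic behind the 4/4 split scenario (again a
minimal sketch, not votequorum code; one vote per node and
quorum = expected_votes / 2 + 1):

def quorum(expected_votes):
    return expected_votes // 2 + 1

# expected_votes known upfront (8): quorum is 5, so neither isolated
# half of the 8-node cluster can become quorate on its own.
print(4 >= quorum(8))   # False for both halves

# expected_votes learned incrementally: each half only ever sees 4
# nodes, settles on expected_votes = 4 and quorum = 3, and both halves
# become quorate at the same time.
print(4 >= quorum(4))   # True for both halves

So the window only exists while expected_votes is still being learned;
once it is known upfront (or has been written back), neither half can
become quorate on its own.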

>
> Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss


