Re: [RFC] quorum module configuration bits

Whoops, I missed this email on the list... sorry about the delay.

On 1/14/2012 4:32 PM, Vladislav Bogdanov wrote:
> 14.01.2012 11:57, Fabio M. Di Nitto wrote:
>> On 01/14/2012 09:09 AM, Vladislav Bogdanov wrote:
>>> Hi,
>>>
>>> 13.01.2012 21:21, Fabio M. Di Nitto wrote:
>>> [snip]
>>>>> + expected_votes is removed and instead auto-calculated based upon
>>>>> quorum_votes in the node list
>>>
>>> Is it possible to dynamically keep track of "seen" nodes here and use
>>> only those nodes for expected_votes calculations?
>>>
>>> I even have a use-case for that:
>>> I "run" cluster consisting of max 17 nodes with UDPU, so all nodes are
>>> listed in config. Currently only 3 nodes are powered on, because I do
>>> not have load which requires more yet (and power is expensive in
>>> european datacenters). When load increases I'd just power on additional
>>> nodes and quorum expectations are recalculated automagically.
>>> I have that implemented with corosync + pacemaker right now. Pacemaker
>>> keeps that list of nodes and does quorum calculations correctly. And I'm
>>> absolutely happy with that. From what I see changes being discussed will
>>> break my happiness.
>>
>> Yes and no. Let me explain:
>>
>> votequorum already does that internally. For example:
>>
>> expected_votes: 3 in corosync.conf
>>
>> you power on your 4th node (assuming everybody votes 1 to make it
>> simpler in this example) and expected_votes is automatically bumped to 4
>> on all nodes.
>>
>> While this is what you are asking for, there are a few corner cases
>> where this could lead to a dangerous situation.
>>
>> First of all, the new expected_votes is not written to disk but only
>> retained internally by votequorum.
> 
> Actually I'd prefer it to be written out together with that "seen"
> list, so the cluster knows who should be there even after a full
> restart. But I like neither the idea of having it in the config nor of
> calculating it from a nodelist. From my point of view, that is not a
> configuration variable but rather a "state" one, and it should be
> managed in a stateful way (saved to disk).
> 
>>
>> This approach does not protect you properly against partitions,
>> especially at startup of the cluster. For example, out of 16 nodes, 8
>> are on switch A and 8 on switch B. The interlink between the switches is
>> broken. All nodes know of expected_votes: 3 from corosync.conf.
> 
> If we keep expected_votes not in the config, but in some file under
> /var/lib (corosync already does the same for rings), and manage it
> dynamically cluster-wide, that should be impossible (provided, of course,
> that the admin didn't delete that file on all nodes).
> The cluster knows it has 16 active nodes. It even knows every member it
> has ever "seen".

I have been thinking about writing it to disk too, but I didn't come up
with a full solution/implementation. I am still considering all the
benefits and drawbacks of this approach, though.
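
Just so we are talking about the same thing, the scenario above assumes
something along these lines in corosync.conf (a sketch only; the final
syntax may still change before release):

    quorum {
        provider: corosync_votequorum
        # static value read at startup; votequorum bumps its in-memory
        # copy to 4 when the 4th node joins, but nothing is written
        # back to this file
        expected_votes: 3
    }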


>>>
>>> It would also be great if I were able to forcibly remove an inactive
>>> node from that "seen" list with just one command on *one* cluster
>>> node. The use case for that is human error, when the wrong node is
>>> powered on by mistake.
>>
>> The "seen" list within the quorum module is dynamic. As soon as you
>> shutdown a node (either clean or whatever) and totem notices that the
>> node goes away, that same node is removed from the quorum calculation.
> 
> Ugh?
> Do you mean that the dynamic version of expected_votes is decremented
> automatically?

No no! Not by default, at least.

There is a new feature called last_man_standing that can do that for
you, but give me one more day to complete the man page to explain it in
detail.
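
To give you a rough idea until the man page is done, it is just a pair
of options in the quorum section, roughly like this (a sketch; the exact
option names and defaults may still change):

    quorum {
        provider: corosync_votequorum
        expected_votes: 16
        # when enabled, expected_votes (and therefore quorum) is
        # recalculated downwards once nodes have been out of the
        # membership for last_man_standing_window milliseconds
        last_man_standing: 1
        last_man_standing_window: 10000
    }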

> 
>>
>> Your concern is probably related to the discussed nodelist, but it's up
>> to others to decide "how" to handle the addition/removal of nodes. It
>> doesn't affect the quorum module at all.
> 
> Generally yes, it is about nodelist, vote-list and expected_votes in a
> config file.
> 
> All I wanted to say is that I'm pretty happy with how pacemaker
> implements quorum management (from an admin's point of view).
> 
> If I power on more "unseen" nodes, expected_votes is automatically
> incremented and saved into the CIB. If I then power down those nodes,
> their votes are still counted until I remove them from the CIB and
> decrement expected_votes manually (actually, that part didn't fully
> work the last time I checked).

You can do this with votequorum already. New nodes are added to the list
and expected_votes increases automagically. On removal, you will need to
decrement expected_votes manually, pretty much as you do now.
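
Something along these lines (a sketch; the quorumtool interface is still
being finalized, so take the exact flags with a grain of salt):

    # show the current quorum status and expected_votes
    corosync-quorumtool -s

    # after permanently retiring a node, lower expected_votes by hand
    corosync-quorumtool -e 3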

> 
> And I do not like the idea of touching the configuration file every time
> I want to add a node to the cluster, then re-distributing that config to
> all nodes, and then reloading it on every node.
> 
> Right now I have all 17 nodes listed in corosync.conf (UDPU), but my
> expected_votes in the pacemaker CIB is 3. That's why Steve's idea of
> calculating expected_votes from a vote-list would be a regression for me.

Ok, I'll make it happen by allowing expected_votes: XX to override the
value calculated from the nodelist.

You will be able to change expected_votes at runtime if necessary (up or
down), and newly "seen" nodes will automatically increase expected_votes.
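
For your 17-node UDPU setup that would look roughly like this (a sketch
of the proposed syntax; the addresses are obviously placeholders):

    nodelist {
        node {
            ring0_addr: 10.0.0.1
            nodeid: 1
            quorum_votes: 1
        }
        # ... remaining 16 nodes listed here for UDPU ...
    }

    quorum {
        provider: corosync_votequorum
        # overrides the value that would otherwise be derived from the
        # 17 entries in the nodelist; it is still bumped automatically
        # as more nodes are actually seen
        expected_votes: 3
    }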

I think this should fulfill your requirements.

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss


