Re: [RFC] quorum module configuration bits

On 01/14/2012 09:09 AM, Vladislav Bogdanov wrote:
> Hi,
> 
> 13.01.2012 21:21, Fabio M. Di Nitto wrote:
> [snip]
>>> + expected_votes is removed and instead auto-calculated based upon
>>> quorum_votes in the node list
> 
> Is it possible to dynamically keep track of "seen" nodes here and use
> only those nodes for the expected_votes calculation?
> 
> I even have a use-case for that:
> I "run" a cluster consisting of at most 17 nodes with UDPU, so all
> nodes are listed in the config. Currently only 3 nodes are powered on,
> because I do not yet have load which requires more (and power is
> expensive in European datacenters). When load increases, I just power
> on additional nodes and the quorum expectations are recalculated
> automagically.
> I have that implemented with corosync + pacemaker right now. Pacemaker
> keeps that list of nodes and does the quorum calculations correctly,
> and I'm absolutely happy with that. From what I see, the changes being
> discussed will break my happiness.

Yes and no. Let me explain:

votequorum already does that internally. For example:

expected_votes: 3 in corosync.conf

You power on your 4th node (assuming every node votes 1, to keep the
example simple) and expected_votes is automatically bumped to 4 on all
nodes.
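
For illustration, the relevant corosync.conf bits would look roughly
like this (the provider line is my assumption about the 2.0-style
syntax; treat the exact keys as illustrative):

    quorum {
        # votequorum provides the vote-based quorum service
        provider: corosync_votequorum
        # static value read from the config file; votequorum bumps
        # its runtime copy upwards as more nodes are seen
        expected_votes: 3
    }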

While this is what you are asking for, there are a few corner cases
where this could lead to a dangerous situation.

First of all, the new expected_votes is not written to disk but only
retained internally by votequorum, so a full cluster restart falls back
to the value in corosync.conf.
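
To inspect or adjust the runtime value you have to go through the
votequorum runtime interface; with a quorumtool-style utility (assuming
corosync-quorumtool and its -s/-e options here) that would be roughly:

    # show runtime quorum status, including the bumped expected_votes
    corosync-quorumtool -s
    # set expected votes at runtime (also not persisted to disk)
    corosync-quorumtool -e 4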

This approach does not protect you against partitions properly,
especially at startup of the cluster. For example: out of 16 nodes, 8
are on switch A and 8 on switch B, and the interlink between the
switches is broken. All nodes know only expected_votes: 3 from
corosync.conf.

Both partitions of the cluster can achieve quorate status, and they can
create chaos: fencing each other, data corruption and all. Now, we agree
that this is generally an admin error (the admin does not notice that
the interlink is down), but it leaves a window open for disasters.
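
To make the danger concrete, here is the arithmetic for that scenario,
using the usual quorum formula quorum = expected_votes / 2 + 1 (integer
division):

    per partition: 8 nodes up, each voting 1  ->  total_votes = 8
    auto-bump:     expected_votes raised from 3 to 8 on each side
    quorum:        8 / 2 + 1 = 5
    result:        8 >= 5  ->  BOTH partitions consider themselves quorate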

On the other hand, I am not going to force users to do it differently.
The current votequorum implementation allows this use case, and I am not
going to enforce otherwise. Users should still be aware of what they are
asking for, though.

> 
> It would also be great if I were able to forcibly remove an inactive
> node from that "seen" list with just one command on *one* cluster
> node. The use case for that is human error, when the wrong node is
> powered on by mistake.

The "seen" list within the quorum module is dynamic. As soon as you
shutdown a node (either clean or whatever) and totem notices that the
node goes away, that same node is removed from the quorum calculation.
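
As a quick illustration of that dynamism (numbers are mine, assuming
one vote per node):

    start:     5 nodes up, expected_votes = 5, quorum = 5 / 2 + 1 = 3
    shutdown:  2 nodes leave and totem notices  ->  total_votes = 3
    result:    3 >= 3  ->  the remaining nodes stay quorate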

Your concern is probably related to the nodelist being discussed, but
it is up to others to decide "how" to handle the addition/removal of
nodes. It does not affect the quorum module at all.
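
For reference, the nodelist shape being discussed would look roughly
like this (quorum_votes per the RFC proposal; the address key and host
names are illustrative):

    nodelist {
        node {
            # address key name and host are illustrative
            ring0_addr: node1.example.com
            # per-node vote count from the RFC; optional, defaults to 1
            quorum_votes: 1
        }
        node {
            ring0_addr: node2.example.com
            quorum_votes: 1
        }
    }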

Fabio

> 
> Best,
> Vladislav
> 
>>> + votes is moved to the individual node list
>>
>> I will only speak for quorum:
>>
>> quorum itself doesn't need quorum_votes. It is optional (as David
>> already mentioned) and defaults to 1.
>>
>> quorum doesn't care about nodeids in general. A list of nodeids makes
>> auto-tie-breaker work a bit earlier during the first cluster
>> bootstrap, but it's nothing worth going crazy for.
>>
>> A list is not mandatory for quorum operations either.
>>
>> I suggest keeping it flexible instead. Not everybody wants or needs a
>> nodelist (mcast/bcast).
>>
>> I suggest that if a nodelist is available, quorum uses it by default.
>> If the list is not available, then we want expected_votes.
>>
>> If neither is available we error out; if both are available, the list
>> takes priority over the expected_votes setting.
>>
>> I personally have no opinion on how the list is structured, as long
>> as I can easily iterate through the node list and find out which node
>> I am in that list (especially if nodeids are not specified).
>>
>> Fabio

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss

