Re: [RFC] quorum module configuration bits

On 1/27/2012 10:46 PM, Vladislav Bogdanov wrote:
> 26.01.2012 15:41, Fabio M. Di Nitto wrote:
>> On 1/26/2012 1:15 PM, Vladislav Bogdanov wrote:
>>
>>>>>> Probably even not lower than number of votes from nodes which are now
>>>>>> either active or inactive but joined at least once (I suppose that
>>>>>> nodelist is fully editable at runtime, so admin may somehow reset join
>>>>>> count of node and only then reduce expected_votes).
>>>>
>>>> I have been thinking about this some more, but I am not sure I grasp the
>>>> use case or what kind of protection you try to suggest.
>>>>
>>>> Reducing the number of expected_votes is an admin action, it's not very
>>>> different from removing a node from the "seen" list manually and
>>>> recalculating expected_votes.
>>>>
>>>> Can you clarify it for me?
>>>
>>> Imagine (this case is a little bit hypothetical, but anyways):
>>> * You have cluster with 8 active nodes, and you (for some historical
>>> reasons or due to admin fault/laziness) have expected_votes set to 3
>>> (ok, you had 3-node cluster not so long ago, but added more nodes
>>> because of growing load).
>>> * Cluster splits 5+3 due to loss of communication between switches (or
>>> switch-stacks).
>>> * 3 nodes are fenced.
>>> * Partition with majority continues operation.
>>> * 3 fenced nodes boot back, and form *quorate* partition because they
>>> have expected_votes set to 3
>>> * Data is corrupted
>>>
>>> If fenced nodes know right after boot that cluster consists of 8 active
>>> nodes, they would not override expected_votes obtained from the
>>> persistent "seen" list with the lower value from the config, and the
>>> data is safe.
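
The arithmetic behind the scenario above can be sketched as follows. This is only an illustration, not corosync code: it assumes one vote per node and a simple-majority rule (quorum = expected_votes // 2 + 1), and the function names are made up for the example.

```python
def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for quorum under an assumed simple-majority rule."""
    return expected_votes // 2 + 1


def is_quorate(partition_votes: int, expected_votes: int) -> bool:
    """True if a partition holding partition_votes reaches the threshold."""
    return partition_votes >= quorum_threshold(expected_votes)


# The three fenced nodes reboot and can only see each other (one vote each).
rebooted_votes = 3

# With the stale expected_votes=3 from the config, they believe they are quorate.
with_stale_config = is_quorate(rebooted_votes, expected_votes=3)

# With expected_votes=8 remembered from the "seen" list, they do not.
with_seen_list = is_quorate(rebooted_votes, expected_votes=8)

# The surviving 5-node partition is quorate against the real cluster size.
majority_side = is_quorate(5, expected_votes=8)

print(with_stale_config, with_seen_list, majority_side)  # True False True
```

With the stale value both partitions consider themselves quorate at the same time, which is exactly the split-brain case described.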
>>
>> Oh great.. yes I see where you are going here. It sounds like an interesting
>> approach, but it clearly requires a file in which to store that information.
> 
> I do not see a big problem here...
> Corosync saves its ring persistently anyways.
> 
>>
>> There is still a window where the file containing the expected_votes
>> from the "seen" list is corrupted, though. At that point it's difficult to
>> detect which of the two values is correct, and it doesn't prevent
>> the issue at all if the file is removed entirely (even by accident), but
>> at first glance I would say that it is better than nothing.
> 
> Hopefully at least not all nodes from a fenced partition will have it
> corrupted/deleted. They should honor the maximal ev value from them all.
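
A minimal sketch of the "honor the maximal ev value" rule suggested above, assuming each node in the partition reports either its persisted expected_votes or nothing (file corrupted or deleted). The function and parameter names are hypothetical and do not correspond to any corosync API.

```python
from typing import Iterable, Optional


def agreed_expected_votes(config_ev: int,
                          persisted: Iterable[Optional[int]]) -> int:
    """Pick the highest expected_votes known to any node in the partition.

    config_ev is the value from the configuration file; persisted holds one
    entry per node, with None meaning that node's stored value is unusable.
    """
    known = [ev for ev in persisted if ev is not None]
    # The config value is always a candidate, so the result never drops below it.
    return max([config_ev, *known])


# Two of the three rebooted nodes lost their file; one still remembers 8.
print(agreed_expected_votes(3, [None, 8, None]))  # 8

# If every node lost its file, only the (possibly stale) config value is left.
print(agreed_expected_votes(3, [None, None, None]))  # 3
```

The second call shows the remaining window: if all nodes in the fenced partition lose the file, the rule falls back to the config value.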

Right, I am just a bit conservative and maybe I apply extreme caution :)

> 
>>
>> I'll have a test and see how it pans out, but at first glance I think
>> we should only store the last known expected_votes while quorate.
>> The node booting would use the higher of the two values. If the cluster
>> has decreased in size in the meantime, the node joining would be
>> informed about it (just sent a patch to the list about it 10 minutes ago ;))
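
The proposal above (persist the last known expected_votes only while quorate, then use the higher of the two values at boot) could look roughly like this. It is a sketch under stated assumptions: the JSON file layout, the path handling and both function names are invented for illustration and are not how corosync stores anything.

```python
import json
from pathlib import Path


def save_expected_votes(path: Path, expected_votes: int, quorate: bool) -> bool:
    """Persist expected_votes, but only while the node is quorate."""
    if not quorate:
        return False
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"expected_votes": expected_votes}))
    # Rename over the old file so readers never see a half-written state.
    tmp.replace(path)
    return True


def boot_expected_votes(path: Path, config_ev: int) -> int:
    """Return the higher of the config value and the persisted value.

    A missing or unreadable file falls back to the config value, which is
    the remaining window discussed in the thread.
    """
    try:
        stored = json.loads(path.read_text())["expected_votes"]
    except (OSError, ValueError, KeyError, TypeError):
        return config_ev
    if not isinstance(stored, int):
        return config_ev
    return max(config_ev, stored)
```

For example, a node that saved 8 while quorate and later boots with expected_votes set to 3 in its config would start with 8, while a node whose file is missing or unparsable would start with 3.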
> 
> I'd argue that you do not know who is the last known (or ever known)
> active then.
>
> A dynamically handled persistent list is much better from this point of
> view. And it resembles what pacemaker does right now. This is probably
> the major value for me.

Ok hold on a sec here, I think there is a basic misunderstanding :)...
you won't be forced to use votequorum. And votequorum only provides
simple majority quorum with some extra features.

Dynamic quorum is not part of it. votequorum has some features that
allow you to upscale (dynamically) or downscale (manually) the cluster.

You can decide to opt out of using votequorum and retain the current
pacemaker behavior as it is now, so in fact there would be no regression at
all for you.

Also, dynamic quorum will be handled as part of the ykd implementation
that is aimed to solve those cases you mentioned in a more specific way.

Some of your ideas are still valid and I am not going to forget about
them or anything, but we need to put them into a slightly different context.

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss


