Re: [RFC] quorum module configuration bits

On 1/30/2012 1:51 PM, Andrew Beekhof wrote:
> On Mon, Jan 30, 2012 at 11:31 PM, Fabio M. Di Nitto <fdinitto@xxxxxxxxxx> wrote:
>> On 1/27/2012 10:46 PM, Vladislav Bogdanov wrote:
>>> 26.01.2012 15:41, Fabio M. Di Nitto wrote:
>>>> On 1/26/2012 1:15 PM, Vladislav Bogdanov wrote:
>>>>
>>>>>>>> Probably not even lower than the number of votes from nodes which are
>>>>>>>> now either active or inactive but joined at least once (I suppose that
>>>>>>>> the nodelist is fully editable at runtime, so the admin may somehow
>>>>>>>> reset the join count of a node and only then reduce expected_votes).
>>>>>>
>>>>>> I have been thinking about this some more, but I am not sure I grasp the
>>>>>> use case or what kind of protection you try to suggest.
>>>>>>
>>>>>> Reducing the number of expected_votes is an admin action, it's not very
>>>>>> different from removing a node from the "seen" list manually and
>>>>>> recalculating expected_votes.
>>>>>>
>>>>>> Can you clarify it for me?
>>>>>
>>>>> Imagine (this case is a little bit hypothetical, but anyways):
>>>>> * You have a cluster with 8 active nodes, and you (for some historical
>>>>> reasons or due to admin fault/laziness) have expected_votes set to 3
>>>>> (ok, you had a 3-node cluster not so long ago, but added more nodes
>>>>> because of growing load).
>>>>> * Cluster splits 5+3 due to loss of communication between switches (or
>>>>> switch-stacks).
>>>>> * 3 nodes are fenced.
>>>>> * Partition with majority continues operation.
>>>>> * The 3 fenced nodes boot back and form a *quorate* partition because
>>>>> they have expected_votes set to 3
>>>>> * Data is corrupted
>>>>>
>>>>> If the fenced nodes knew right after boot that the cluster consists of
>>>>> 8 active nodes, they would not override the expected_votes obtained from
>>>>> the persistent "seen" list with the lower value from the config, and the
>>>>> data would be safe.
>>>>
>>>> Oh great.. yes I see where you are going here. It sounds like an
>>>> interesting approach, but it clearly requires a file in which to store
>>>> that information.
>>>
>>> I do not see a big problem here...
>>> Corosync saves its ring persistently anyways.
>>>
>>>>
>>>> There is still a window where the file containing the expected_votes
>>>> from the "seen" list is corrupted, though. At that point it's difficult to
>>>> detect which of the two values is correct, and it doesn't prevent
>>>> the issue at all if the file is removed entirely (even by accident), but
>>>> at first glance I would say that it is better than nothing.
>>>
>>> Hopefully at least not all nodes from a fenced partition will have it
>>> corrupted/deleted. They should honor the maximum expected_votes value
>>> among them all.
>>
>> Right, I am just a bit conservative and maybe I apply extreme caution :)
>>
>>>
>>>>
>>>> I'll have a test and see how it pans out, but at first glance I think
>>>> we should only store the last known expected_votes while quorate.
>>>> The node booting would use the higher of the two values. If the cluster
>>>> has decreased in size in the meantime, the node joining would be
>>>> informed about it (I just sent a patch to the list about it 10 minutes ago ;))
>>>
>>> I'd argue that you do not know who is the last known (or ever known)
>>> active then.
>>>
>>> A dynamically handled persistent list is much better from this point of
>>> view. And it resembles what pacemaker does right now. This is probably
>>> the major value for me.
>>
>> Ok hold on a sec here, I think there is a basic misunderstanding :)...
>> you won't be forced to use votequorum. And votequorum only provides
>> simple majority quorum with some extra features.
>>
>> Dynamic quorum is not part of it. votequorum has some features that
>> allow you to upscale (dynamically) or downscale (manually) the cluster.
>>
>> You can decide to opt out of using votequorum and retain the current
>> pacemaker behavior as it is now, so in fact there would be no regression
>> at all for you.
> 
> As per IRC, this isn't an option.
> The part of pacemaker that did this was loaded inside corosync as a
> plugin, which isn't allowed anymore.
> 

Yes, thanks for the clarification.

I still believe that linear/dynamic quorum belongs in the ykd module and is
out of scope for votequorum (simple majority).
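
To make the 8-node example from earlier in the thread concrete, here is a
rough Python sketch (not corosync code). It assumes simple majority quorum
computed as expected_votes // 2 + 1, and the helper names quorum_threshold,
is_quorate and effective_expected_votes are made up for illustration. It
compares a booting partition that trusts only the configured value with one
that takes the highest of the configured value and whatever stored values
survived on its nodes:

```python
from typing import List, Optional


def quorum_threshold(expected_votes: int) -> int:
    """Votes needed for a simple majority (assumed formula)."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    """True if the partition holds at least a simple majority."""
    return votes_present >= quorum_threshold(expected_votes)


def effective_expected_votes(configured: int,
                             stored: List[Optional[int]]) -> int:
    """Highest of the configured value and any stored values.

    `stored` has one entry per node in the partition; None stands for a
    node whose stored value is missing or unreadable (hypothetical model).
    """
    known = [v for v in stored if v is not None]
    return max([configured] + known)


CONFIGURED_EV = 3   # stale value left in the config
FENCED_NODES = 3    # size of the partition that boots back

# Config only: 3 votes against expected_votes=3 is quorate (the bad case).
config_only = is_quorate(FENCED_NODES, CONFIGURED_EV)

# One node lost its stored value, two still remember 8.
ev = effective_expected_votes(CONFIGURED_EV, [8, None, 8])
with_stored = is_quorate(FENCED_NODES, ev)

# Every node lost its stored value: back to the config-only behaviour.
ev_all_lost = effective_expected_votes(CONFIGURED_EV, [None, None, None])
```

With these assumptions the 3 rebooted nodes stay inquorate as long as at
least one of them still has the stored value, while the 5-node partition
(is_quorate(5, 8)) keeps quorum. If all of them lose the file, the result
falls back to the configured value and the original problem is back.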

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss