Re: [RFC] quorum module configuration bits

10.02.2012 11:55, Fabio M. Di Nitto wrote:
> On 2/10/2012 9:14 AM, Vladislav Bogdanov wrote:
>> [snip for readability just to highlight one idea]
> 
> wfm ;)
> 
>>>>
>>>> Either way, internally, I don't need to exchange the list of seen
>>>> nodes, because either the node list from corosync.conf _or_ the
>>>> calculation request will tell me what to do.
>>>
>>> For me it is always preferable to have important statements listed
>>> explicitly. Implicit ones always leave a chance of being interpreted
>>> incorrectly.
>>>
>>> Look:
>>> "You have a cluster of max 8 nodes with max 10 votes, and 4 of them,
>>> with 5 votes, are known to be active. I won't say which ones; just
>>> trust me."
>>>
>>> "You have a cluster of max 8 nodes, and nodes A, B, C, D are active.
>>> Nodes E, F, G, H are not active. A and E have two votes each; all
>>> others have one vote each."
>>>
>>> I would always prefer the latter statement.
>>> (This example has nothing to do with the split-brain discussion; it
>>> is just an implicit vs. explicit example.)
>>>
>> [snip]
>>>
>>> I'd also somehow recommend that, even with a redundant ring, a
>>> cluster should never be put into an "undetermined" state by powering
>>> off the old partition, powering on the new one, and then powering on
>>> the old one again.
>>> I do not know why, but that feels dangerous to me. Maybe my feeling
>>> is not valid.
>>
>> Just to become synchronized.
>>
>> Taking the example above:
>> You have ABCD running, 4 nodes 5 votes. expected_votes is 5,
>> higher_ever_seen is 5.
> 
> correct.
> 
>> You shutdown ABCD and then poweron EFGH. Cluster runs with 4 nodes 5
>> votes. expected_votes is 5, higher_ever_seen is 5.
> 
> If the shutdown and power on are done in two distinct stages (first
> complete shutdown and then poweron), then yes, that's correct.

Yes, I meant that.

> 
>> You poweron A.
>>
>> What would be the correct final expected_votes value?
> 
> It only depends on what A votes (you don't say in the above example ;))

"A and E have two votes each"

> 
> If A votes 1, then you get expected_votes: 6, higher_ever_seen: 6.
> If A votes 2, then you get 7/7 (to state the obvious).
> 
>> It would be 7 with your approach and 10 with the "seen" list
> 
> ABCD have never "seen" EFGH before, but now EFGH can see A. So it's
> either 6 or 7 (based on A's votes and the current implementation).

I understand your point. I just wanted to know your opinion.
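
Just to make sure we are talking about the same arithmetic, here is a
rough sketch (purely illustrative Python, not the actual votequorum
code) of how I read the current behavior when a node joins:

```python
# Sketch of the current approach, as I understand it: expected_votes
# grows by the joining node's votes, and highest_ever_seen just tracks
# the maximum expected_votes ever observed. Names are mine, not the
# real corosync internals.

def node_joins(expected_votes, highest_ever_seen, new_node_votes):
    expected_votes += new_node_votes
    highest_ever_seen = max(highest_ever_seen, expected_votes)
    return expected_votes, highest_ever_seen

# EFGH running: 4 nodes, 5 votes (E has 2) -> ev 5, hes 5.
ev, hes = 5, 5
print(node_joins(ev, hes, 1))  # A votes 1 -> (6, 6)
print(node_joins(ev, hes, 2))  # A votes 2 -> (7, 7)
```

That matches the 6-or-7 outcome you describe above.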

> 
> But there is still an issue with the seen list when you move a bit away
> from this example.
> 
> 10 nodes (all votes 1)
> 
> ABCDEFGHJK
> 
> ABCDEF running.
> ev:6 hes: 6
> 
> shutdown ABCDEF
> (dunno why you would do that, but customers and users do the strangest
> things)

;)

> 
> poweron GHJK
> ev: 4 hes: 4
> 
> poweron A
> ev: 10 hes: 10 total_votes in the cluster 5 < quorum 6 -> KABOOM?

Not really, I'd expect that. And that was a major reason for me to ask
"what is the right behavior".

My idea was that ev and quorum are modified according to the new
member's point of view. So, if A knows about BCDEF, then the whole
cluster should know about them, unless A's persistent data is cleaned
manually (?).

(GHJK enter)
G: Hello guys HJK, we are four here, and three of us are enough to make
decisions.
HJK: ack
(GHJK are doing something)
(A enters)
A: Oh no, please wait, I know that we also have BCDEF somewhere here,
so please postpone any actions until they arrive, because they may have
a different vision of what to do. This way you still have a chance not
to break something valuable!
(your scenario)
GHJK: nope
(my scenario)
GHJK: ack

Anyway, this is just about deciding what is safer: throwing previous
membership information away, or using the biggest known set of members.
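
Side by side, for the moment A joins GHJK, the two policies give (again
an illustrative sketch, all names mine):

```python
# Compare the two policies for the 10-node example (all votes 1).
# quorum is the usual majority of expected_votes.

def quorum_of(expected_votes):
    return expected_votes // 2 + 1

ev_forget = 5    # "throw away": ev covers only A,G,H,J,K
ev_seen = 10     # "biggest known set": ev covers ABCDEFGHJK

total_votes = 5  # votes actually present in the partition

print(total_votes >= quorum_of(ev_forget))  # True  -> GHJK+A carry on
print(total_votes >= quorum_of(ev_seen))    # False -> wait for BCDEF
```

So the "seen" list turns my "GHJK: ack" scenario into a real wait,
while the current approach keeps the partition quorate.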

And I do not know which scenario is actually better (or just
"expected") when it comes to the major upper-layer consumers (e.g.
pacemaker, dlm). For example, I do not know what the node list in
pacemaker's CIB would look like after such a scenario finishes. For me
it would be great if both the quorum engine and pacemaker had a
consensus on "whom do we know here".

Maybe Andrew and David can comment (I added you guys to CC)?

> 
>> (assuming we do
>> not have leave_remove active, otherwise it may vary from 7 to 10,
>> depending on order in which ABCD have left the cluster).
> 
> Let's put aside leave_remove for now; it does not affect
> highest_ever_seen as-is now, and that integration bit is still missing
> even from my head. Let's see if we can come down to a correct ehs
> handling, then we can take a look at integrating with other features.
> 
>> But which of them is the correct one?
> 
> I guess it's up to us to define what is correct.
> 
> So far "seen" for me means that a certain node has seen another node
> live at least once (after that I can track the state).

I'd say "seen" means that a node knew some other node to be an active
cluster member the last time that first node was active.

Best,
Vladislav
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
