10.02.2012 11:55, Fabio M. Di Nitto wrote:
> On 2/10/2012 9:14 AM, Vladislav Bogdanov wrote:
>> [snip for readability just to highlight one idea]
>
> wfm ;)
>
>>>>
>>>> Either way, internally, i don´t need to exchange the list of seen nodes
>>>> because either the nodelist from corosync.conf _or_ the calculation
>>>> request will tell me what to do.
>>>
>>> For me it is always preferred to have important statements listed
>>> explicitly. Implicit ones always leave chance to be interpreted incorrectly,
>>>
>>> Look:
>>> "You have cluster of max 8 nodes with max 10 votes, and 4 of them with 5
>>> votes are known to be active. I wont say which ones, just trust me."
>>>
>>> "You have cluster of max 8 nodes, and nodes A, B, C, D are active. Nodes
>>> E, F, G, H are not active. A and E has two votes each, all others have
>>> one vote each."
>>>
>>> I would always prefer latter statement.
>>> (This example has nothing to split-brain discussion, just an implicit
>>> vs. explicit example)
>>>
>> [snip]
>>>
>>> I'd also some-how recommend that even with redundant ring cluster should
>>> never be put into a "undetermined" state by powering-off old partition,
>>> powering-on new one and then powering-on old one again.
>>> Do not know why, but I feel that dangerous. May be my feeling is not valid.
>>
>> Just to become synchronized.
>>
>> Taking the example above:
>> You have ABCD running, 4 nodes 5 votes. expected_votes is 5,
>> higher_ever_seen is 5.
>
> correct.
>
>> You shutdown ABCD and then poweron EFGH. Cluster runs with 4 nodes 5
>> votes. expected_votes is 5, higher_ever_seen is 5.
>
> If the shutdown and power on are done in two distinct stages (first
> complete shutdown and then poweron), then yes, that´s correct.

Yes, I meant that.

>
>> You poweron A.
>>
>> What would be the correct final expected_votes value?
>
> It only depends on what A votes (you don´t say in the above example ;))

"A and E has two votes each"

>
> If A votes 1, then you get expected_votes: 6, higher_ever_seen: 6.
> 2 votes, then you get 7/7 (to state the obvious)
>
>> It would be 7 with you approach and 10 with "seen" list
>
> ABCD have never "seen" EFGH before but now EFGH can see A. So it´s
> either 6 or 7 (based on A votes and current implementation).

I understand your point. I just wanted to know your opinion.

>
> But there is still an issue with the seen list when you move a bit away
> from this example.
>
> 10 nodes (all votes 1)
>
> ABCDEFGHJK
>
> ABCDEF running.
> ev:6 hes: 6
>
> shutdown ABCDEF
> (dunno why you would do that, but customers and users do the strangest
> things) ;)
>
> poweron GHJK
> ev: 4 hes: 4
>
> poweron A
> ev: 10 hes: 10
> total_votes in the cluster 5 < quorum 6 -> KABOOM?

Not really, I'd expect that. And that was a major reason for me to ask
"what is the right behavior". My idea was that ev and quorum are
modified according to the new member's point of view. So, if A knows BCDEF,
then the whole cluster should know them unless A's persistent data is
cleaned manually (?).

(GHJK enter)
G: Hello guys HJK, we are four here, and three of us are enough to
make decisions.
HJK: ack
(GHJK are doing something)
(A enters)
A: Oh, no, please wait, I know that we also have BCDEF somewhere here,
so please postpone any actions until they arrive because they may have
a different vision on what to do. This way you still have a chance to
not break something valuable!
(your scenario) GHJK: nope
(my scenario) GHJK: ack

Anyway, this is just to decide which is safer: throwing previous
membership information away, or using the biggest known set of members.

And I do not know which scenario is actually better (or just "expected")
when it comes to major upper layer consumers (e.g. pacemaker, dlm). For
example, I do not know what the node-list in pacemaker's CIB would look
like after such a scenario finishes.
For me it would be great if both the quorum engine and pacemaker had a
consensus on "whom do we know here".

Maybe Andrew and David can comment (I added you guys to CC)?

>
>> (assuming we do
>> not have leave_remove active, otherwise it may vary from 7 to 10,
>> depending on order in which ABCD have left the cluster).
>
> Let´s put aside leave_remove for now, it does not affect
> highest_ever_seen as-is now and that integration bit is still missing
> even from my head. Let´s see if we can come down to a correct ehs
> handling, then we can take a look at integrating with other features.
>
>> But which of them is a correct one?
>
> I guess it´s up to us to define what is correct.
>
> So far "seen" for me means that a certain node has seen another node
> live at least once (after that I can track the state).

I'd say "seen" means that a node knew some other node to be an active
cluster member the last time that first node was active.

Best,
Vladislav

_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
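
To make the two readings in this thread concrete, here is a minimal
sketch of the arithmetic only. It is not corosync code: the function
names, the per-node "remembered peers" table, and the simple-majority
quorum formula (chosen because it matches the "quorum 6" for ev 10
quoted above) are all assumptions made for illustration. It also
ignores highest_ever_seen persistence and leave_remove.

```python
# Hypothetical model of the two policies debated in the thread.
# Nothing here is the corosync/votequorum API.

def quorum(expected_votes):
    # Assumed simple majority; matches "ev 10 -> quorum 6" from the thread.
    return expected_votes // 2 + 1

def remembered_peers(groups):
    # Each node remembers the members of the partition it last ran in.
    return {node: set(group) for group in groups for node in group}

def expected_votes_live(votes, live):
    # "Live" reading: count only the votes of nodes that are up right now.
    return sum(votes[n] for n in live)

def expected_votes_seen(votes, live, seen):
    # "Seen list" reading: count every node that any live member remembers.
    known = set(live)
    for node in live:
        known |= seen.get(node, set())
    return sum(votes[n] for n in known)

if __name__ == "__main__":
    # 8-node example: A and E have two votes, EFGH are up, then A joins.
    votes8 = {"A": 2, "B": 1, "C": 1, "D": 1, "E": 2, "F": 1, "G": 1, "H": 1}
    seen8 = remembered_peers(["ABCD", "EFGH"])
    live8 = set("EFGHA")
    print(expected_votes_live(votes8, live8),
          expected_votes_seen(votes8, live8, seen8))  # 7 and 10, as quoted

    # 10-node example: all votes 1, GHJK are up, then A joins.
    votes10 = {n: 1 for n in "ABCDEFGHJK"}
    seen10 = remembered_peers(["ABCDEF", "GHJK"])
    live10 = set("GHJKA")
    total = sum(votes10[n] for n in live10)
    for ev in (expected_votes_live(votes10, live10),
               expected_votes_seen(votes10, live10, seen10)):
        print("ev", ev, "quorum", quorum(ev), "quorate", total >= quorum(ev))
```

Under these assumptions the "seen list" reading leaves GHJK+A with 5
votes against a quorum of 6, which is the "KABOOM?" case above, while
the "live" reading keeps the partition quorate.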