whoops.. I missed this email on the list.... sorry about the delay.

On 1/14/2012 4:32 PM, Vladislav Bogdanov wrote:
> 14.01.2012 11:57, Fabio M. Di Nitto wrote:
>> On 01/14/2012 09:09 AM, Vladislav Bogdanov wrote:
>>> Hi,
>>>
>>> 13.01.2012 21:21, Fabio M. Di Nitto wrote:
>>> [snip]
>>>>> + expected_votes is removed and instead auto-calculated based upon
>>>>>   quorum_votes in the node list
>>>
>>> Is it possible to dynamically keep track of "seen" nodes here and
>>> use only those nodes for the expected_votes calculation?
>>>
>>> I even have a use-case for that:
>>> I "run" a cluster consisting of max 17 nodes with UDPU, so all nodes
>>> are listed in the config. Currently only 3 nodes are powered on,
>>> because I do not have load which requires more yet (and power is
>>> expensive in European datacenters). When load increases I'd just
>>> power on additional nodes, and quorum expectations are recalculated
>>> automagically.
>>> I have that implemented with corosync + pacemaker right now.
>>> Pacemaker keeps that list of nodes and does the quorum calculations
>>> correctly, and I'm absolutely happy with that. From what I see, the
>>> changes being discussed will break my happiness.
>>
>> Yes and no. Let me explain:
>>
>> votequorum already does that internally. For example:
>>
>> expected_votes: 3 in corosync.conf
>>
>> You power on your 4th node (assuming everybody votes 1, to make this
>> example simpler) and expected_votes is automatically bumped to 4 on
>> all nodes.
>>
>> While this is what you are asking for, there are a few corner cases
>> where this could lead to a dangerous situation.
>>
>> First of all, the new expected_votes is not written to disk but only
>> retained internally in votequorum.
>
> Actually I'd prefer it to be written together with that "seen" list,
> so the cluster knows who should be there even after a full restart.
> But I neither like the idea of having it in a config file nor of it
> being calculated from a nodelist. From my point of view, that is not
> a configuration variable, but rather a "state" one. And it should be
> managed in a stateful way (saved to disk).
>
>> This approach does not protect you against partitions properly,
>> especially at startup of the cluster. For example, out of 16 nodes,
>> 8 are on switch A and 8 on switch B. The interlink between the
>> switches is broken. All nodes know of expected_votes: 3 from
>> corosync.conf.
>
> If we have expected_votes not in the config, but in some file in
> /var/lib (corosync already does the same for rings), and managed
> dynamically cluster-wide, that should be impossible (of course, if
> the admin didn't delete that file on all nodes).
> The cluster knows it has 16 active nodes. It even knows all the
> members it has ever "seen".

I have been thinking about writing it to disk too, but I didn't come
up with a full solution/implementation. Though I am still considering
all the benefits and drawbacks of this approach.

>>> It would also be great if I were able to forcibly remove an
>>> inactive node from that "seen" list with just one command on *one*
>>> cluster node. The use case for that is a human error where the
>>> wrong node is powered on by mistake.
>>
>> The "seen" list within the quorum module is dynamic. As soon as you
>> shut down a node (either cleanly or not) and totem notices that the
>> node goes away, that same node is removed from the quorum
>> calculation.
>
> Ugh?
> Do you mean that the dynamic version of expected_votes is decremented
> automatically?

No no! Not by default, at least.
There is a new feature called last_man_standing that can do that for
you, but give me one more day to complete the man page that explains
it in detail.
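To give you a rough idea before the man page is done: it is enabled in
the quorum { } section of corosync.conf. A minimal sketch of what I
have in mind (treat the window value as illustrative, not as a
recommendation, until the man page is out):

    quorum {
        provider: corosync_votequorum
        expected_votes: 16

        # allow expected_votes (and hence quorum) to be recalculated
        # downwards when nodes leave the cluster
        last_man_standing: 1

        # how long (in ms) to wait after a node loss before
        # recalculating; example value only
        last_man_standing_window: 10000
    }

So with this enabled, losing nodes one at a time keeps shrinking the
quorum requirement instead of leaving expected_votes fixed.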
>> Your concern is probably related to the discussed nodelist, but
>> that's up to others to decide "how" to handle the addition/removal
>> of nodes. It doesn't affect the quorum module at all.
>
> Generally yes, it is about the nodelist, the vote-list and
> expected_votes in a config file.
>
> All I wanted to say is that I'm pretty happy with how pacemaker
> implements quorum management (from an admin's point of view).
>
> If I power on more "unseen" nodes, expected_votes is automatically
> incremented and saved into the CIB. If I then power down those nodes,
> their votes are still considered until I remove them from the CIB and
> decrement expected_votes manually (actually that part didn't fully
> work last time I checked).

You can do this with votequorum already. New nodes are added to the
list and expected_votes increases automagically. On removal, you will
need to decrement expected_votes manually, pretty much as you do now.

> And I do not like the idea of touching the configuration file every
> time I want to add a node to the cluster, then re-distributing that
> config to all nodes, and then reloading it on every node.
>
> Now I have all 17 nodes listed in corosync.conf (UDPU), but my
> expected_votes in the pacemaker CIB is 3. That's why Steve's idea of
> calculating expected_votes from a vote-list would be a regression
> for me.

Ok, I'll make it happen by allowing expected_votes: XX to override the
value calculated from the nodelist. You will be able to change
expected_votes at runtime if necessary (up or down), and newly "seen"
nodes will automatically increase expected_votes. I think this should
fulfill your requirements.

Fabio
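P.S. To make the override concrete, here is a rough sketch of what I
have in mind for corosync.conf. The addresses are made up and the
exact nodelist syntax is still being discussed, so take this as an
illustration rather than the final format:

    totem {
        version: 2
        transport: udpu
    }

    nodelist {
        node {
            ring0_addr: 10.0.0.1
        }
        node {
            ring0_addr: 10.0.0.2
        }
        node {
            ring0_addr: 10.0.0.3
        }
        # ... and so on for all 17 listed nodes
    }

    quorum {
        provider: corosync_votequorum
        # Without this line, expected_votes would be derived from the
        # nodelist (17 in your case). Setting it here overrides that,
        # so quorum is calculated against 3 votes until more nodes
        # are actually seen.
        expected_votes: 3
    }

And for the manual decrement after removing a node, something like the
existing corosync-quorumtool runtime interface should keep working
(again, just a sketch):

    # lower expected_votes to 3 on a running cluster
    corosync-quorumtool -e 3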