On 2004-08-11T14:24:49, Daniel McNeil <daniel@xxxxxxxx> said:

> How can the DLM go to Andrew without a membership layer to
> provide membership?

I'd agree with this question. Membership is really the first and
foremost question; then the DLM can go in.

Fencing turns out to be a more difficult beast, because the way the GFS
stack handles its recovery (a static priority list) is somewhat
fundamentally incompatible with the way a more powerful dependency-based
cluster recovery manager might wish to handle things. We've just run
into this discussion ourselves, and as soon as we have an idea, we will
propose it for discussion...

> I think John really does mean communication. For high availability,
> the cluster should have no single point of failure.

Exposing the communication APIs raises a ton of questions regarding the
semantics: atomic, causal or total ordering? Communication groups, and
access controls to those? Sync or async? Broadcast, multicast or
pair-wise channels?

All of these and some more can/should be supported; however, most
systems just provide subsets. How to expose that, how to handle it?
That's a bit more difficult than answering the question about
membership, which is complex enough already: do you get to see
membership before or after fencing, with or without quorum, etc.

Don't rush this. Don't get sidetracked. (And trust me, I've been there
at OCF for that one.) Concentrate on the slightly more palatable ones
like membership and DLM, and after we've established prior art, then
let's tackle the bigger issues.

Nobody denies that communication, recovery coordination etc. are
required and very important, just that we don't wish to start there.

> Does CMAN provide this kind of functionality? If so, then it
> really is a communication service.

It provides a very limited subset of it which, as far as I can see, is
not even usable for the low requirements that SCRAT (heartbeat's new
recovery/resource manager) has, because it's not performing well enough.
And it's not meant to, because they architect their stack differently
(around DLM + TCP etc.), but it means we'll need to work on this area
some more first.

Sincerely,
    Lars Marowsky-Brée <lmb@xxxxxxx>

-- 
High Availability & Clustering       \ Philosophy proclaiming reason to be
SUSE Labs, Research and Development  | the supreme human virtue is falling
SUSE LINUX AG - A Novell company     \ prey to self-adulation.