David,

My apologies for not responding earlier; I think I must have missed this post. Comments inline.

On Thu, 2004-09-23 at 21:43, David Teigland wrote:
> On Tue, Sep 21, 2004 at 03:32:30PM -0700, Steven Dake wrote:
> > Patrick,
> >
> > I have read your RFC for an API and find it interesting. But there is
> > one aspect that is somewhat disturbing to me.
> >
> > In the model you propose, messaging and membership are separated (or
> > at least not completed).
>
> What's to prevent a single integrated messaging/membership system (like
> you describe below) from providing both messaging and membership ops?
>
> I /don't/ think the separation of the ops into different structs was meant
> to imply that different systems would provide them. I think the intention
> was that a clustering module would be free to provide whichever methods it
> wanted, e.g. a clustering module that didn't have a quorum system would
> just leave those functions null, or provide just selected functions within
> a given struct.
>

If it's possible, people will do it. This leads to a problem: the APIs then cannot be trusted to deliver certain guarantees, such as agreed ordering or virtual synchrony. If the APIs cannot be trusted to deliver agreed ordering, for example, then nobody will use agreed ordering, and we will end up with a mess of two-phase commit protocols on our hands. Or worse, systems will not operate correctly in a distributed fashion.

>
> > As a result, I propose we use virtual synchrony as the basis of kernel
> > communication. To that end, I have developed a small API (which is
> > implemented in userland in about 5000 lines of code). This may be the
> > basis, with whatever changes are required for kernel inclusion, for
> > communication.
>
> Sounds good. It would actually be an integrated communication and basic
> membership system, right? As you mentioned above, the two are
> interdependent.
> By "basic membership" I'm implying that more exotic
> membership systems could be implemented above this lowest layer.
>

Virtual synchrony requires membership and messaging to be integrated in order to deliver on its model. I can't think of more exotic membership systems, except perhaps Intergroup. Intergroup doesn't address many of the common problems in distributed computing that virtual synchrony solves, and there is no open source implementation available, nor is one likely. Someone could come up with something new, but such a system should be easily compatible with the virtual synchrony API, as long as it meets the virtual synchrony requirements.

> I think the question here is whether your messaging/membership system
> (currently in user space) would fit behind the API Patrick sent once
> ported to the kernel. If not, then what needs to be changed so it would?
> The idea is for the API to be general enough to support a variety of
> clustering modules, including yours.
>

Virtual synchrony is the "one true model" for distributed computing. Other systems just don't deliver the features that virtual synchrony provides. If we accept that virtual synchrony must exist at the lowest level, we are free to design any sort of distributed system on top of it.

If virtual synchrony is not enforced by the API, then people who don't care about virtual synchrony could immediately provide implementations that don't support those features. This would result in fragmented clustering infrastructure, which is exactly what we are trying to avoid. Worse, these solutions would not be reliable under partitions, merges, or faults, because they would most likely not handle those situations deterministically and correctly. IMHO, it is impossible to build a reliable distributed system if partitions, merges, and faults are not addressed up front as part of the APIs and protocols.

Thanks
-steve
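To make the "if it's possible, people will do it" argument concrete, here is a minimal sketch in C of an API that enforces the integration rather than leaving it optional. Everything in it is hypothetical: `vs_provider_ops`, `vs_register_provider`, `vs_callbacks`, and the toy loopback module are invented for illustration and are not taken from Patrick's RFC or from the userland implementation mentioned above. Membership (`join`) and agreed-order messaging (`mcast_agreed`) sit in one provider table, and registration refuses a module that leaves either half NULL.

```c
#include <stddef.h>

/* Hypothetical interface: all names are illustrative only. */

struct vs_config {                 /* one configuration (view) */
    const unsigned int *nodes;
    size_t count;
};

struct vs_callbacks {              /* supplied by the user of the service */
    void (*deliver)(unsigned int sender, const void *msg, size_t len);
    void (*confchg)(const struct vs_config *cfg);
};

/* Supplied by a clustering module. Membership (join) and messaging
 * (mcast_agreed) live in one table: a virtually synchronous provider
 * must deliver confchg() in the same agreed order as messages, so both
 * halves have to come from the same implementation. */
struct vs_provider_ops {
    int (*join)(const struct vs_callbacks *cb);
    int (*mcast_agreed)(const void *msg, size_t len);
};

/* Refuse partial providers instead of letting them leave hooks NULL. */
int vs_register_provider(const struct vs_provider_ops *ops)
{
    if (ops == NULL || ops->join == NULL || ops->mcast_agreed == NULL)
        return -1;
    return 0;
}

/* Toy single-node provider, only to exercise the check above. */
static const struct vs_callbacks *loop_cb;
static const unsigned int loop_nodes[] = { 1 };

static int loop_join(const struct vs_callbacks *cb)
{
    struct vs_config cfg = { loop_nodes, 1 };

    /* Users, too, must accept both messages and configuration changes. */
    if (cb == NULL || cb->deliver == NULL || cb->confchg == NULL)
        return -1;
    loop_cb = cb;
    loop_cb->confchg(&cfg);        /* in this sketch, the first event is the initial view */
    return 0;
}

static int loop_mcast_agreed(const void *msg, size_t len)
{
    if (loop_cb == NULL)
        return -1;                 /* must join before sending */
    loop_cb->deliver(1, msg, len); /* one node: the order is trivially agreed */
    return 0;
}

const struct vs_provider_ops loopback_ops = {
    .join         = loop_join,
    .mcast_agreed = loop_mcast_agreed,
};
```

Note that the registration check only enforces that both halves exist. A real provider would still have to guarantee the virtual synchrony property itself: every node that survives a configuration change delivers the same set of messages, in the same order, before the corresponding `confchg()` callback.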