Re: [Linux-cluster] RE: [RFC] Generic Kernel API

Steven Dake <sdake@xxxxxxxxxx> · Tue, 05 Oct 2004 14:39:44 -0700

David,

My apologies for not responding earlier; I think I must have missed this
post.

Comments inline.

On Thu, 2004-09-23 at 21:43, David Teigland wrote:
> On Tue, Sep 21, 2004 at 03:32:30PM -0700, Steven Dake wrote:
> > Patrick,
> > 
> > I have read your RFC for an API and find it interesting.  But there is
> > one aspect that is somewhat disturbing to me.
> > 
> > In the model you propose, messaging and membership are separated (or
> > at least not completed).
> 
> What's to prevent a single integrated messaging/membership system (like
> you describe below) from providing both messaging and membership ops?
> 
> I /don't/ think the separation of the ops into different structs was meant
> to imply that different systems would provide them.  I think the intention
> was that a clustering module would be free to provide whichever methods it
> wanted, e.g. a clustering module that didn't have a quorum system would
> just leave those functions null, or provide just selected functions within
> a given struct.
> 

If it's possible, people will do it.  This leads to the problem that
the APIs cannot be trusted to deliver certain guarantees, such as the
agreed ordering of virtual synchrony.  If the APIs cannot be trusted to
deliver, for example, agreed ordering, then nobody will use agreed
ordering and we will have a mess of two-phase commit protocols on our
hands.  Or worse, systems will not operate correctly in a distributed
fashion.
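
As a rough illustration of why applications lean on agreed ordering, here
is a small Python sketch.  It is only a toy: the message layout, the
function names, and the idea of sorting by a (sequence number, sender)
pair are assumptions made for the example, not part of the RFC or of any
existing implementation.  Two nodes receive the same messages in
different arrival orders; they end up in the same state only when they
apply the messages in one agreed order.

```python
# Toy model: every name here is hypothetical and chosen for illustration.

def apply_ops(ops):
    """Apply non-commutative updates to a replica's state in the given order."""
    value = 0
    for kind, arg in ops:
        if kind == "set":
            value = arg
        elif kind == "add":
            value += arg
    return value


def agreed_order(messages):
    """Sort by (sequence number, sender id) so every node sees one total order.

    Assumption: the messaging layer has already stamped each message with a
    cluster-wide sequence number.
    """
    return sorted(messages, key=lambda m: (m[0], m[1]))


# Each message is (seq, sender, op).  Both nodes receive the same set of
# messages, but the network hands them over in different arrival orders.
arrival_a = [(1, "n1", ("set", 10)), (2, "n2", ("add", 5))]
arrival_b = [(2, "n2", ("add", 5)), (1, "n1", ("set", 10))]

# Without agreed ordering, each node applies ops in its own arrival order.
raw_a = apply_ops([op for _, _, op in arrival_a])
raw_b = apply_ops([op for _, _, op in arrival_b])

# With agreed ordering, both nodes apply the same sequence.
ordered_a = apply_ops([op for _, _, op in agreed_order(arrival_a)])
ordered_b = apply_ops([op for _, _, op in agreed_order(arrival_b)])

print("arrival order:", raw_a, raw_b)
print("agreed order: ", ordered_a, ordered_b)
```

If the ordering guarantee is optional, an application that wants the
second behaviour has to build its own agreement step on top, which is
where the two-phase commit protocols come from.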

> 
> > As a result, I propose we use virtual synchrony as the basis of kernel
> > communication.  To that end, I have developed a small API (which is
> > implemented in userland in about 5000 lines of code).  This may be the
> > basis, with whatever changes are required for kernel inclusion, for
> > communication.
> 
> Sounds good.  It would actually be an integrated communication and basic
> membership system, right?  As you mentioned above, the two are
> interdependent.  By "basic membership" I'm implying that more exotic
> membership systems could be implemented above this lowest layer.
> 

Virtual synchrony requires membership and messaging to be integrated to
deliver on its model.  I can't think of more exotic membership systems
except perhaps Intergroup.  Intergroup doesn't address many of the
common problems in distributed computing that are solved by virtual
synchrony, there is no open source implementation available, and there
is unlikely to be one.  Now I suppose someone could come up with something,
but those systems should easily be compatible with the virtual synchrony
API, as long as they support virtual synchrony requirements.

> I think the question here is whether your messaging/membership system
> (currently in user space) would fit behind the API Patrick sent once
> ported to the kernel.  If not, then what needs to be changed so it would?
> The idea is for the API to be general enough to support a variety of
> clustering modules, including yours.

Virtual synchrony is the "one true model" for distributed computing. 
Other systems just don't deliver the features that are available in
virtual synchrony.  It gives us the freedom to design any sort of
distributed system, if we accept that virtual synchrony must exist at the
lowest level.  If virtual synchrony is not enforced by the API, then
people who don't immediately care about virtual synchrony could provide
implementations that don't support those features.  This would result in
fragmented implementations of clustering infrastructure which is what we
are trying to avoid.  Not only that, these solutions would not be
reliable in partitions, merges, or faults because they would most likely
not handle these situations in a deterministic and correct fashion. 
IMHO, it is impossible to make a reliable distributed system if
partitions, merges, and faults are not addressed up front as part of the
APIs and protocols.
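
One way to picture "enforced by the API" is a registration step that
refuses any clustering module that leaves a required operation null.
The Python sketch below is hypothetical throughout: the operation names,
the exception, and the registration function are invented for the
example and do not come from the RFC.

```python
# Hypothetical op names; the RFC's actual struct members may differ.
REQUIRED_OPS = ("join", "leave", "send_agreed", "deliver_config_change")


class IncompleteProvider(Exception):
    """Raised when a clustering module omits a mandatory operation."""


def register_provider(name, ops):
    """Accept a provider only if every required op is present and callable."""
    missing = [op for op in REQUIRED_OPS if not callable(ops.get(op))]
    if missing:
        raise IncompleteProvider(f"{name}: missing {', '.join(missing)}")
    return {op: ops[op] for op in REQUIRED_OPS}


def noop(*args):
    """Stand-in for a real implementation of an operation."""
    return None


full = {op: noop for op in REQUIRED_OPS}
partial = dict(full, send_agreed=None)  # agreed ordering left null

registered = register_provider("vs_module", full)

try:
    register_provider("partial_module", partial)
    rejected = False
except IncompleteProvider as exc:
    rejected = True
    reason = str(exc)

print(sorted(registered))
print(rejected, reason)
```

A check like this only proves that the functions exist, not that they
honour the ordering and membership guarantees; those would still have to
be specified as part of the API's contract.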

Thanks
-steve