Re: [Linux-cluster] RE: [RFC] Generic Kernel API

David Teigland <teigland@xxxxxxxxxx> · Wed, 6 Oct 2004 14:07:40 +0800

On Tue, Oct 05, 2004 at 02:39:44PM -0700, Steven Dake wrote:

> if it's possible people will do it..

I think that's the point -- you can if you want.

> This leads to the problem that then the APIs cannot be trusted to
> deliver certain guarantees such as agreed ordering of virtual synchrony.
> If the APIs cannot be trusted to deliver, for example, agreed ordering,
> then nobody will use agreed ordering and we will have a mess of
> two-phase commit protocols on our hands..  Or worse, systems will not
> operate correctly in a distributed fashion.

Here's the point of this whole exercise:  to allow /multiple/ varieties of
cluster manager to live behind one API.  Some cm's, like yours, would take
the virtual synchrony approach with everything that implies.  Other cm's
may be intended for something else, not require the guarantees yours
provides, and take a different course.

I thought the whole motivation was to allow for many cm implementations,
each with their own unique characteristics, but all exporting their
function through a common kernel API.  (Although I wasn't there, I heard
"everyone else" agreed this was necessary.)

If there's only one kernel cm (one that uses VS as you suggest), then
there's no point in pursuing this "common API" idea -- there would just be
"the API" exported by "the cm".  This explains why we seem completely out
of sync in this discussion.

You obviously don't see any need for different cm implementations (and
think it's a bad idea).  In theory you may be right, but I think this is
mainly a practical question right now.  Someone else could probably
produce some technical reasons why your CM isn't what they want --
clustering is a pretty broad field and saying there's "only one right way"
is a bit bold.  (In theory we only need one local file system after all.)
In fact, I've thought there may be enough variation among cm's that
sharing an API would be impossible -- I'm still not sure.

With cm's providing different functions and behavior, an "application"
would obviously need to select a specific cm by name (each implementation
has a unique name) to attach to and use.
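
A rough sketch of what selecting a cm by name could look like.  It's in
Python rather than kernel C just to keep it short, and every name in it
(ClusterManager, BasicCM, cm_register, cm_attach, the "basic" cm) is made
up for illustration; none of it comes from the API Patrick sent.

```python
# Hypothetical sketch: several cm implementations sit behind one common
# interface, and an application attaches to one of them by its unique name.

class ClusterManager:
    """Minimal common interface; a real cm would export far more."""
    name = "abstract"

    def members(self):
        raise NotImplementedError


class BasicCM(ClusterManager):
    """Illustrative cm that just reports a fixed list of nodes."""
    name = "basic"

    def __init__(self, nodes):
        self._nodes = list(nodes)

    def members(self):
        return list(self._nodes)


_registry = {}  # maps unique cm name -> cm instance


def cm_register(cm):
    """Make a cm available under its name; names must be unique."""
    if cm.name in _registry:
        raise ValueError("cm name already registered: %r" % cm.name)
    _registry[cm.name] = cm


def cm_attach(name):
    """Return the cm registered under `name`, or raise LookupError."""
    try:
        return _registry[name]
    except KeyError:
        raise LookupError("no cm named %r" % name) from None


# The application picks the implementation it was written for.
cm_register(BasicCM(["node1", "node2"]))
cm = cm_attach("basic")
print(cm.members())
```

The point is only that the application names the cm it wants; what each
cm guarantees beyond the common calls is up to that implementation.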

> virtual synchrony requires membership and messaging to be integrated to
> deliver on its model.  I can't think of more exotic membership systems
> except perhaps intergroup.

Here's an example:  the lowest level cm provides basic
membership/messaging.  It considers any node a member as long as it can
communicate with it.  Now say there's a higher level membership system
built above this that has a more restrictive policy on who can be a
member.  It takes the membership info from the lower level cm, removes the
members that don't meet its criteria and exports that new list as the
members.  An application would have to be written, of course, to interface
with one of the two cm's depending on what it needs.
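
The layering above might be sketched like this, again in Python for
brevity.  The class names, the node names and the "allowed" policy are all
assumptions of mine for the sake of the example, not part of any existing
cm.

```python
# Hypothetical sketch of two layered membership systems.

class LowLevelCM:
    """Considers a node a member as long as it can communicate with it."""

    def __init__(self):
        self._reachable = set()

    def node_up(self, node):
        self._reachable.add(node)

    def node_down(self, node):
        self._reachable.discard(node)

    def members(self):
        return sorted(self._reachable)


class RestrictedCM:
    """Higher-level cm: takes the lower cm's list and drops every node
    that fails its own, stricter membership policy."""

    def __init__(self, lower, policy):
        self._lower = lower
        self._policy = policy  # callable: node -> bool

    def members(self):
        return [n for n in self._lower.members() if self._policy(n)]


low = LowLevelCM()
for node in ("node1", "node2", "node3"):
    low.node_up(node)

# Example policy (an assumption): only explicitly allowed nodes qualify.
allowed = {"node1", "node3"}
high = RestrictedCM(low, lambda n: n in allowed)

print(low.members())   # every node we can talk to
print(high.members())  # only those that also meet the policy
```

An application attaches to either `low` or `high` depending on which
notion of membership it needs; the two lists can legitimately differ.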

[This concept of layering additional features is one aspect of the common
API idea that I'm not emphasizing as much as the basic concept of
alternative implementations.]

> > I think the question here is whether your messaging/membership system
> > (currently in user space) would fit behind the API Patrick sent once
> > ported to the kernel.  If not, then what needs to be changed so it would?
> > The idea is for the API to be general enough to support a variety of
> > clustering modules, including yours.
> 
> Virtual synchrony is the "one true model" for distributed computing. 

That may be, and if it's true I don't imagine any other cm's will exist
behind this API in the long term.  The question was, will this API
adequately export whatever your cm provides?

> Other systems just don't deliver the features that are available in
> virtual synchrony.  

They may not deliver those features by choice simply because the features
aren't necessary for what they're designed to do.

> This allows us the freedom to design any sort of distributed system if
> we accept virtual synchrony must exist at the lowest level.

We're aiming for even more freedom -- the ability to reject even a
VS-based cm.  If that's a foolish idea, then alternatives will either die
or never sprout up.  From what I've heard, there's not a consensus on one
true cm everyone will adopt.  I think it's unlikely to happen any time
soon which means we need to allow for different approaches.

> If virtual synchrony is not enforced by the API, then people that don't
> care about virtual synchrony immediately could provide implementations
> that don't support those features.  This would result in fragmented
> implementations of clustering infrastructure which is what we are trying
> to avoid.

As I said earlier, I thought the whole point of this was to allow for
fragmentation but to agree on an API if possible.  If that's the case,
then the API should probably be as permissive as possible.

> Not only that, these solutions would not be reliable in partitions,
> merges, or faults because they would most likely not handle these
> situations in a deterministic and correct fashion.  IMHO, it is
> impossible to make a reliable distributed system if partitions, merges,
> and faults are not addressed up front as part of the APIs and protocols.

Some people may not be as interested in reliability as you and I are.

I understand what you're saying and I think we'd like to use pretty much
the same kernel cm in the end.  The common kernel API isn't really about
what /we/ want, though, it's about what other people might want to do.  If
it's possible to share a common API despite a diversity of implementations
that would be nice -- at least that's the basis of this discussion.

Your goal (to get everyone to agree on a single kernel cm) might be
possible, but it'll probably take a bit of work.  Getting everyone to
share a common API would at least be a step in that direction.

-- 
Dave Teigland  <teigland@xxxxxxxxxx>