Re: [Linux-cluster] Interfacing csnap to cluster stack

Daniel McNeil <daniel@xxxxxxxx> · Wed, 06 Oct 2004 13:34:12 -0700

On Wed, 2004-10-06 at 11:27, Daniel Phillips wrote:
> On Tuesday 05 October 2004 18:47, Daniel McNeil wrote:
> > > The idea is, there is a service manager out there somewhere that
> > > keeps track of how many instances of a service of a given type
> > > currently exist, and has some way of creating new resource
> > > instances if needed, or killing off extra ones.  In our case, we
> > > want exactly one instance of a csnap server.  We need not only to
> > > specify that constraint somehow and get a connection to it, but we
> > > need to supply a method of starting a csnap server.  So csnap-agent
> > > will be a client of service manager and an agent of resource
> > > manager.
> >
> > Why do you need a service manager for this?  As Lon suggested,
> > a DLM lock can provide the 1 master and the others ready
> > to take over when the lock is released.
> 
> The DLM uses the service manager.  Why lather on another layer, when 
> really we just want to use the service manager too?
> 

The DLM is a well known interface that has had many implementations.
When Patrick sent out the Generic Kernel API it included membership
and quorum interfaces, which are also things that have, or could have,
many implementations.  The service manager is something new that I have
not seen in other cluster implementations.  Are you planning on
doing a generic API for service manager as well?  From my previous
experience with other cluster implementations, the DLM was only
dependent on membership and quorum (and cluster-wide communication).
From my perspective the service manager is the other layer. :)
If you make csnap depend on the service manager, then any other
cluster implementation that wanted to use csnap would have to provide
the service manager functionality.

> > > We won't talk to either service manager or resource manager
> > > directly, but go through Lon's Magma library, which is supposed to
> > > provide a nice stable api for us to work with, regardless of
> > > whether particular services reside in kernel or user space, or are
> > > local or remote.  Lon has said that he will adapt the Magma api as
> > > we go, if we break anything or run into limitations.  (I suppose
> > > that is why it is called Magma, it flows under pressure.)
> >
> > Why do we want to use Magma?  At the cluster summit I thought
> > that Magma was just the way to provide backward compatibility
> > for the older GFS releases.  Did we agree to make magma the
> > API?  Having csnap depend on the DLM API makes more sense to me.
> 
> Have you looked at the dlm api?  Why would we want to be directly 
> ioctling sockets when we could be using a library interface?  I'm not 
> necessarily disagreeing with you, the question is: should we be using a 
> library for this or not?  I'd think that using a library is motherhood, 
> though it does force us to think about the api a little harder.

I've looked at libdlm.h and libdlm.so.  It looks like it is the
library that provides dlm_lock(), dlm_unlock() and friends.
I have not reviewed all the dlm calls, but it looks about right.
What am I missing?  I didn't see any direct ioctls.
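
To make the "one master, others ready to take over" idea concrete, here
is a rough sketch of the pattern.  It is only a single-process
simulation: threading.Lock stands in for an exclusive DLM lock, and the
names candidate() and run_election() are made up for illustration.  None
of this is libdlm or csnap code.

```python
import threading
import time


def candidate(node, master_lock, history, active, hold_time=0.01):
    """One would-be server (hypothetical; simulates a cluster node)."""
    # Block until the exclusive lock is granted.  This models a node
    # queueing an exclusive request on a well-known lock name.
    with master_lock:
        active.append(node)                  # this node is now the master
        history.append((node, len(active)))  # record how many masters exist
        time.sleep(hold_time)                # act as master for a while
        active.remove(node)                  # step down; leaving the block
                                             # releases the lock


def run_election(nodes):
    """Start every node at once and let them take turns as master."""
    master_lock = threading.Lock()  # stand-in for the cluster-wide lock
    history = []
    active = []
    threads = [
        threading.Thread(target=candidate,
                         args=(node, master_lock, history, active))
        for node in nodes
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return history


if __name__ == "__main__":
    for node, masters in run_election(["node1", "node2", "node3"]):
        print(node, "became master; masters at that moment:", masters)
```

Each node blocks on the lock, exactly one holds it at any time, and a
waiter takes over as soon as the holder releases it.  In a real cluster
the lock manager would also have to release the lock when the holder's
node fails, which this sketch does not model.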

> 
> > > Magma doesn't actually know anything about what we're asking it, it
> > > only knows how to pass on requests to somebody who does.  So we're
> > > actually talking to service manager and resource manager through
> > > Magma, and presumably they talk to each other as well, because
> > > service manager must ask resource manager to create or kill off
> > > resource instances on its behalf.
> >
> > What would need to be killed off?  Under what circumstances?
> 
> If the cluster shrinks, the resource manager might decide that the 
> population of a particular sort of server is too high and some should 
> be culled.  Of course, having too many servers is less of a problem 
> than having too few, but I generally dislike "grow only" systems of any 
> ilk.
> 

I agree with this usage for resource managers in general, but this
does not seem to apply to the csnap server.

> > > Anyway, csnap-agent is mainly going to be talking to service
> > > manager through Magma, but it also needs to tell resource manager
> > > about our resource, its constraints and how to set itself up as an
> > > agent to create it.  I don't have a clear picture of how this works
> > > at the moment, and that is the point of this email.
> > >
> > > For example, how do we specify the service manager constraints,
> > > i.e., "exactly one" in this case: before we request the instance,
> > > or as part of the request, or in a configuration file somewhere?
> >
> > The csnap-agent to csnap-server seems like a perfect example of why we
> > need a cluster communication API.  The csnap-agent wants to send
> > information to the csnap-server and could use a highly available
> > communication mechanism.
> 
> A csnap agent never sends information to a csnap server, except to start 
> one locally at the request of a resource agent.
> 
> There may be a good use for a virtual synchrony-based cluster communication 
> api somewhere in this, but that's not it.

I just started reading through your cluster.snapshot.design.html.
I was talking about the csnap client to csnap server communication.
I did a quick search through the design doc and didn't see what the
csnap-agent is for.  I'll keep reading.

Daniel