> > > [1] At cluster bring-up time. The resource manager has to be able
> > > to operate without reading files during failover.
> >
> > Existing CRMs will not do this, at least, not the ones I've looked
> > at. lrmd (new linux-ha stuff) and heartbeat (older) and rgmanager all
> > fork/exec scripts on the local file system to control resources and
> > applications, implying that none will work with the csnap server.
> >
> > D'oh! :(
>
> This problem is not specific to the csnap server, it's a problem for
> anything that lies in the failover path and wants resource management.
> You could say that this problem is specific to servers, because they
> tend to attract a lot of network and disk traffic to themselves.

No, being a block device really does make it a special problem. A client
sitting below the VM layer can tie up a lot of memory that can't be
swapped out. When an NFS server goes away, its clients do not deadlock
themselves, but they are above the VM layer.

The other device that could have this problem is the cluster mirror
device, but Jon is taking a completely different approach, and I think
the results will be very instructive.

Here's what I mean. GNBD was developed before we had a resource manager,
and before service manager was stable. GULM doesn't provide all the
capabilities that SM does, so I decided that the GNBD clients would do
failure detection themselves. This more than doubled the complexity of
the gnbd code, and caused our support department to feverishly beg that
this solution never be used again.

I am currently working on getting the csnap device to plug into
rgmanager for failover. There will possibly be some cases where a client
could have detected an error that rgmanager will miss, but I don't think
they will be major, and it is a much simpler approach. On the downside,
we've added a lot of userspace overhead to the failover process.
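For readers not familiar with the CRMs mentioned above: the fork/exec model boils down to the cluster manager running a local control script per resource. A minimal sketch of such a script is below; the function name, actions, and messages are purely illustrative, not the API of any real rgmanager/lrmd/heartbeat agent:

```shell
# Hypothetical sketch of the kind of resource-control script that
# rgmanager, lrmd, and heartbeat fork/exec from the local filesystem.
# The names and messages here are invented for illustration.
resource_agent() {
    case "$1" in
        start)  echo "starting resource" ;;   # bring the service up
        stop)   echo "stopping resource" ;;   # tear it down on failover
        status) echo "resource running" ;;    # periodic health check
        *)      echo "usage: resource_agent {start|stop|status}" >&2
                return 1 ;;
    esac
}
```

The point made in the thread is precisely that a script like this lives on the local filesystem, so a CRM that has to exec it during failover may need to read files at exactly the moment the block device below the VM layer cannot service that read.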
AFAIK, Jon is currently planning on having the cluster mirror device
completely in kernel, and relying on SM for failure detection. This is a
very lightweight approach, which I think is about as resistant to memory
pressure as the linux kernel will allow us to be. On the downside, I
believe that Service Manager's only failure detection mechanism is an
unexpected socket closing. This leaves obvious cases where rgmanager
could detect a failure that SM will miss. It also forces the cluster
mirror code to do more self-checking than the csnap server needs to do.

If the cluster mirror is robust enough, it will win hands down on
simplicity. (The server will be in kernel, but the design could be used
with a userspace server.) If the design is not robust enough, then some
variety of the csnap design may be the best compromise between
robustness and simplicity. Right now, I wouldn't be willing to bet on
one or the other, but I'm interested in seeing how this plays out.

-Ben

> You could go on to say "but if the writeout path didn't have any servers
> in it, I wouldn't have to do any resource management!" That's correct
> except it's a lot easier to get some acceptable form of resource
> management working than to distribute the on-disk data structures
> involved.
>
> I think the biggest part of this problem is just defining what's needed.
> How hard could it be to implement?
>
> Oh, is the resource manager to be distributed, or will it be a
> server? ;-)
>
> Regards,
>
> Daniel
>
> --
> Linux-cluster@xxxxxxxxxx
> http://www.redhat.com/mailman/listinfo/linux-cluster