[Linux-cluster] Interfacing csnap to cluster stack

Daniel Phillips <phillips@xxxxxxxxxx> · Tue, 5 Oct 2004 14:03:50 -0400

Hi all,

The time has arrived to connect the cluster snapshot block device to the 
cluster infrastructure, so that failover will work as God intended.  Ben 
and I have been pondering just how to go about this, using the various 
bits and pieces available, and perhaps evolving some more bits as we 
go.

The cluster snapshot device interfaces to a user space daemon called
csnap-agent, whose main job is to receive server connection requests
and deliver server connections to the driver.  The plan is to
customize the csnap agent as necessary to interface to the cluster
infrastructure.  So, how, exactly?

The idea is, there is a service manager out there somewhere that keeps 
track of how many instances of a service of a given type currently 
exist, and has some way of creating new resource instances if needed, 
or killing off extra ones.  In our case, we want exactly one instance 
of a csnap server.  We need not only to specify that constraint somehow
and get a connection to the server, but also to supply a method of
starting a csnap server.  So csnap-agent will be a client of service
manager and an agent of resource manager.
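
To make the "exactly one instance" idea concrete, here is a toy
sketch of the bookkeeping described above.  Everything in it is
made up for illustration: ToyServiceManager, register_agent,
request and the node addresses are hypothetical names, not the real
service manager, resource manager or Magma interfaces, and it lumps
the two managers into one object for brevity.

```python
class ToyServiceManager:
    """Toy stand-in for service manager plus resource manager (hypothetical)."""

    def __init__(self):
        self.starters = {}   # service type -> callable that starts one instance
        self.wanted = {}     # service type -> required number of instances
        self.instances = {}  # service type -> addresses of live instances

    def register_agent(self, service_type, start, count=1):
        """An agent offers a way to start instances and states the constraint."""
        self.starters[service_type] = start
        self.wanted[service_type] = count
        self.instances.setdefault(service_type, [])

    def reconcile(self, service_type):
        """Start missing instances and drop extras until the count matches."""
        live = self.instances[service_type]
        while len(live) < self.wanted[service_type]:
            live.append(self.starters[service_type]())
        del live[self.wanted[service_type]:]

    def request(self, service_type):
        """Return the address of an instance, creating one if necessary."""
        self.reconcile(service_type)
        return self.instances[service_type][0]

    def instance_failed(self, service_type, address):
        """Forget a dead instance; the next request triggers a restart."""
        self.instances[service_type].remove(address)


def make_starter():
    """Hypothetical csnap server starter that hands out made-up addresses."""
    started = []

    def start():
        address = f"node{len(started) + 1}:8080"
        started.append(address)
        return address

    return start, started
```

In this sketch the constraint is supplied at registration time, which
is only one of the possible answers; repeated requests get the same
server back, and a new one is started only after instance_failed()
has been called for the old one.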

We won't talk to either service manager or resource manager directly,
but go through Lon's Magma library, which is supposed to provide a nice
stable API for us to work with, regardless of whether particular
services reside in kernel or user space, or are local or remote.  Lon
has said that he will adapt the Magma API as we go, if we break
anything or run into limitations.  (I suppose that is why it is called
Magma: it flows under pressure.)

Magma receives requests by direct library calls and supplies answers 
either via function returns or via events delivered over a socket 
connection, which seems to be a pretty good fit with the way csnap does 
things.  So now, what are we going to ask it, and how is it going to 
answer?

  1. Request a snapshot server host:port name, creating an instance
     if necessary

  2. Register to act as an agent to start a snapshot server instance

My instinct is that we do not want request 1 to be a blocking call into
Magma that returns only when it has a server instance, because we may
want our agent to be able to service other events while it waits for
its server address.  So the likely interface is to call Magma, saying
what kind of server we want, and wait for the address to arrive as an
event.
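
A minimal sketch of that request-then-event pattern, using nothing
but the Python standard library.  It does not use Magma at all: the
REQUEST/SERVER text messages, request_server, fake_infrastructure
and the node1:8080 address are all assumptions invented for the
example, and a socketpair stands in for the event socket.

```python
import selectors
import socket


def request_server(sock, service_type):
    """Hypothetical non-blocking request: send it and return at once."""
    sock.sendall(f"REQUEST {service_type}\n".encode())


def fake_infrastructure(sock):
    """Stand-in for the cluster side: answer one request with an event."""
    kind, service_type = sock.recv(1024).decode().split()
    if kind == "REQUEST":
        # node1:8080 is a made-up address for illustration.
        sock.sendall(f"SERVER {service_type} node1:8080\n".encode())


def run_agent():
    agent_sock, infra_sock = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(agent_sock, selectors.EVENT_READ)

    request_server(agent_sock, "csnap")   # returns immediately
    fake_infrastructure(infra_sock)       # in reality this happens elsewhere

    address = None
    for _ in range(10):  # bounded so the sketch cannot spin forever
        for key, _mask in selector.select(timeout=1.0):
            tag, _service, value = key.fileobj.recv(1024).decode().split()
            if tag == "SERVER":
                address = value
        if address is not None:
            break
        # Other events (driver requests and so on) would be serviced here.

    selector.close()
    agent_sock.close()
    infra_sock.close()
    return address
```

The point of the shape is that request_server() never waits: the
address shows up later as just another readable event in the same
select loop that the agent would use for everything else.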

Magma doesn't actually know anything about what we're asking it; it only
knows how to pass on requests to somebody who does.  So we're actually
talking to service manager and resource manager through Magma, and
presumably they talk to each other as well, because service manager
must ask resource manager to create or kill off resource instances on
its behalf.

Anyway, csnap-agent is mainly going to be talking to service manager
through Magma, but it also needs to tell resource manager about our
resource and its constraints, and to set itself up as the agent that
creates it.  I don't have a clear picture of how this works at the
moment, and that is the point of this email.

For example, how do we specify the service manager constraints, i.e., 
"exactly one" in this case: before we request the instance, or as part 
of the request, or in a configuration file somewhere?

Regards,

Daniel