Hi all,

The time has arrived to connect the cluster snapshot block device to the cluster infrastructure, so that failover will work as God intended. Ben and I have been pondering just how to go about this, using the various bits and pieces available, and perhaps evolving some more bits as we go.

The cluster snapshot device interfaces to a user space daemon called csnap-agent, whose main job is to receive server connection requests and deliver server connections to the driver. The plan is that we will customize csnap-agent as necessary to interface to the cluster infrastructure. So, how, exactly?

The idea is, there is a service manager out there somewhere that keeps track of how many instances of a service of a given type currently exist, and has some way of creating new resource instances if needed, or killing off extra ones. In our case, we want exactly one instance of a csnap server. We need not only to specify that constraint somehow and get a connection to the server, but also to supply a method of starting a csnap server. So csnap-agent will be a client of service manager and an agent of resource manager.

We won't talk to either service manager or resource manager directly, but go through Lon's Magma library, which is supposed to provide a nice stable API for us to work with, regardless of whether particular services reside in kernel or user space, or are local or remote. Lon has said that he will adapt the Magma API as we go, if we break anything or run into limitations. (I suppose that is why it is called Magma, it flows under pressure.)

Magma receives requests by direct library calls and supplies answers either via function returns or via events delivered over a socket connection, which seems to be a pretty good fit with the way csnap does things. So now, what are we going to ask it, and how is it going to answer?

  1. Request a snapshot server host:port name, creating an instance if necessary
  2. Register to act as an agent to start a snapshot server instance

My instinct is that we do not want 1. to be a blocking call into Magma that returns only when it has a server instance, because we may want our agent to be able to service other events while it waits for its server address. So the likely interface is to call Magma, saying what kind of server we want, and wait for the address to arrive as an event.

Magma doesn't actually know anything about what we're asking it, it only knows how to pass on requests to somebody who does. So we're actually talking to service manager and resource manager through Magma, and presumably they talk to each other as well, because service manager must ask resource manager to create or kill off resource instances on its behalf.

Anyway, csnap-agent is mainly going to be talking to service manager through Magma, but it also needs to tell resource manager about our resource and its constraints, and to set itself up as an agent for creating it. I don't have a clear picture of how this works at the moment, and that is the point of this email. For example, how do we specify the service manager constraints, i.e., "exactly one" in this case: before we request the instance, as part of the request, or in a configuration file somewhere?

Regards,

Daniel
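
P.S. To make the non-blocking idea in 1. a bit more concrete, here is a rough sketch of the shape I have in mind. It is not Magma's API, which is exactly the part still in question. A local socket pair stands in for the Magma event connection, and every name here (fake_manager_reply, parse_event, agent_loop), the "server host:port" event format, and the node3:8080 address are invented purely for illustration. It also assumes a whole event arrives in a single read, which a real agent could not rely on.

```python
import selectors
import socket


def fake_manager_reply(manager_sock, host, port):
    # Stand-in for the service manager answering through Magma.
    # The "server host:port" wire format is made up for this sketch.
    manager_sock.sendall(f"server {host}:{port}\n".encode())


def parse_event(line):
    # Return (host, port) for a "server" event, or None for anything else.
    kind, _, address = line.strip().partition(" ")
    if kind != "server":
        return None
    host, _, port = address.rpartition(":")
    return host, int(port)


def agent_loop(event_sock, timeout=1.0):
    # Wait for the server address to arrive as an event, rather than
    # blocking inside the request itself.
    sel = selectors.DefaultSelector()
    sel.register(event_sock, selectors.EVENT_READ)
    try:
        while True:
            ready = sel.select(timeout)
            if not ready:
                # Nothing arrived in time; this sketch just gives up.
                return None
            for key, _ in ready:
                data = key.fileobj.recv(4096)
                if not data:
                    return None  # peer closed the connection
                address = parse_event(data.decode())
                if address is not None:
                    return address
    finally:
        sel.close()


def demo():
    # The request is assumed to have been issued already and to have
    # returned immediately; here the fake manager answers it.
    agent_sock, manager_sock = socket.socketpair()
    try:
        fake_manager_reply(manager_sock, "node3", 8080)
        return agent_loop(agent_sock)
    finally:
        agent_sock.close()
        manager_sock.close()
```

The point of going through a selector is that the same loop could also watch the driver's socket and anything else the agent cares about, so waiting for a server address never stalls the rest of the agent.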