On Thu, 2004-10-07 at 21:42 -0400, Daniel Phillips wrote:

> - There may never be more than one, or the snapshot metadata will
>   be corrupted (this sounds like a good job for gdlm: let the
>   server take an exclusive lock on the snapshot store).

You mean one "master", right?  I thought you wanted the ability to have
multiple csnap servers, which could handle misc requests, but only one
"master" - that is, only one actually handling client I/O requests and
writing to disk.

Anyway, rgmanager and/or a cluster lock can do the "one and only one
master" bit, I think, as long as the csnap server is trustworthy.

> - Server instance requests come from csnap agents, one per node.
>   The reply to an instance request is always a server address and
>   port, whether the server had to be instantiated or was already
>   running.

Ok, so the csnap agents get instance requests, answer with the server's
address and port, and instantiate a master server if necessary?  (Um,
why are we not using a floating IP address?)

If operating within the context of rgmanager (or another RM), it's
probably a good idea to never have the csnap agent directly instantiate
a master server as a result of a csnap client request, for a few
reasons:

- Rgmanager is expected to try to fix the server if it breaks, and move
  it around if it can't fix it (via the resource agent).

- If another master server mysteriously appears as a result of a client
  request while an RM is trying to recover the first incarnation, you'll
  end up with a split brain: the csnap agents/servers/clients think one
  thing, the RM thinks another.  Very bad.

- Administrators must be able to disable the csnap master server, and
  having the agent activate/spawn a new master if one isn't running is
  contrary to this requirement.

- Failures are rare.  The case where a server has died and a new
  instance request comes in before the RM has had a chance to spawn a
  new master server is therefore also rare.  The client can wait until
  the RM fixes the existing master or starts a new master server
  elsewhere in the cluster.

> - If the resource manager determines no server is running, then
>   it must instantiate one, by picking one of the cluster nodes,
>   finding the csnap agent on it, and requesting that the agent
>   start a server.

Sure.

> - When instantiated in a failover path, the local part of the
>   failover path must restrict itself to bounded memory use.
>   Only a limited set of syscalls may be used in the entire
>   failover path, and all must be known.  Accessing a host
>   filesystem is pretty much out of the question, as is
>   on-demand library or plugin loading.  If anything like this
>   is required, it must be done at initialization time, not
>   during failover.

Were you referring to the case where the csnap master server has failed
(but nothing else on the node has, so the cluster is still fine) and it
must first be cleaned up (via the csnap server agent) prior to being
relocated - but there is so little memory available that we can't get
enough juice to tell the csnap server agent to stop it?  Hmm...
<ponders>

The start path is a bit less interesting; if we fail to become a master
(for any reason), we try another node.  The reasons (low memory, lost
disk connection, lunar surge) don't really matter.  It's not
uninteresting, though, particularly if all of the nodes of the cluster
are under huge memory strain (though most server machines are never
expected to operate continually at that capacity!).
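To make the "do it at initialization time" part concrete, here is a
minimal userspace sketch of the idea: preallocate and pin everything
the failover path will touch up front, so failover itself never
allocates or faults.  The buffer size and names below are made up for
illustration; none of this is taken from the csnap sources.

    /* Sketch only: pin memory and preallocate at init time so the
     * failover path itself never allocates or pages.  The buffer and
     * its size are illustrative, not from csnap. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define FAILOVER_BUF_SIZE (64 * 1024)   /* assumed worst-case need */

    static char *failover_buf;              /* used only during failover */

    int failover_init(void)
    {
            /* Allocate everything the failover path will touch... */
            failover_buf = malloc(FAILOVER_BUF_SIZE);
            if (!failover_buf)
                    return -1;
            memset(failover_buf, 0, FAILOVER_BUF_SIZE); /* fault pages in */

            /* ...then pin current and future pages so a memory crunch
             * during failover can't stall us on swap-in or allocation. */
            if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0) {
                    perror("mlockall");
                    return -1;
            }
            return 0;
    }

    int main(void)
    {
            if (failover_init() < 0)
                    return 1;
            /* ... normal agent/server operation; the failover path uses
             * only failover_buf and already-loaded code ... */
            return 0;
    }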
> - If a snapshot client disconnects, the server needs to know if
>   it is coming back or has left the cluster, so that it can
>   decide whether to release the client's read locks.

A membership change ought to be able to tell you this much.  If the
csnap master server sees that a node is out of the cluster, that client
isn't coming back (we hope?).

> - If a server fails over, the new incarnation needs to know
>   that all snapshot clients of the former incarnation have
>   either reconnected or left the cluster.

If the clients are part of the csnap master server's state, perhaps
that list of clients ought to be moved with the rest of the csnap data.
Storing the client list along with the csnap data on shared storage has
the added benefit of surviving a total cluster outage, not just a mere
failover.  Not sure if this is interesting or not.

-- Lon
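P.S. Purely for illustration, the hypothetical on-disk record below is
the sort of thing meant by "storing the client list with the csnap
data": a small table kept next to the snapshot metadata on shared
storage, so a new master incarnation can tell which clients of the
former incarnation it must wait for (to reconnect or leave the cluster)
before releasing their read locks.  None of these names exist in the
csnap sources.

    /* Hypothetical on-disk client list, kept next to the csnap
     * metadata on shared storage; names and sizes are invented. */
    #include <stdint.h>

    #define MAX_SNAP_CLIENTS  64

    #define CLIENT_CONNECTED  1  /* set on connect, cleared on clean exit */

    struct client_record {
            uint32_t nodeid;     /* cluster node id of the client */
            uint32_t flags;      /* e.g. CLIENT_CONNECTED */
    };

    struct client_list {
            uint32_t count;
            struct client_record clients[MAX_SNAP_CLIENTS];
    };

    /*
     * After failover, the new master incarnation reads this list from
     * shared storage and, for each record still marked
     * CLIENT_CONNECTED, waits for the client to reconnect or for
     * membership to report the node as gone before releasing that
     * client's read locks.  Because the list lives on shared storage
     * rather than in server memory, it also survives a total cluster
     * outage, not just a single-server failover.
     */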