Re: [Linux-cluster] Interfacing csnap to cluster stack

Daniel Phillips <phillips@xxxxxxxxxx> · Thu, 7 Oct 2004 21:42:19 -0400

On Thursday 07 October 2004 18:57, Daniel McNeil wrote:
> Daniel,
>
> Maybe you should describe what kind of help you are looking for
> from the infrastructure?

Sure, there are two separate problems:

  1) Resource management

    - The resource to be instantiated is the csnap server.

    - There must never be more than one running at a time, or the
      snapshot metadata will be corrupted (this sounds like a good
      job for gdlm: let the server take an exclusive lock on the
      snapshot store).

    - Server instance requests come from csnap agents, one per node.
      The reply to an instance request is always a server address and
      port, whether the server had to be instantiated or was already
      running.

    - If the resource manager determines no server is running, then
      it must instantiate one, by picking one of the cluster nodes,
      finding the csnap agent on it, and requesting that the agent
      start a server.

    - When instantiated in a failover path, the local part of the
      failover path must restrict itself to bounded memory use.
      Only a limited set of syscalls may be used in the entire
      failover path, and all must be known.  Accessing a host
      filesystem is pretty much out of the question, as is
      on-demand library or plugin loading.  If anything like this
      is required, it must be done at initialization time, not
      during failover.
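
    A minimal sketch of the instance-request logic above, not the actual
    csnap code.  Everything named here is hypothetical: ResourceManager,
    Agent, start_server(), the placeholder port, and the "lowest node
    name" selection policy.  An in-process lock stands in for the
    exclusive gdlm lock on the snapshot store.

```python
import threading
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (host, port) returned to every agent


class Agent:
    """Hypothetical per-node csnap agent that can start a server locally."""

    def __init__(self, node: str, port: int = 8080):  # port is a placeholder
        self.node = node
        self.port = port
        self.starts = 0  # how many times this agent was asked to start a server

    def start_server(self) -> Address:
        self.starts += 1
        return (self.node, self.port)


class ResourceManager:
    """Hands out the address of the single csnap server, starting it if needed."""

    def __init__(self, agents: Dict[str, Agent]):
        self.agents = agents
        self.server: Optional[Address] = None
        # Stand-in for an exclusive cluster lock on the snapshot store.
        self._store_lock = threading.Lock()

    def request_instance(self) -> Address:
        """Always reply with an address, whether or not a server had to be started."""
        with self._store_lock:
            if self.server is None:
                # Node selection policy is an assumption: lowest node name.
                node = min(self.agents)
                self.server = self.agents[node].start_server()
            return self.server

    def server_failed(self) -> None:
        """Forget the current server so the next request starts a new one."""
        with self._store_lock:
            self.server = None
```

    Every agent gets the same (host, port) back until server_failed()
    is called, which is the property that keeps a second server from
    being started against the same snapshot store.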

  2) Membership

    - If a snapshot client disconnects, the server needs to know if
      it is coming back or has left the cluster, so that it can
      decide whether to release the client's read locks.

    - If a server fails over, the new incarnation needs to know
      that all snapshot clients of the former incarnation have
      either reconnected or left the cluster.

    - There exists a snapshot client protocol variation that adds
      an additional message (confirmation of read lock release)
      and allows the snapshot server to ignore cluster membership
      entirely.  This is a way of wimping out instead of dealing
      with interface issues.

    - Origin clients don't present a problem, since they don't hold
      locks.
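
    A rough sketch of what the server needs from membership, under
    assumed names (SnapshotServer, on_disconnect, on_member_left,
    recovered are all hypothetical, as is the still_member flag that
    the cluster infrastructure would have to supply).  It models only
    the two decisions above: whether to release a disconnected
    client's read locks, and when a new incarnation has accounted for
    every client of the former one.

```python
from typing import Dict, Iterable, Set


class SnapshotServer:
    """Tracks snapshot clients and the read locks they hold (hypothetical model)."""

    def __init__(self, former_clients: Iterable[str] = ()):
        # Clients of the previous incarnation not yet accounted for.
        self.pending: Set[str] = set(former_clients)
        # client name -> set of chunk numbers it holds read locks on
        self.read_locks: Dict[str, Set[int]] = {}

    def connect(self, client: str) -> None:
        """A client connected or reconnected; it is no longer pending."""
        self.pending.discard(client)
        self.read_locks.setdefault(client, set())

    def take_read_lock(self, client: str, chunk: int) -> None:
        self.read_locks[client].add(chunk)

    def on_disconnect(self, client: str, still_member: bool) -> bool:
        """Return True if the client's read locks were released."""
        if still_member:
            # The client is expected to come back, so keep its locks.
            return False
        self.read_locks.pop(client, None)
        return True

    def on_member_left(self, client: str) -> None:
        """Membership reports that the client's node has left the cluster."""
        self.pending.discard(client)
        self.read_locks.pop(client, None)

    def recovered(self) -> bool:
        """True once every former client has reconnected or left."""
        return not self.pending
```

    For example, a new incarnation created with
    SnapshotServer(former_clients=["c1", "c2"]) reports recovered()
    as False until c1 and c2 have each either called connect() or
    been reported through on_member_left().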

Regards,

Daniel