[Linux-cluster] Manual cluster snapshot server failover

Daniel Phillips <phillips@xxxxxxxxxx> · Thu, 7 Oct 2004 16:20:20 -0400

Hi all,

In the latest CVS cluster snapshot, the example agent.c is amended to 
provide a "manual" means of testing server failover.  Now, when the 
agent receives a server connection request from a device mapper target, 
it attempts to establish a connection and leaves the client on a list 
if that fails.

A little utility, sendagent, was written to send the name and port 
number of a new snapshot server to the agent over the same local socket 
that the device mapper targets use.  For each of the waiting clients, 
the agent attempts to connect to the server, and if successful, passes 
the connection to the client, which resumes processing IO requests. 
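
For anyone who wants the shape of that logic without reading agent.c, here is a minimal sketch in C. It is not the code from CVS: the struct, the function names, the fixed-size list and the injectable connector are all made up for illustration, and handing the socket to the client is reduced to a stub.

```c
/* Illustrative sketch only; none of these names come from agent.c. */
#define MAX_WAITING 16

/* Hypothetical connector: returns a connected fd, or -1 on failure. */
typedef int (*connect_fn)(const char *host, int port);

struct agent {
    int waiting[MAX_WAITING];  /* ids of clients still without a server */
    int nwaiting;
    connect_fn try_connect;
};

/* Stub: a real agent would hand the fd to the client over the local socket. */
static void pass_connection(int client, int fd) { (void)client; (void)fd; }

/* A client asks for a server: connect now, or park the client on the list.
 * Returns 1 if connected, 0 if queued, -1 if the list is full. */
static int agent_request(struct agent *a, int client, const char *host, int port)
{
    int fd = a->try_connect(host, port);
    if (fd >= 0) {
        pass_connection(client, fd);
        return 1;
    }
    if (a->nwaiting == MAX_WAITING)
        return -1;
    a->waiting[a->nwaiting++] = client;
    return 0;
}

/* A new server address arrives: retry every waiting client and keep the
 * ones that still fail. Returns the number of clients resumed. */
static int agent_new_server(struct agent *a, const char *host, int port)
{
    int kept = 0, resumed = 0;
    for (int i = 0; i < a->nwaiting; i++) {
        int fd = a->try_connect(host, port);
        if (fd >= 0) {
            pass_connection(a->waiting[i], fd);
            resumed++;
        } else {
            a->waiting[kept++] = a->waiting[i];
        }
    }
    a->nwaiting = kept;
    return resumed;
}

/* Fake connector for trying the sketch: only port 9090 "has a server". */
static int fake_connect(const char *host, int port)
{
    (void)host;
    return port == 9090 ? 3 : -1;
}
```

With fake_connect plugged in, requests made against port 8080 are queued, and a later agent_new_server() call for port 9090 drains the list.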

Here is an example test scenario:

   # Run a standard test that starts a csnap server, creates a snapshot
   # device, and runs some IO on it.
   # The test assumes devices /dev/test-origin and /dev/test-snapstore,
   # which can be symlinks to partitions, devices or files.
   # Port 8080 may have to be changed to something else if it is in use
   # on your test machine.
   make test

   # Manual failure.  IO on the virtual device will hang.
   killall csnap-server

   # Tell the agent to attempt reconnection; this fails (no server)
   ./sendagent @testdev-control localhost:8080

   # Start a new snapshot server 
   ./csnap-server /dev/test-origin /dev/test-snapstore 9090

   # Tell the agent about it.  IO on the virtual device resumes.
   ./sendagent @testdev-control localhost:9090

   # Check it by writing a pattern of 77's to the device
   ./devspam /dev/mapper/testdev write 1 77
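
The host:port argument given to sendagent above has to be split into a name and a port number at some point. How sendagent and the agent actually do this is not shown here, so the following is only a sketch under my own assumptions: parse_server is a hypothetical helper, and the rule that the port must lie in 1..65535 is my choice, not something taken from the csnap code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "host:port" into a host string and a port.
 * Returns 0 on success, -1 if the argument is malformed, the host does
 * not fit in the buffer, or the port is outside 1..65535. */
static int parse_server(const char *arg, char *host, size_t hostlen, int *port)
{
    const char *colon = strrchr(arg, ':');   /* last ':' separates the port */
    if (!colon || colon == arg)
        return -1;                           /* no ':' or empty host */

    size_t n = (size_t)(colon - arg);
    if (n >= hostlen)
        return -1;                           /* host too long for buffer */

    char *end;
    long p = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p < 1 || p > 65535)
        return -1;                           /* missing or invalid port */

    memcpy(host, arg, n);
    host[n] = '\0';
    *port = (int)p;
    return 0;
}
```

Given "localhost:9090" this yields the host "localhost" and the port 9090, while "localhost" on its own or "localhost:" is rejected.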

All the bits and pieces are now in place for running the cluster 
snapshot on a cluster, except:

  1) There is some automagic resource instantiation missing as we have
     discussed.

  2) The server lacks an interface with cluster membership, which it
     requires for the optional 3-message style of snapshot client
     interface.

Since Ben is looking at resource instantiation issues, I'll look at 
cluster membership next.

Regards,

Daniel