Good morning Ben,

Just a few final thoughts for, ahem, tomorrow...

I can see that everybody really wants to go for the "full fat" solution (as Patrick would say) right away, without trying anything simple first, so I will go with the flow. After all, I also like skydiving.

I feel that a service group is the correct place to hook into cman, mainly because not all nodes in the cluster need have agents running on them, for various reasons, including a csnap virtual device being exported to another node and therefore needing neither agent nor server. There isn't a lot of recovery to do in our service group, but it's nice to know the mechanism is there should we need it. Finally, I feel that the service group will be able to help with orderly shutdown, as I mentioned earlier.

I'm getting attached to the idea of teaching cman to hand out port numbers along with service groups. This needs to be kicked around, but it would eliminate considerable configuration annoyance, and perhaps most of the "well known" cluster ports, which are cruft that cman really shouldn't encode into its header files if we can avoid it.

Since we can get cluster ports automagically, we can afford to have a separate service group for each metadevice, i.e., per snapshot store used by the cluster. The name of the service group will be formed from a unique id stored in the snapshot metadata at mksnapstore time, prepended with csnap/ or something like that. Multiple service groups should not be scary from the recovery point of view, because csnap recovers quickly from membership changes, as described earlier specifically for servers, which is the only interesting case.

Each agent will bind to the service group's cluster port, which enforces a one-agent-per-node-per-metadevice rule. Though we don't have to do it right away, a single agent could handle multiple metadevices (i.e., snapshot stores) by using the local socket name to know which clients to connect to which servers.
The agent currently uses only one local socket, so it could support only a single metadevice, but we can use multiple agents to support multiple metadevices, which doesn't violate the rule above.

Hmm, alternatively we could use the name of the snapshot store device, which is known locally by both the dm target and the server, to match up servers to clients. Then the agent would not have to bind to multiple sockets and we would not have to create a way of feeding it new socket names. This might be better, though it would mean we have to make rules about aliasing, one obvious form of which is the device mapper uuid if the snapshot store is a device mapper device.

The server will be modified to read the snapshot store superblock before settling down to act as a standby. It passes the metadevice unique id to the agent so that the agent can join the service group and bind to the correct cluster port. An agent sends cluster messages by resolving node:service to cluster_address:port. I think the node id might even be the cluster address; I'm not looking at the code right now.

The rest is as described in the original post; these are really just more details to consider.

Regards,

Daniel