I've long felt that our ways of dealing with cluster membership and staging of config changes are not quite as robust and scalable as we might want. Accordingly, I spent a bit of time a couple of weeks ago looking into the possibility of using ZooKeeper to do some of this stuff. Yeah, it brings in a heavy Java dependency, but when I looked at some lighter-weight alternatives they all seemed to be lacking in more important ways. Basically the idea was to do this:

* Set up the first N (e.g. N=3) nodes in our cluster as ZooKeeper servers, or point everyone at an existing ZooKeeper cluster.

* Use ZK ephemeral nodes as a way to track cluster membership ("peer probe" merely updates ZK, and "peer status" merely reads from it).

* Store config information in ZK *once* instead of regenerating volfiles etc. on every node (and dealing with the ugly cases where a node was down when the config change happened).

* Set watches on ZK nodes to be notified when config changes happen, and respond appropriately.

I eventually ran out of time and moved on to other things, but this or something like it (e.g. using Riak Core) still seems like a better approach than what we have. In that context, it looks like ZkFarmer[1] might be a big help. AFAICT someone else was trying to solve almost exactly the same kind of server/config problem that we have, and wrapped their solution into a library.

Is this a direction other devs might be interested in pursuing some day, if/when time allows?

[1] https://github.com/rs/zkfarmer
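
To make the bullet points above a little more concrete, here is a rough sketch of the membership and config-watch pieces in Python using the third-party kazoo client. None of this is from an actual prototype: the znode paths (`/cluster/members`, `/cluster/config`), the helper names, and the JSON encoding of the config are all assumptions made up for illustration. It also assumes each node registers itself, since an ephemeral znode is tied to the session that created it.

```python
import json

# Hypothetical znode layout, chosen only for this sketch.
MEMBERS = "/cluster/members"
CONFIG = "/cluster/config"


def connect(hosts="127.0.0.1:2181"):
    """Open a ZooKeeper session (requires the third-party kazoo package)."""
    # Imported lazily so the helpers below can be loaded without kazoo.
    from kazoo.client import KazooClient
    zk = KazooClient(hosts=hosts)
    zk.start()
    return zk


def join_cluster(zk, node_name):
    """Announce this node with an ephemeral znode.

    ZooKeeper removes the znode when the creating session ends, so a
    crashed or partitioned node drops out of the member list on its own.
    """
    zk.create(f"{MEMBERS}/{node_name}", b"", ephemeral=True, makepath=True)


def peer_status(zk):
    """'peer status' becomes a plain read of the members directory."""
    zk.ensure_path(MEMBERS)
    return sorted(zk.get_children(MEMBERS))


def publish_config(zk, config):
    """Store the config once, centrally, as JSON in a single znode."""
    zk.ensure_path(CONFIG)
    zk.set(CONFIG, json.dumps(config, sort_keys=True).encode("utf-8"))


def watch_config(zk, on_change):
    """Call on_change(dict) now and again each time the config znode changes."""
    def _handler(data, stat, event=None):
        if data:  # skip the case where nothing has been published yet
            on_change(json.loads(data.decode("utf-8")))
    zk.DataWatch(CONFIG, _handler)


# Typical use on each node, assuming a ZooKeeper server is reachable:
#   zk = connect("zk1:2181,zk2:2181,zk3:2181")
#   join_cluster(zk, "node1")
#   watch_config(zk, lambda cfg: print("new config:", cfg))
```

A node that was down during a config change would simply read the current contents of the config znode when it reconnects and sets its watch, which is the "ugly case" this design is meant to avoid handling by hand.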