On 05/08/2012 12:33 AM, Anand Babu Periasamy wrote:
> Real issue is here is: GlusterFS is a fully distributed system. It is
> OK for config files to be in one place (centralized). It is easier to
> manage and backup. Avati still claims that making distributed copies
> are not a problem (volume operations are fast, versioned and
> checksumed).

It's also grossly inefficient at 100-node scale. I'll also need some convincing before I believe that nodes which are down during a config change will catch up automatically and reliably in all cases.

I think this is even more of an issue with membership than with config data. All-to-all pings are simply not acceptable at 100-node or greater scale. We need something better, and, more importantly, designing cluster membership protocols is not a business we should even be in. We shouldn't devote our own time to that when we can use something designed by people who have it as their focus.

> Also the code base for replicating 3 way or all-node is
> same. We all need to come to agreement on the demerits of replicating
> the volume spec on every node.

It's somewhat similar to how we replicate data - we need enough copies to survive a certain number of anticipated failures.

> If we are convinced to keep the config info in one place, ZK is
> certainly one a good idea. I personally hate Java dependency. I still
> struggle with Java dependencies for browser and clojure. I can digest
> that if we are going to adopt Java over Python for future external
> modules. Alternatively we can also look at creating a replicated meta
> system volume. What ever we adopt, we should keep dependencies and
> installation steps to the bare minimum and simple.

I personally hate the Java dependency too. I'd much rather have something in C/Go/Python/Erlang, but couldn't find anything that had the same (useful) feature set.
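To put the all-to-all ping concern in rough numbers (a back-of-the-envelope sketch, not a measurement; the gossip fanout of O(log n) peers per round is an assumption about a typical gossip-style alternative, not a description of any specific protocol):

```python
# Heartbeat messages per round: full-mesh pings vs. a gossip-style
# scheme where each node contacts ~log2(n) peers per round.
# Node counts and the fanout choice are illustrative assumptions.
import math

def full_mesh_msgs(n):
    # every node pings every other node each round: n * (n - 1)
    return n * (n - 1)

def gossip_msgs(n, fanout=None):
    # each node contacts a logarithmic number of peers per round
    if fanout is None:
        fanout = max(1, math.ceil(math.log2(n)))
    return n * fanout

for n in (10, 100, 1000):
    print(n, full_mesh_msgs(n), gossip_msgs(n))
```

At 100 nodes a full mesh already sends 9,900 pings per interval, and the cost grows quadratically from there, which is the core of the scaling objection.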
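The "enough copies to survive anticipated failures" point can be made concrete. A minimal sketch, using the standard counting argument (the function names here are my own, purely illustrative): surviving f failures with at least one live copy needs f+1 replicas, while keeping a writable majority quorum through f failures needs 2f+1.

```python
# Sketch of the copies-vs-failures tradeoff for replicated config.
def copies_to_survive(f):
    # at least one replica must remain readable after f node failures
    return f + 1

def copies_for_quorum(f):
    # a strict majority must stay up to keep accepting writes
    return 2 * f + 1

# e.g. to tolerate 2 failed nodes:
print(copies_to_survive(2), copies_for_quorum(2))  # 3 5
```

Either way, the count is driven by the failure budget, not by the cluster size - which is the argument against all-node replication of the volume spec.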
I also considered storing the config in a hand-crafted GlusterFS volume, using our own mechanisms for distributing, finding, and replicating the data. That's at least an area where we can claim some expertise. Such layering does create a few interesting issues, but nothing intractable. The big drawback is that it only solves the config-data problem; a solution that combines config storage with cluster membership is IMO preferable. The development drag of maintaining that functionality ourselves, and of hooking every new feature into the not-very-convenient APIs that have predictably resulted, is considerable.