On 05/08/2012 12:33 AM, Anand Babu Periasamy wrote:
> Real issue is here is: GlusterFS is a fully distributed system. It is
> OK for config files to be in one place (centralized). It is easier to
> manage and backup. Avati still claims that making distributed copies
> are not a problem (volume operations are fast, versioned and
> checksumed).

It's also grossly inefficient at 100-node scale. I'll also need some convincing before I believe that nodes which are down during a config change will catch up automatically and reliably in all cases.

I think this is even more of an issue with membership than with config data. All-to-all pings are simply not acceptable at 100-node or greater scale. We need something better, and, more importantly, designing cluster membership protocols is not a business we should even be in. We shouldn't devote our own time to that when we can use something designed by people who have it as their focus.

> Also the code base for replicating 3 way or all-node is
> same. We all need to come to agreement on the demerits of replicating
> the volume spec on every node.

It's somewhat similar to how we replicate data - we need enough copies to survive a certain number of anticipated failures.

> If we are convinced to keep the config info in one place, ZK is
> certainly one a good idea. I personally hate Java dependency. I still
> struggle with Java dependencies for browser and clojure. I can digest
> that if we are going to adopt Java over Python for future external
> modules. Alternatively we can also look at creating a replicated meta
> system volume. What ever we adopt, we should keep dependencies and
> installation steps to the bare minimum and simple.

I personally hate the Java dependency too. I'd much rather have something in C/Go/Python/Erlang, but couldn't find anything that had the same (useful) feature set.
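To put the all-to-all ping concern in rough numbers (a back-of-the-envelope sketch, not a measurement; the gossip fanout of O(log n) peers per round is an assumption about a typical gossip-style alternative, not a description of any specific protocol):

```python
# Heartbeat messages per round: full-mesh pings vs. a gossip-style
# scheme where each node contacts ~log2(n) peers per round.
# Node counts and the fanout choice are illustrative assumptions.
import math

def full_mesh_msgs(n):
    # every node pings every other node each round: n * (n - 1)
    return n * (n - 1)

def gossip_msgs(n, fanout=None):
    # each node contacts a logarithmic number of peers per round
    if fanout is None:
        fanout = max(1, math.ceil(math.log2(n)))
    return n * fanout

for n in (10, 100, 1000):
    print(n, full_mesh_msgs(n), gossip_msgs(n))
```

At 100 nodes a full mesh already sends 9,900 pings per interval, and the cost grows quadratically from there, which is the core of the scaling objection.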
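The "enough copies to survive anticipated failures" point can be made concrete. A minimal sketch, using the standard counting argument (the function names here are my own, purely illustrative): surviving f failures with at least one live copy needs f+1 replicas, while keeping a writable majority quorum through f failures needs 2f+1.

```python
# Sketch of the copies-vs-failures tradeoff for replicated config.
def copies_to_survive(f):
    # at least one replica must remain readable after f node failures
    return f + 1

def copies_for_quorum(f):
    # a strict majority must stay up to keep accepting writes
    return 2 * f + 1

# e.g. to tolerate 2 failed nodes:
print(copies_to_survive(2), copies_for_quorum(2))  # 3 5
```

Either way, the count is driven by the failure budget, not by the cluster size - which is the argument against all-node replication of the volume spec.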
I also considered storing the config in a hand-crafted GlusterFS volume, using our own mechanisms for distributing, finding, and replicating the data. That's at least an area where we can claim some expertise. Such layering does create a few interesting issues, but nothing intractable. The big drawback is that it only solves the config-data problem; a solution that combines config storage with cluster membership is IMO preferable. The development drag of maintaining that functionality ourselves, and of hooking every new feature into the not-very-convenient APIs that have predictably resulted, is considerable.