Is there any reason not to consider ZooKeeper? The 3.4 release is quite stable, and because it has a large number of users, its bugs get fixed and its quirks are well known.

I like the idea of Raft. The paper is well written and very compelling. The last time I read it, though, a number of critical issues were glossed over, for instance log compaction and pruning. Systems must be correct in both theory and implementation. Although many Raft-based systems have cropped up in the year or so since Raft was published, I don't judge their use to be significant compared to ZooKeeper's. Quality only comes with maturity and many workloads bashing on it.

The last time I needed to build a new distributed system, I wrote up some notes on etcd vs. ZooKeeper. Perhaps you will find them helpful, or they may prompt some new questions before you make your decision.

https://docs.google.com/document/d/1FOnLD26W9iQ2CUZ-jVCn7o0OrX8KPH7QGeokN4tA_j4/edit

On Monday, September 8, 2014, Jonathan Barber <jonathan.barber@xxxxxxxxx> wrote:
>
> On 8 September 2014 05:05, Krishnan Parthasarathi <kparthas@xxxxxxxxxx> wrote:
>>
>>
>>
>> > Bulk of current GlusterD code deals with keeping the configuration of the
>> > cluster and the volumes in it consistent and available across the nodes. The
>> > current algorithm is not scalable (N^2 in no. of nodes) and doesn't prevent
>> > split-brain of configuration. This is the problem area we are targeting for
>> > the first phase.
>> >
>> > As part of the first phase, we aim to delegate the distributed configuration
>> > store. We are exploring consul [1] as a replacement for the existing
>> > distributed configuration store (sum total of /var/lib/glusterd/* across all
>> > nodes). Consul provides distributed configuration store which is consistent
>> > and partition tolerant. By moving all Gluster related configuration
>> > information into consul we could avoid split-brain situations.
>> >
>> > Did you get a chance to go over the following questions while making the
>> > decision? If yes could you please share the info.
>> > What are the consistency guarantees for changing the configuration in case of
>> > network partitions?
>> > specifically when there are 2 nodes and 1 of them is not reachable?
>> > consistency guarantees when there are more than 2 nodes?
>> > What are the consistency guarantees for reading configuration in case of
>> > network partitions?
>>
>> consul uses Raft[1] distributed consensus algorithm internally for maintaining
>> consistency. The Raft consensus algorithm is proven to be correct. I will be
>> going through the workings of the algorithm soon. I will share my answers to
>> the above questions after that. Thanks for the questions, it is important
>> for the user to understand the behaviour of a system especially under failure.
>> I am considering adding a FAQ section to this proposal, where questions like the above would
>> go, once it gets accepted and makes it to the feature page.
>>
>> [1] - https://ramcloud.stanford.edu/wiki/download/attachments/11370504/raft.pdf
>>
>
> The following article provides some results on how Consul works following partitioning, actually testing whether it recovers successfully:
> http://aphyr.com/posts/316-call-me-maybe-etcd-and-consul
>
> It gives Consul a positive review.
>
> HTH
>
>> ~KP
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@xxxxxxxxxxx
>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
> --
> Jonathan Barber <jonathan.barber@xxxxxxxxx>
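On the quoted question about 2 nodes with 1 of them unreachable: a minimal sketch, assuming a strict-majority quorum of the kind both Raft and ZooKeeper use, shows why a two-node ensemble cannot accept configuration changes once either node is lost. The function names are hypothetical illustrations, not part of any Consul, etcd, or ZooKeeper API.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-member ensemble."""
    if n < 1:
        raise ValueError("ensemble must have at least one member")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many members can be unreachable while a majority still remains."""
    return n - quorum_size(n)


def can_commit(n: int, reachable: int) -> bool:
    """Under a majority quorum, a write can only commit on the side of a
    partition that still holds a strict majority of the n members."""
    return reachable >= quorum_size(n)


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5):
        print(f"n={n}: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
    # The two-node case from the question: one node unreachable.
    print("2 nodes, 1 reachable -> can commit:", can_commit(2, 1))
```

Under that assumption a 2-node ensemble tolerates no failures at all, and a 4-node ensemble tolerates no more than a 3-node one (the quorum just grows from 2 to 3), which is why odd ensemble sizes are the usual advice. How reads behave on the minority side is a separate, system-specific question.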