> We have been thinking of many approaches to address some of Glusterd's
> correctness (during failures and at scale) and scalability concerns. A
> recent email thread on Glusterd-2.0 was along these lines. While that
> discussion is still valid, we have been considering dogfooding as a
> viable option to solve our problems. This is not the first time this
> has been mentioned, but for various reasons it didn't really take off.
> The following proposal satisfies Glusterd's requirement for a
> distributed (consistent) store by using a GlusterFS volume. Then who
> manages that GlusterFS volume? To find answers for that and more, read
> further.

The main issue I have with this, and why I didn't suggest it myself, is
that it creates a bit of a "chicken and egg" problem. Any kind of
server-side replication, such as NSR, depends on this subsystem to elect
leaders and store its own metadata. How will these things be done if we
create a dependency in the other direction? Even AFR depends on this
subsystem to manage its self-heal daemons, so it's not immune either.
Note also that the brick daemons for the MV won't be able to rely on
glusterd the same way that current brick daemons do. I think breaking
the dependency cycle is very likely to involve the creation of a
dependency-free component exactly like what the MV is supposed to avoid.

To be sure, maintaining external daemons such as etcd or consul creates
its own problems. I think the ideal might be to embed a consensus
protocol implementation (Paxos, Raft, or Viewstamped Replication)
directly into glusterd, so it's guaranteed to start up and die exactly
when the glusterd daemons do and to be subject to the same permission
and resource limits. I'm not sure it's even more work than managing
either an external daemon or a management volume (with its own daemons).

> - MV would benefit from client-side quorum, server-side quorum and
>   other options. These could be preset (packaged) as part of
>   glusterd.vol too.

Server-side quorum will probably run into the same circular dependency
problem mentioned above.

> ### Changes in glusterd command execution
>
> Each peer modifies its configuration in /var/lib/glusterd in the
> commit phase of every command execution. With the introduction of the
> MV, the peer on which the command is executed will perform the
> modifications to the configuration in /var/lib/glusterd after the
> commit phase on the remaining available peers. Note that the other
> nodes don't perform any updates to the MV.

We'll probably need to design some sort of multi-file locking protocol
on top of the POSIX single-file semantics (a rough sketch of what I mean
is at the end of this mail). That's OK, because pretty much any other
alternative will require something similar even if the data lives in
keys instead of files.

Also, how does notification of change happen? "Watch" functionality is
standard across things like etcd/consul/ZK, and could be extremely handy
for getting away from relying on glusterd's ad-hoc state machine to
manage notification phases, but the only way the MV could support this
would be to add inotify support (or something like it); a second sketch
below shows what the consumer side of that might look like.
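
To make the locking point a bit more concrete, here's roughly the kind
of thing I mean by layering a multi-file protocol on top of POSIX
single-file locks: take a whole-file fcntl lock on every file involved
in an update, always in sorted-path order, so two peers touching
overlapping sets of files can't deadlock against each other. This is
just a sketch; update_under_locks and its callback are made-up names,
not anything that exists in the tree.

```c
/*
 * Rough sketch only, not glusterd code: take whole-file POSIX locks on a
 * set of config files in sorted-path order, so that two peers updating
 * overlapping sets of files cannot deadlock against each other.  The
 * function name and the update callback are made up for illustration.
 */

#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

static int
cmp_paths (const void *a, const void *b)
{
        return strcmp (*(char * const *) a, *(char * const *) b);
}

/* Lock every file in 'paths', run 'update', then unlock in reverse order. */
int
update_under_locks (char **paths, int npaths, int (*update) (void))
{
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fds[npaths];
        int i, ret = -1;

        /* The canonical ordering is what prevents deadlock between peers. */
        qsort (paths, npaths, sizeof (*paths), cmp_paths);

        for (i = 0; i < npaths; i++) {
                fds[i] = open (paths[i], O_RDWR | O_CREAT, 0600);
                if (fds[i] < 0)
                        goto unwind;
                if (fcntl (fds[i], F_SETLKW, &fl) < 0) {  /* blocks until locked */
                        close (fds[i]);
                        goto unwind;
                }
        }

        ret = update ();  /* everything locked: apply the multi-file change */

unwind:
        while (--i >= 0) {
                fl.l_type = F_UNLCK;
                fcntl (fds[i], F_SETLK, &fl);
                close (fds[i]);
        }

        return ret;
}
```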
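
And for the watch idea, this is roughly what the consumer side could
look like if the MV's mount actually delivered inotify events for
changes made on other peers (it doesn't today; that's precisely the
support that would have to be added on the GlusterFS side). The mount
path here is hypothetical.

```c
/*
 * Rough sketch of the "watch" consumer side: an inotify loop over a config
 * directory on the (hypothetical) MV mount.  Today a FUSE mount would only
 * report changes made through this same mount; delivering events for
 * changes made on other peers is exactly what would have to be added.
 */

#include <stdio.h>
#include <sys/inotify.h>
#include <unistd.h>

int
main (void)
{
        /* Hypothetical mount point of the management volume. */
        const char *watch_dir = "/var/run/gluster/mgmt-vol/vols";
        char buf[4096]
            __attribute__ ((aligned (__alignof__ (struct inotify_event))));
        ssize_t len;
        char *p;
        int fd, wd;

        fd = inotify_init ();
        if (fd < 0)
                return 1;

        /* Wake up when files in the directory are written, created,
         * removed, or renamed into place. */
        wd = inotify_add_watch (fd, watch_dir,
                                IN_CLOSE_WRITE | IN_CREATE | IN_DELETE |
                                IN_MOVED_TO);
        if (wd < 0)
                return 1;

        for (;;) {
                len = read (fd, buf, sizeof (buf));
                if (len <= 0)
                        break;

                for (p = buf; p < buf + len;) {
                        struct inotify_event *ev = (struct inotify_event *) p;

                        /* In glusterd this is where we would re-read the
                         * changed file and drive the relevant state machine,
                         * instead of just printing. */
                        printf ("changed: %s (mask 0x%x)\n",
                                ev->len ? ev->name : watch_dir,
                                (unsigned) ev->mask);

                        p += sizeof (*ev) + ev->len;
                }
        }

        close (fd);
        return 0;
}
```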