On Mon, Mar 25, 2013 at 6:07 AM, Jeff Darcy <jdarcy@xxxxxxxxxx> wrote:
> On 03/25/2013 05:38 AM, Vidar Hokstad wrote:
>> I see a number of complaints about this as some sort of admission of
>> failure.
>
> I wouldn't quite characterize it as failure. It does work, after all.
> However, glusterd has kind of reached its limits. Moving it forward has
> become increasingly difficult, and it must move forward to support
> future scale and features. There's nothing wrong with hand saws and
> axes for small jobs, but at a certain point you're going to need a
> chainsaw. We're at that point for glusterd IMO.
>
>> under its care. The best known example of such a coordination service
>> is Apache's ZooKeeper[1], but there are others that don't have the
>> noxious Java dependency
>>
>> I'm happy you recognise the issue of Java. I'd see having to drag that
>> around as a major barrier. One of the major benefits of glusterfs is the
>> simplicity of deployment compared to many alternatives, and that benefit
>> would be massively diminished if I needed to deal with a Java dependency.
>
> Yeah, I think it's a non-starter. It's a shame, really, because the
> functionality is good and the people working on ZK are doing a good job.
> Nonetheless, I think the Java dependency is a deal killer. For what
> it's worth (and this is more to AB's point) I wouldn't favor *any*
> solution that requires users to maintain another component. I think
> anything we use has to be fully embedded, with low resource needs and
> management completely "under the covers" as far as users are concerned.
> I don't think that's possible with a big ball of Java like ZK.
>
>> I like the Gluster on Gluster idea you mention later on.
>
> I'm a little surprised by the positive reactions to the "Gluster on
> Gluster" approach. Even though Kaleb and I considered it for HekaFS,
> it's still a bit of a hack.
> In particular, we'd still have to solve the
> problems of keeping that private instance available, restarting daemons
> and initiating repair etc. - exactly the problems it's supposed to be
> solving for the rest of the system.
>
>> Apart from
>> that, have you considered pulling out the parts of Glusterd that you'd
>> like to be able to ditch and try to generalize it and see if there'd be
>> any interest in it as a standalone project? Or is too much of what
>> you're looking for new functionality that is not already covered by part
>> of your current codebase?
>
> We don't have anything like ZK ephemerals, and we'd need to add inotify
> support (or something equivalent) as well. Then again, those features
> would then be exposed to users as well, so it might be worth it. Maybe
> we should consider how this might be arranged so that parts would be
> useful for things other than GlusterFS itself. Thanks for the idea.
>
>> * Membership: a certain small set of servers (three or more) would be
>> manually set up as coordination-service masters, e.g. via "peer probe
>> xxx as master".
>>
>> Careful here. Again, a big advantage of Gluster to users is to not need
>> any "special" servers that require other treatment. I recognise there's
>> a bootstrap problem, but to whatever extent possible, at the very least
>> try to make this transparent to users (e.g. have the cluster
>> automatically make more of the nodes take on coordination-service roles
>> if any are lost etc.).
>
> I'm a little wary of trying to hide this from users. The coordination
> servers should be chosen to minimize the risk of correlated failure, and
> we currently lack the topological awareness (e.g. which server is in
> which rack or attached to which switch) to do that properly. If we just
> do something like "first three servers to be configured become
> configuration servers" then we run a very high risk of choosing exactly
> those servers that are most likely to fail together.
> :( As long as the
> extra configuration is limited to one option on "peer probe" is it
> really a problem?
>

A gluster meta-volume plus zeromq for notification (pub/sub) will largely
solve our problems and still be lightweight. In a large-scale deployment,
it is not a good idea to declare all the servers as coordination servers.
Since the meta-volume is a regular distributed replicated gluster volume,
it can always be expanded later depending on the load and availability
requirements.

--
-ab
Imagination is more important than knowledge --Albert Einstein
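
For what it's worth, the correlated-failure concern quoted above can be
illustrated with a small sketch. It assumes each server carries a rack
label, which is exactly the topology information the thread says glusterd
currently lacks, so the (name, rack) tuples and the pick_coordinators
helper are hypothetical and not part of any GlusterFS interface.

```python
from collections import defaultdict
from itertools import zip_longest


def pick_coordinators(servers, count=3):
    """Pick up to `count` server names, spread across racks.

    `servers` is a list of (name, rack) tuples in configuration order.
    The rack labels are an assumption made for this sketch.
    """
    by_rack = defaultdict(list)
    for name, rack in servers:
        by_rack[rack].append(name)

    picked = []
    # Round-robin over racks: first server of every rack, then the
    # second of every rack, and so on, until `count` names are chosen.
    for tier in zip_longest(*by_rack.values()):
        for name in tier:
            if name is not None and len(picked) < count:
                picked.append(name)
    return picked


# Hypothetical cluster: the first three configured servers share a rack.
servers = [
    ("s1", "rackA"), ("s2", "rackA"), ("s3", "rackA"),
    ("s4", "rackB"), ("s5", "rackC"),
]

# "First three servers to be configured" puts every coordinator in rackA.
naive = [name for name, _ in servers[:3]]

# Rack-aware choice takes one coordinator from each of the three racks.
spread = pick_coordinators(servers)

print("naive: ", naive)
print("spread:", spread)
```

With the naive choice a single rack failure takes out all three
coordinators; with the rack-aware choice it takes out at most one. Without
topology data, a manual option on "peer probe" is the only way an admin
can express the same intent.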