On 03/25/2013 05:38 AM, Vidar Hokstad wrote:
> I see a number of complaints about this as some sort of admission of
> failure.

I wouldn't quite characterize it as failure. It does work, after all. However, glusterd has kind of reached its limits. Moving it forward has become increasingly difficult, and it must move forward to support future scale and features. There's nothing wrong with hand saws and axes for small jobs, but at a certain point you're going to need a chainsaw. We're at that point for glusterd, IMO.

> under its care. The best known example of such a coordination service
> is Apache's ZooKeeper[1], but there are others that don't have the
> noxious Java dependency
>
> I'm happy you recognise the issue of Java. I'd see having to drag that
> around as a major barrier. One of the major benefits of glusterfs is
> the simplicity of deployment compared to many alternatives, and that
> benefit would be massively diminished if I needed to deal with a Java
> dependency.

Yeah, I think it's a non-starter. It's a shame, really, because the functionality is good and the people working on ZK are doing a good job. Nonetheless, I think the Java dependency is a deal killer.

For what it's worth (and this is more to AB's point), I wouldn't favor *any* solution that requires users to maintain another component. I think anything we use has to be fully embedded, with low resource needs and management completely "under the covers" as far as users are concerned. I don't think that's possible with a big ball of Java like ZK.

> I like the Gluster on Gluster idea you mention later on.

I'm a little surprised by the positive reactions to the "Gluster on Gluster" approach. Even though Kaleb and I considered it for HekaFS, it's still a bit of a hack. In particular, we'd still have to solve the problems of keeping that private instance available, restarting daemons, initiating repair, etc. - exactly the problems it's supposed to be solving for the rest of the system.

> Apart from that, have you considered pulling out the parts of Glusterd
> that you'd like to be able to ditch and try to generalize it and see
> if there'd be any interest in it as a standalone project? Or is too
> much of what you're looking for new functionality that is not already
> covered by part of your current codebase?

We don't have anything like ZK ephemerals, and we'd need to add inotify support (or something equivalent) as well. Then again, those features would then be exposed to users as well, so it might be worth it. Maybe we should consider how this might be arranged so that parts would be useful for things other than GlusterFS itself. Thanks for the idea.
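For anyone who hasn't used ZK: here's a rough sketch of the two features I mean, using the third-party Python "kazoo" client purely for illustration (the /gluster/... paths, the address, and the data are made up, not anything in our codebase):

    # Sketch only: ZK "ephemerals" and watches, the two features glusterd
    # currently lacks. Assumes the third-party kazoo client and a ZK
    # server on localhost; all paths and values here are hypothetical.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    # Ephemeral node: exists only as long as this client's session, so a
    # server that crashes or gets partitioned "deregisters" automatically.
    zk.create("/gluster/servers/server1", b"10.0.0.1",
              ephemeral=True, makepath=True)

    # Watch: the inotify-like piece - a callback fires whenever the set
    # of live servers changes, instead of everyone polling for changes.
    @zk.ChildrenWatch("/gluster/servers")
    def on_membership_change(children):
        print("live servers:", children)

That combination - automatic deregistration plus change notification - is what makes failure detection and config propagation cheap in ZK-based systems, and it's what we'd have to replicate.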
> * Membership: a certain small set of servers (three or more) would be
> manually set up as coordination-service masters, e.g. via "peer probe
> xxx as master".
>
> Careful here. Again, a big advantage of Gluster to users is to not
> need any "special" servers that require other treatment. I recognise
> there's a bootstrap problem, but to whatever extent possible, at the
> very least try to make this transparent to users (e.g. have the
> cluster automatically make more of the nodes take on
> coordination-service roles if any are lost, etc.).

I'm a little wary of trying to hide this from users. The coordination servers should be chosen to minimize the risk of correlated failure, and we currently lack the topological awareness (e.g. which server is in which rack or attached to which switch) to do that properly. If we just do something like "the first three servers to be configured become coordination servers", we run a very high risk of choosing exactly those servers that are most likely to fail together. :( As long as the extra configuration is limited to one option on "peer probe", is it really a problem?
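To make "topological awareness" concrete, the placement logic I have in mind is something like the sketch below. To be clear, this is hypothetical: the rack mapping is exactly the data we don't collect today, and none of these names exist in the codebase.

    from collections import defaultdict

    def pick_coordinators(servers, rack_of, count=3):
        """Pick up to `count` coordination servers, taking at most one
        per rack until every rack is already represented."""
        by_rack = defaultdict(list)
        for server in servers:
            by_rack[rack_of[server]].append(server)
        picked = []
        while len(picked) < count and any(by_rack.values()):
            # Each pass takes at most one server per rack, so the picks
            # spread across failure domains before doubling up anywhere.
            for rack in list(by_rack):
                if by_rack[rack] and len(picked) < count:
                    picked.append(by_rack[rack].pop(0))
        return picked

    # "First three configured" would pick s1, s2, s3 - two of which
    # share a rack. Rack-aware selection spreads the quorum instead.
    rack_of = {"s1": "rack-a", "s2": "rack-a",
               "s3": "rack-b", "s4": "rack-c"}
    print(pick_coordinators(["s1", "s2", "s3", "s4"], rack_of))
    # -> ['s1', 's3', 's4']

Until we have that rack/switch data, an explicit option on "peer probe" at least puts the decision in the hands of someone who actually knows the topology.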