On 03/25/2013 05:38 AM, Vidar Hokstad wrote:
> I see a number of complaints about this as some sort of admission of
> failure.

I wouldn't quite characterize it as failure. It does work, after all. However, glusterd has kind of reached its limits. Moving it forward has become increasingly difficult, and it must move forward to support future scale and features. There's nothing wrong with hand saws and axes for small jobs, but at a certain point you're going to need a chainsaw. We're at that point for glusterd, IMO.

> under its care. The best known example of such a coordination service
> is Apache's ZooKeeper[1], but there are others that don't have the
> noxious Java dependency
>
> I'm happy you recognise the issue of Java. I'd see having to drag that
> around as a major barrier. One of the major benefits of glusterfs is
> the simplicity of deployment compared to many alternatives, and that
> benefit would be massively diminished if I needed to deal with a Java
> dependency.

Yeah, I think it's a non-starter. It's a shame, really, because the functionality is good and the people working on ZK are doing a good job. Nonetheless, I think the Java dependency is a deal killer.

For what it's worth (and this is more to AB's point), I wouldn't favor *any* solution that requires users to maintain another component. I think anything we use has to be fully embedded, with low resource needs and management completely "under the covers" as far as users are concerned. I don't think that's possible with a big ball of Java like ZK.

> I like the Gluster on Gluster idea you mention later on.

I'm a little surprised by the positive reactions to the "Gluster on Gluster" approach. Even though Kaleb and I considered it for HekaFS, it's still a bit of a hack. In particular, we'd still have to solve the problems of keeping that private instance available, restarting daemons, initiating repair, etc. - exactly the problems it's supposed to be solving for the rest of the system.

> Apart from that, have you considered pulling out the parts of Glusterd
> that you'd like to be able to ditch and try to generalize it and see
> if there'd be any interest in it as a standalone project? Or is too
> much of what you're looking for new functionality that is not already
> covered by part of your current codebase?

We don't have anything like ZK ephemerals, and we'd need to add inotify support (or something equivalent) as well. Then again, those features would then be exposed to users as well, so it might be worth it. Maybe we should consider how this might be arranged so that parts would be useful for things other than GlusterFS itself. Thanks for the idea.
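For anyone who hasn't used ZK: here's a rough sketch of the two features I mean, using the third-party Python "kazoo" client purely for illustration (the /gluster/... paths, the address, and the data are made up, not anything in our codebase):

    # Sketch only: ZK "ephemerals" and watches, the two features glusterd
    # currently lacks. Assumes the third-party kazoo client and a ZK
    # server on localhost; all paths and values here are hypothetical.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="127.0.0.1:2181")
    zk.start()

    # Ephemeral node: exists only as long as this client's session, so a
    # server that crashes or gets partitioned "deregisters" automatically.
    zk.create("/gluster/servers/server1", b"10.0.0.1",
              ephemeral=True, makepath=True)

    # Watch: the inotify-like piece - a callback fires whenever the set
    # of live servers changes, instead of everyone polling for changes.
    @zk.ChildrenWatch("/gluster/servers")
    def on_membership_change(children):
        print("live servers:", children)

That combination - automatic deregistration plus change notification - is what makes failure detection and config propagation cheap in ZK-based systems, and it's what we'd have to replicate.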
> * Membership: a certain small set of servers (three or more) would be
> manually set up as coordination-service masters, e.g. via "peer probe
> xxx as master".
>
> Careful here. Again, a big advantage of Gluster to users is to not
> need any "special" servers that require other treatment. I recognise
> there's a bootstrap problem, but to whatever extent possible, at the
> very least try to make this transparent to users (e.g. have the
> cluster automatically make more of the nodes take on
> coordination-service roles if any are lost, etc.).

I'm a little wary of trying to hide this from users. The coordination servers should be chosen to minimize the risk of correlated failure, and we currently lack the topological awareness (e.g. which server is in which rack or attached to which switch) to do that properly. If we just do something like "the first three servers to be configured become coordination servers", we run a very high risk of choosing exactly those servers that are most likely to fail together. :( As long as the extra configuration is limited to one option on "peer probe", is it really a problem?
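To make "topological awareness" concrete, the placement logic I have in mind is something like the sketch below. To be clear, this is hypothetical: the rack mapping is exactly the data we don't collect today, and none of these names exist in the codebase.

    from collections import defaultdict

    def pick_coordinators(servers, rack_of, count=3):
        """Pick up to `count` coordination servers, taking at most one
        per rack until every rack is already represented."""
        by_rack = defaultdict(list)
        for server in servers:
            by_rack[rack_of[server]].append(server)
        picked = []
        while len(picked) < count and any(by_rack.values()):
            # Each pass takes at most one server per rack, so the picks
            # spread across failure domains before doubling up anywhere.
            for rack in list(by_rack):
                if by_rack[rack] and len(picked) < count:
                    picked.append(by_rack[rack].pop(0))
        return picked

    # "First three configured" would pick s1, s2, s3 - two of which
    # share a rack. Rack-aware selection spreads the quorum instead.
    rack_of = {"s1": "rack-a", "s2": "rack-a",
               "s3": "rack-b", "s4": "rack-c"}
    print(pick_coordinators(["s1", "s2", "s3", "s4"], rack_of))
    # -> ['s1', 's3', 's4']

Until we have that rack/switch data, an explicit option on "peer probe" at least puts the decision in the hands of someone who actually knows the topology.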