+1 for 2b. I am in the planning stages for an RHS 2.0 deployment and I too
have suggested a "cookbook"-style guide of step-by-step procedures to my
Red Hat Solution Architect. What can I do to get this moved up the
priority list?

Cheers,
Fred

On Wed, Jan 2, 2013 at 12:49 PM, Brian Candler <B.Candler at pobox.com> wrote:
> On Thu, Dec 27, 2012 at 06:53:46PM -0500, John Mark Walker wrote:
> > I invite all sorts of disagreeable comments, and I'm all for public
> > discussion of things - as can be seen in this list's archives. But, for
> > better or worse, we've chosen the approach that we have. Anyone who
> > would like to challenge that approach is welcome to take up that
> > discussion with our developers on gluster-devel. This list is for those
> > who need help using glusterfs.
> >
> > I am sorry that you haven't been able to deploy glusterfs in production.
> > Discussing how and why glusterfs works - or doesn't work - for
> > particular use cases is welcome on this list. Starting off a discussion
> > about how the entire approach is unworkable is kind of
> > counter-productive and not exactly helpful to those of us who just want
> > to use the thing.
>
> For me, the biggest problems with glusterfs are not in its design, feature
> set or performance; they are around what happens when something goes
> wrong. As I perceive them, the issues are:
>
> 1. An almost total lack of error reporting, beyond incomprehensible
> entries in log files on a completely different machine, made very
> difficult to find because they are mixed in with all sorts of other
> incomprehensible log entries.
>
> 2. Incomplete documentation. This breaks down further as:
>
> 2a. A total lack of architecture and implementation documentation - such
> as what the translators are and how they work internally, what a GFID is,
> what xattrs are stored where and what they mean, and all the on-disk
> states you can expect to see during replication and healing. Without this
> level of documentation, it's impossible to interpret the log messages from
> (1) short of reverse-engineering the source code (which is also very
> minimalist when it comes to comments); and hence it's impossible to reason
> about what has happened when the system is misbehaving, and what would be
> the correct and safe intervention to make.
>
> glusterfs 2.x actually had fairly comprehensive internals documentation,
> but this has all been stripped out in 3.x to turn it into a "black box".
> Conversely, development on 3.x has diverged enough from 2.x to make the
> 2.x documentation unusable.
>
> 2b. An almost total lack of procedural documentation, such as "to replace
> a failed server with another one, follow these steps" (which in that case
> involves manually copying peer UUIDs from one server to another), or "if
> volume rebalance gets stuck, do this". When you come across any of these
> issues you end up asking the list, and to be fair the list generally
> responds promptly and helpfully - but that approach doesn't scale, and
> doesn't necessarily help if you have a storage problem at 3am.
>
> For these reasons, I am holding back from deploying any of the more
> interesting features of glusterfs, such as replicated volumes and
> distributed volumes which might grow and need rebalancing. And without
> those, I may as well go back to standard NFS and rsync.
>
> And yes, I have raised a number of bug reports for specific issues, but
> reporting a bug whenever you come across a problem in testing or
> production is not the right answer.
> It seems to me that all these edge and error cases and recovery
> procedures should already have been developed and tested *as a matter of
> course*, for a service as critical as storage.
>
> I'm not saying there is no error handling in glusterfs, because that's
> clearly not true. What I'm saying is that any complex system is bound to
> have states where processes cannot proceed without external assistance,
> and these cases all need to be tested, and you need to have good error
> reporting and good documentation.
>
> I know I'm not the only person to have been affected, because there is a
> steady stream of people on this list who are asking for help with how to
> cope with replication and rebalancing failures.
>
> Please don't consider the above as non-constructive. I count myself
> amongst "those of us who just want to use the thing". But right now, I
> cannot wholeheartedly recommend it to my colleagues, because I cannot
> confidently say that I or they would be able to handle the failure
> scenarios I have already experienced, or other ones which may occur in
> the future.
>
> Regards,
>
> Brian.
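
For anyone who finds this thread while chasing the 2a questions above
(what a GFID is, which xattrs live where), the metadata can at least be
inspected directly on a brick. The sketch below is for the 3.3.x series as
I understand it; /export/brick1, "myvol" and the file path are placeholder
names, so adjust them for your own layout and double-check the meaning of
the attributes against your version before acting on them:

    # Run as root on a brick directory, not on the FUSE mount.
    # /export/brick1 and "myvol" are example names.
    getfattr -d -m . -e hex /export/brick1/path/to/file

    # Typical attributes in the output:
    #   trusted.gfid=0x...                  the file's cluster-wide identity
    #   trusted.afr.myvol-client-0=0x...    AFR (replication) change counters,
    #   trusted.afr.myvol-client-1=0x...    one per replica; non-zero values
    #                                       generally mean self-heal is pending
    #
    # The GFID also names a hard link under the brick's hidden .glusterfs
    # directory, bucketed by the leading bytes of the GFID:
    ls /export/brick1/.glusterfs/

    # Files listed by "gluster volume heal myvol info" are the ones whose
    # AFR counters have not yet been cleared.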
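
Likewise for 2b, the "replace a failed server" recipe that circulates on
this list for 3.3.x, reusing the dead node's peer UUID, looks roughly like
the following. Every path, command and hostname here is a sketch to check
against your own version and rehearse on a throwaway cluster first, not a
procedure anyone should take on trust:

    # Assumes the failed node is rebuilt with the SAME hostname and IP,
    # glusterfs is installed, and glusterd has not been started yet.
    # "dead-server", "surviving-server" and "myvol" are placeholders.

    # 1. On a surviving peer, find the UUID the cluster still expects for
    #    the dead node (the file name under peers/ is that UUID):
    grep -l dead-server /var/lib/glusterd/peers/*

    # 2. On the rebuilt node, hand glusterd that old identity, then start it:
    echo "UUID=<uuid-from-step-1>" > /var/lib/glusterd/glusterd.info
    service glusterd start

    # 3. Probe a surviving peer from the rebuilt node so peer and volume
    #    definitions are synced back, then check the cluster's view:
    gluster peer probe surviving-server
    gluster peer status

    # 4. For replicated volumes, let self-heal repopulate the empty bricks
    #    and watch its progress:
    gluster volume heal myvol full
    gluster volume heal myvol info

(The other 2b example, a stuck rebalance, at least has "gluster volume
rebalance myvol status" and "gluster volume rebalance myvol stop" to poke
at, but what to do after that is exactly the kind of thing a cookbook
should be spelling out.)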