Thanks to everyone who added to the dialogue here. Obviously, we need to
think about how to do this. Perhaps the best way is to start an outline on
the wiki: http://www.gluster.org/community/documentation/

We can probably start one in the next week or two, but it would be great if
someone here wanted to take the initiative.

-JM

----- Original Message -----
> On Wed, 2013-01-02 at 08:38 -0500, Whit Blauvelt wrote:
> > There's a strong trend against documentation of software, and not just
> > in open source. I'm old enough to remember when anything modestly
> > complex came with hundreds of pages of manuals, often over several
> > volumes.
> I agree. You also have to agree that there are "new" advancements too:
> IRC, mailing lists, etc. As one of my personal preferences, I use/write
> puppet code. This is useful to me as "de facto" documentation on how to
> set something up. If there's ever software that I really don't
> understand, but it has a puppet module (even if it's a poorly written
> one), reading it can often give me clues as to how the underlying
> software works.
>
> After a hard time learning how gluster works, I made a puppet module [1]
> for exactly this reason. It's definitely a more complicated module that
> does more (some sysadmins don't want it to do this much), however it
> *does* show how to get a working gluster setup if you look through it,
> or run it. Conversely, I hope that the real gluster experts out there
> check it out and add optimisations to it. What better way to get users
> trying out gluster than to have a turnkey deployment solution available?
>
> This wasn't meant as a plug, it is Free Software after all, but you're
> free to use and share it as a means to help new users figure out gluster
> in the face of missing docs. I think this is a good way to learn.
>
> Hope this was a useful comment,
> James
> [1] https://github.com/purpleidea/puppet-gluster
>
> > Now, I can understand why commercial software with constrained GUIs
> > wants to pretend that what's underneath is as simple as the GUI
> > suggests, so as not to scare away customers. And I can understand why
> > some open source projects might want to withhold knowledge to motivate
> > consulting contracts, as cynical as that may be.
> >
> > But something on the scale of Gluster should have someone hired full
> > time to do nothing but continuously write and update documentation. If
> > you need a business model for that, print the results in a set of thick
> > books, and sell it for $250 or so. Print JIT so you can track point
> > releases. What Brian asks for should be the core of it. Even when stuff
> > breaks for people who have paid for their RedHat Solution Architect, it
> > will give that architect a place to look up the fix quickly, rather
> > than having to go bother the development team, who are more profitably
> > deployed in development.
> >
> > Best,
> > Whit
> >
> >
> > On Wed, Jan 02, 2013 at 01:19:17PM +0100, Fred van Zwieten wrote:
> > > +1 for 2b.
> > >
> > > I am in the planning stages for an RHS 2.0 deployment and I too have
> > > suggested a "cookbook" style guide for step-by-step procedures to my
> > > RedHat Solution Architect.
> > >
> > > What can I do to have this moved up the priority list?
> > >
> > > Cheers,
> > > Fred
> > >
> > >
> > > On Wed, Jan 2, 2013 at 12:49 PM, Brian Candler
> > > <B.Candler at pobox.com> wrote:
> > >
> > > On Thu, Dec 27, 2012 at 06:53:46PM -0500, John Mark Walker wrote:
> > > > I invite all sorts of disagreeable comments, and I'm all for public
> > > > discussion of things - as can be seen in this list's archives. But,
> > > > for better or worse, we've chosen the approach that we have.
> > > > Anyone who would like to challenge that approach is welcome to take
> > > > up that discussion with our developers on gluster-devel. This list
> > > > is for those who need help using glusterfs.
> > > >
> > > > I am sorry that you haven't been able to deploy glusterfs in
> > > > production. Discussing how and why glusterfs works - or doesn't
> > > > work - for particular use cases is welcome on this list. Starting
> > > > off a discussion about how the entire approach is unworkable is
> > > > kind of counter-productive and not exactly helpful to those of us
> > > > who just want to use the thing.
> > >
> > > For me, the biggest problems with glusterfs are not in its design,
> > > feature set or performance; they are around what happens when
> > > something goes wrong. As I perceive them, the issues are:
> > >
> > > 1. An almost total lack of error reporting, beyond incomprehensible
> > > entries in log files on a completely different machine, made very
> > > difficult to find because they are mixed in with all sorts of other
> > > incomprehensible log entries.
> > >
> > > 2. Incomplete documentation. This breaks down further as:
> > >
> > > 2a. A total lack of architecture and implementation documentation -
> > > such as what the translators are and how they work internally, what a
> > > GFID is, what xattrs are stored where and what they mean, and all the
> > > on-disk states you can expect to see during replication and healing.
> > > Without this level of documentation, it's impossible to interpret the
> > > log messages from (1) short of reverse-engineering the source code
> > > (which is also very minimalist when it comes to comments); and hence
> > > it's impossible to reason about what has happened when the system is
> > > misbehaving, and what would be the correct and safe intervention to
> > > make.
> > >
> > > glusterfs 2.x actually had fairly comprehensive internals
> > > documentation, but this has all been stripped out in 3.x to turn it
> > > into a "black box". Conversely, development on 3.x has diverged
> > > enough from 2.x to make the 2.x documentation unusable.
> > >
> > > 2b. An almost total lack of procedural documentation, such as "to
> > > replace a failed server with another one, follow these steps" (which
> > > in that case involves manually copying peer UUIDs from one server to
> > > another), or "if volume rebalance gets stuck, do this". When you come
> > > across any of these issues you end up asking the list, and to be fair
> > > the list generally responds promptly and helpfully - but that
> > > approach doesn't scale, and doesn't necessarily help if you have a
> > > storage problem at 3am.
> > >
> > > For these reasons, I am holding back from deploying any of the more
> > > interesting features of glusterfs, such as replicated volumes and
> > > distributed volumes which might grow and need rebalancing. And
> > > without those, I may as well go back to standard NFS and rsync.
> > >
> > > And yes, I have raised a number of bug reports for specific issues,
> > > but reporting a bug whenever you come across a problem in testing or
> > > production is not the right answer. It seems to me that all these
> > > edge and error cases and recovery procedures should already have been
> > > developed and tested *as a matter of course*, for a service as
> > > critical as storage.
> > >
> > > I'm not saying there is no error handling in glusterfs, because
> > > that's clearly not true.
> > > What I'm saying is that any complex system is bound to have states
> > > where processes cannot proceed without external assistance, and these
> > > cases all need to be tested, and you need to have good error
> > > reporting and good documentation.
> > >
> > > I know I'm not the only person to have been affected, because there
> > > is a steady stream of people on this list who are asking for help
> > > with how to cope with replication and rebalancing failures.
> > >
> > > Please don't consider the above as non-constructive. I count myself
> > > amongst "those of us who just want to use the thing". But right now,
> > > I cannot wholeheartedly recommend it to my colleagues, because I
> > > cannot confidently say that I or they would be able to handle the
> > > failure scenarios I have already experienced, or other ones which may
> > > occur in the future.
> > >
> > > Regards,
> > >
> > > Brian.
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users at gluster.org
> > > http://supercolony.gluster.org/mailman/listinfo/gluster-users