Thanks to all who responded - at least I know I'm not alone. I am shocked to learn that so many on this list are having serious, fundamental issues with GlusterFS - and seemingly for a long time.

So, without wanting to troll, my question is: is Gluster a serious, stable, general-purpose file system? Or is it more a good caching system for a specific, narrow domain?

I'd really like to hear from any official Gluster people out there - right now the silence is deafening. Is this issue known? Is it viewed as serious? Is it being worked on? I'm happy to volunteer help by sending in a test case, sending logs - whatever is asked of me.

I want to believe Gluster is going to work, as do many other sysadmins I know in the post/film industry. However, I'm rapidly losing confidence in Gluster with each passing day of silence...

in hope - Paul

On Mon, Feb 21, 2011 at 6:47 PM, Joe Landman <landman at scalableinformatics.com> wrote:

> On 02/21/2011 01:39 PM, Kon Wilms wrote:
>
>> On Mon, Feb 21, 2011 at 9:45 AM, Steve Wilson <stevew at purdue.edu> wrote:
>>
>>> We had trouble with reliability for small, actively-accessed files on a
>>> distribute-replicate volume in both GlusterFS 3.1.1 and 3.1.2. It seems
>>> that the replicated servers would eventually get out of sync with each
>>> other on these kinds of files. For a while, we dropped replication and
>>> only ran the volume as distributed. This has worked reliably for the
>>> past week or so, without any of the errors we were seeing before: no
>>> such file, invalid argument, etc.
>>
>> I'm running thousands of small files over NFSv3 through NGINX with
>> distribute and have had the opposite experience. Unfortunately, when
>> NGINX can't access a file over NFS it means a customer calling us, so
>> right now Gluster is basically sitting idle (I posted my output to the
>> list a while back with no response).
>
> We've had lots of issues with files disappearing or being inaccessible
> prior to 3.1.2 with the NFS client and server translator. After 3.1.2,
> many of these problems *seem* to have been resolved, though all that
> means in this instance is that the customer hasn't submitted a ticket yet.
>
> I had originally thought it was a timebase issue, as we had a minute or
> two of clock drift on some of the nodes (since fixed). But we saw a
> pretty consistent error in this regard.
>
> We did open problem reports. Unfortunately, no action so far (they were
> just closed this morning, though nothing has been solved per se; the
> issue simply has not yet resurfaced). I'll leave those reports closed
> for now.
>
> That said, this error, or one with a very similar signature, has been in
> the code since the 2.x series. I really, really want to track it down,
> but I can't create a simple reproduction case to present to the team. If
> you have what you think is a simple reproducer, please email me offline.
> We'll try it here, and if we can get it down to a very simple
> reproduction case and test, we'll re-open the bugs.
>
> I'd hate to think it's a heisenbug, but that is where I am leaning now.
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: landman at scalableinformatics.com
> web  : http://scalableinformatics.com
>        http://scalableinformatics.com/sicluster
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
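
For anyone wanting to put together the kind of simple reproducer Joe asks for above, a minimal sketch of a small-file churn test might look like the following. The mount point (/mnt/gluster), directory name, iteration count, and file size are all assumptions, not values from this thread; adjust them to your setup. Run a copy on each client at once and watch for the "no such file" / "invalid argument" errors described above.

#!/usr/bin/env python
# Hypothetical small-file churn test for a replicated GlusterFS mount.
# Mount point, iteration count, and file size are assumptions.
import errno
import os
import random
import socket
import sys

MOUNT = sys.argv[1] if len(sys.argv) > 1 else "/mnt/gluster"  # assumed mount point
WORKDIR = os.path.join(MOUNT, "smallfile-test")
ITERATIONS = 10000  # arbitrary; raise it until the problem shows up (or doesn't)
FILE_SIZE = 4096    # "small, actively-accessed files", per Steve's report
PREFIX = "%s-%d" % (socket.gethostname(), os.getpid())  # keep clients from colliding

if not os.path.isdir(WORKDIR):
    try:
        os.makedirs(WORKDIR)
    except OSError:
        pass  # another client may have created it first

errors = 0
for i in range(ITERATIONS):
    # Reuse a small pool of names so files are constantly created,
    # read back, and deleted -- the access pattern reported as fragile.
    path = os.path.join(WORKDIR, "%s-f%03d" % (PREFIX, random.randrange(1000)))
    payload = os.urandom(FILE_SIZE)
    try:
        with open(path, "wb") as f:
            f.write(payload)
        with open(path, "rb") as f:
            if f.read() != payload:
                print("iteration %d: data mismatch on %s" % (i, path))
                errors += 1
        os.unlink(path)
    except (IOError, OSError) as e:
        # ENOENT / EINVAL on a file this client just wrote is the
        # symptom described in the thread.
        print("iteration %d: %s -> %s" % (i, path, errno.errorcode.get(e.errno, str(e))))
        errors += 1

print("done: %d iterations, %d errors" % (ITERATIONS, errors))

Each run tags its filenames with the local hostname and PID, so concurrent clients never unlink each other's files; any ENOENT or EINVAL on a file a client just wrote itself is then a genuine symptom rather than a race between the test clients.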