I was able to narrow it down to a smallish python script. I've attached that to the bug.

https://bugzilla.redhat.com/show_bug.cgi?id=1138970

On Sep 6, 2014, at 1:05 PM, Justin Clift <justin@xxxxxxxxxxx> wrote:

> Thanks Mike, this is good stuff. :)
>
> + Justin
>
>
> On 06/09/2014, at 8:19 PM, mike wrote:
>> I upgraded the client to Gluster 3.5.2, but there is no difference.
>>
>> The bug is almost certainly in the FUSE client. If I remount the filesystem with NFS, the problem is no longer observable.
>>
>> I spent a little time looking through the xlator/fuse-bridge to see where the offsets are coming from, but I'm really not familiar enough with the code, so it is slow going.
>>
>> Unfortunately, I'm still having trouble reproducing this in a python script that could be readily attached to a bug report.
>>
>> I'll take another crack at that, but I'll file a bug anyway for completeness.
>>
>> On Sep 5, 2014, at 7:10 PM, mike <mike@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>>> I have narrowed down the source of the bug.
>>>
>>> Here is an strace of glusterfsd: http://fpaste.org/131455/40996378/
>>>
>>> The first line represents a write that does *not* make it into the underlying file.
>>>
>>> The last line is the write that stomps the earlier write.
>>>
>>> As I said, the file is opened on the client in O_APPEND mode, but on the glusterfsd side the file is just O_CREAT|O_WRONLY. That means the offsets passed to pwrite() need to be valid.
>>>
>>> I correlated this to a tcpdump I took, and I can see that the RPCs being sent do in fact carry the wrong offset. Interestingly, glusterfs.write-is-append = 0, which I wouldn't have expected.
>>>
>>> I think the bug lies in the glusterfs FUSE client.
>>>
>>> As to your question about Gluster 3.5.2, I may be able to do that if I am unable to find the bug in the source.
>>>
>>> -Mike
>>>
>>> On Sep 5, 2014, at 6:16 PM, Justin Clift <justin@xxxxxxxxxxx> wrote:
>>>
>>>> On 06/09/2014, at 12:10 AM, mike wrote:
>>>>> I have found that the O_APPEND flag is key to this failure - I had overlooked that flag when reading the strace and trying to cobble up a minimal reproduction.
>>>>>
>>>>> I now have a small pair of python scripts that can reliably reproduce this failure.
>>>>
>>>>
>>>> As a thought, is there a reasonable way you can test this on GlusterFS 3.5.2?
>>>>
>>>> There were some important bug fixes in 3.5.2 (from 3.5.1).
>>>>
>>>> Note I'm not saying yours is one of them, I'm just asking if it's
>>>> easy to test and find out. :)
>>>>
>>>> Regards and best wishes,
>>>>
>>>> Justin Clift
>>>>
>>>> --
>>>> GlusterFS - http://www.gluster.org
>>>>
>>>> An open source, distributed file system scaling to several
>>>> petabytes, and handling thousands of clients.
>>>>
>>>> My personal twitter: twitter.com/realjustinclift
>>>>
>>>
>>
>
> --
> GlusterFS - http://www.gluster.org
>
> An open source, distributed file system scaling to several
> petabytes, and handling thousands of clients.
>
> My personal twitter: twitter.com/realjustinclift
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
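[The actual reproduction script is attached to the Bugzilla entry above and is not reproduced in this archive. As a rough illustration only, the following is a hypothetical sketch of the kind of test the thread describes: several processes append to one file with O_APPEND, and every record should survive, since O_APPEND requires each write to be positioned at end-of-file. On the buggy FUSE mount, the client reportedly computed stale offsets for the brick-side pwrite(), so later appends stomped earlier ones and records went missing. The function names `append_records` and `check_appends` are invented here; this is not Mike's attached script.]

```python
import os
import tempfile

def append_records(path, tag, count):
    # O_APPEND: each write must land at the current end-of-file,
    # regardless of what other writers have appended in between.
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        for i in range(count):
            # one os.write() per record so each append is a single syscall
            os.write(fd, f"{tag}:{i:04d}\n".encode())
    finally:
        os.close(fd)

def check_appends(path, writers=2, count=100):
    # Fork several writers so their appends interleave, then count the
    # records that actually reached the file. On a correct filesystem
    # this returns writers * count; a mount that loses appends (as
    # described in the thread) comes back short.
    pids = []
    for w in range(writers):
        pid = os.fork()
        if pid == 0:
            append_records(path, f"w{w}", count)
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    with open(path, "rb") as f:
        return len(f.read().splitlines())

if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "append-test")
    print(check_appends(target, writers=2, count=100))
```

[Run with `target` pointing inside a GlusterFS FUSE mount versus an NFS mount of the same volume to compare; per the thread, only the FUSE mount showed lost writes.]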