Say you have a replicated filesystem with two bricks, brick1 and brick2. Brick2 goes down, and while it is down you edit a 4k file, appending data to it. That change, and the fact that there is a pending change, is stored on brick1. Brick2 returns to service. Your app wants to append to the file again, so it calls stat on the file. Brick2 answers first, stating that the file is 4k long. Your app seeks to 4k and writes. Now the data you appended before has been overwritten.

This is one of the ways stale stat data can cause data loss. That's why each lookup() (which precedes the stat) causes a self-heal check, and why it's a problem that hasn't been resolved in the last two years.

I don't know the answer. I know that they want this problem to be solved, but right now the best solution is hardware. The lower the latency, the less of a problem you'll have.

On 01/07/2013 12:59 PM, Dennis Jacobfeuerborn wrote:
> On 01/07/2013 06:11 PM, Jeff Darcy wrote:
>> On 01/07/2013 12:03 PM, Dennis Jacobfeuerborn wrote:
>>> The "gm convert" processes make almost no progress even though on a regular
>>> filesystem each call takes only a fraction of a second.
>> Can you run gm_convert under strace? That will give us a more accurate
>> idea of what kind of I/O it's generating. I recommend both -t and -T to
>> get timing information as well. Also, it never hurts to file a bug so
>> we can track/prioritize/etc. Thanks.
>>
>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
> Thanks for the strace hint. As it turned out the gm convert call was issued
> on the filename with a "[0]" appended which apparently led gm to stat() all
> (!) files in the directory.
>
> While this particular problem isn't really a glusterfs problem is there a
> way to improve the stat() performance in general?
>
> Regards,
> Dennis
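
P.S. To make the stale-stat scenario at the top of this message concrete, here is a toy simulation of it. It is only a sketch and not how GlusterFS is implemented: the Brick class, append() and simulate() are names made up for illustration, each "brick" is just an in-memory byte buffer, and the "self-heal" is a plain copy from brick1 to brick2.

```python
BLOCK = 4096  # the "4k file" from the scenario


class Brick:
    """Hypothetical stand-in for one replica holding a single file."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.online = True

    def stat_size(self) -> int:
        return len(self.data)

    def pwrite(self, offset: int, payload: bytes) -> None:
        # Overwrite in place at offset, growing the file if needed.
        end = offset + len(payload)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload


def append(bricks, stat_from, payload: bytes) -> int:
    """Append the way the app does: stat, seek to that size, write."""
    offset = stat_from.stat_size()
    for brick in bricks:
        if brick.online:
            brick.pwrite(offset, payload)
    return offset


def simulate(heal_before_stat: bool) -> bytes:
    """Return what brick1 holds past the original 4k after both appends."""
    original = b"A" * BLOCK
    brick1, brick2 = Brick(original), Brick(original)
    bricks = [brick1, brick2]

    # Brick2 goes down; the first append only reaches brick1.
    brick2.online = False
    append(bricks, brick1, b"first append\n")

    # Brick2 returns to service.
    brick2.online = True
    if heal_before_stat:
        # Toy self-heal: bring brick2 up to date before it may answer.
        brick2.data = bytearray(brick1.data)

    # Brick2 answers the stat first, so its size decides the write offset.
    append(bricks, brick2, b"second\n")
    return bytes(brick1.data[BLOCK:])


if __name__ == "__main__":
    print("stale stat:", simulate(heal_before_stat=False))
    print("healed first:", simulate(heal_before_stat=True))
```

Without the heal, the second write lands at offset 4k and clobbers the start of the first append on brick1. With the heal done before brick2 is allowed to answer, the second write lands after it and both appends survive.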