On 01/07/2013 05:06 PM, Stephan von Krawczynski wrote:
> On Mon, 07 Jan 2013 13:19:49 -0800
> Joe Julian <joe at julianfamily.org> wrote:
>
>> You have a replicated filesystem, brick1 and brick2.
>> Brick 2 goes down and you edit a 4k file, appending data to it.
>> That change, and the fact that there is a pending change, is stored on
>> brick1.
>> Brick2 returns to service.
>> Your app wants to append to the file again. It calls stat on the file.
>> Brick2 answers first stating that the file is 4k long. Your app seeks to
>> 4k and writes. Now the data you wrote before is gone.
> Forgive my ignorance, but it obvious that this implementation of a stat on a
> replicating fs is shit. Of course a stat should await _all_ returning local
> stats and should choose the stat of the _latest_ file version and note that
> the file needs self heal.

Apparently I wasn't very clear that I was demonstrating an example of /why/
there is a self-heal check whenever stat (or anything else that instantiates
a file descriptor) is called.

>
>> This is one of the processes by which stale stat data can cause data
>> loss. That's why each lookup() (which precedes the stat) causes a
>> self-heal check and why it's a problem that hasn't been resolved in the
>> last two years.
> self-heal is no answer to this question. The only valid answer is choosing the
> _latest_ file version no matter if self heal is necessary or not.

How do you know the _latest_? You contact the bricks that have the file. In a
replicated volume that only happens if you check with _all_ the replicas.
That's called a self-heal check. I'm not saying that if a self-heal is needed
that it's completed before that answer is returned, simply that there's extra
latency involved in ensuring you're not given the wrong response.

>
>> I don't know the answer. I know that they want this problem to be
>> solved, but right now the best solution is hardware. The lower the
>> latency, the less of a problem you'll have.
> The only solution is correct programming, no matter what the below hardware
> looks like. The only outcome of good or bad hardware is how _fast_ the
> _correct_ answer reaches the fs client.

Yes, if you can control the programming of your application, that would be a
better solution. Unfortunately, most of us use pre-packaged software like
Apache, PHP, etc. Since most of us don't have the chance to use the "correct
programming" solution, you're going to need to decrease latency if you're
going to open thousands of fd's for every operation and are unsatisfied with
the results.

>
> Your description is a satire, not?
>
>
>> On 01/07/2013 12:59 PM, Dennis Jacobfeuerborn wrote:
>>> On 01/07/2013 06:11 PM, Jeff Darcy wrote:
>>>> On 01/07/2013 12:03 PM, Dennis Jacobfeuerborn wrote:
>>>>> The "gm convert" processes make almost no progress even though on a regular
>>>>> filesystem each call takes only a fraction of a second.
>>>> Can you run gm_convert under strace? That will give us a more accurate
>>>> idea of what kind of I/O it's generating. I recommend both -t and -T to
>>>> get timing information as well. Also, it never hurts to file a bug so
>>>> we can track/prioritize/etc. Thanks.
>>>>
>>>> https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS
>>> Thanks for the strace hint. As it turned out the gm convert call was issued
>>> on the filename with a "[0]" appended which apparently led gm to stat() all
>>> (!) files in the directory.
>>>
>>> While this particular problem isn't really a glusterfs problem is there a
>>> way to improve the stat() performance in general?
>>>
>>> Regards,
>>> Dennis
>>> _______________________________________________
>>> Gluster-users mailing list
>>> Gluster-users at gluster.org
>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
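
For anyone who wants to see the stale-stat scenario from the top of this message in runnable form, here is a rough Python sketch. It is only a toy model, not GlusterFS code: the Replica class, the version counter (standing in for the "pending change" record kept on brick1), and the function names are all made up for illustration. It contrasts trusting the first replica that answers with asking every replica before answering.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one brick's copy of a file."""
    name: str
    data: bytes
    version: int  # assumed counter: higher means more writes were seen


def stat_first_responder(replicas):
    """Naive stat: report the size held by whichever replica answers first."""
    return len(replicas[0].data)


def stat_all_replicas(replicas):
    """Ask every replica, report the newest copy's size, list stale ones."""
    latest = max(replicas, key=lambda r: r.version)
    stale = [r.name for r in replicas if r.version < latest.version]
    return len(latest.data), stale


def write_at(content, offset, payload):
    """Emulate seek(offset) followed by write(payload) on the file."""
    end = offset + len(payload)
    return content[:offset] + payload + content[end:]


def build_scenario():
    """brick2 was down during the first append, so its copy is still 4k."""
    original = b"A" * 4096
    brick1 = Replica("brick1", original + b"first\n", version=2)
    brick2 = Replica("brick2", original, version=1)
    # brick2 is listed first to model it answering the stat first.
    return brick1, [brick2, brick1]


if __name__ == "__main__":
    brick1, replicas = build_scenario()

    # Stale answer: the app seeks to 4k and overwrites the earlier append.
    naive_size = stat_first_responder(replicas)
    print(naive_size, write_at(brick1.data, naive_size, b"second\n")[4096:])

    # Checked answer: the app seeks past the earlier append instead.
    safe_size, stale = stat_all_replicas(replicas)
    print(safe_size, stale, write_at(brick1.data, safe_size, b"second\n")[4096:])
```

The second path has to hear from every replica before it can answer, which is where the extra latency per lookup comes from.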