On Wed, 24 Jun 2009 10:30:54 +0100, Vidar Hokstad <vidar@xxxxxxxxxxx> wrote:

> See below for what I'm currently running. I'm testing without readahead at
> the moment - I haven't seen it fail with it commented out so far, but it's
> too early to tell.
>
> Before I added the trace translator the error rate was much higher. After I
> added it, it took several hours before it started failing, even though I've
> excluded almost all the calls.

Sounds like a timing/sync/cleanup issue if adding trace code affects the
frequency of the issue arising.

> I can only reproduce it when I add the server in question to our production
> cluster, unfortunately - I've tried repeatedly to reproduce it with a test
> environment, but I'm clearly missing out on some element of the access
> pattern we see in production. I should probably try to replay the requests
> from our access logs or something.

What version of fuse are you running? I have found a number of heisenbug
issues arising from using mismatched fuse versions, and installing glusterfs
patched fuse (2.7.4glfs11 last time I checked) made most of those problems go
away.

Gordan