Re: fscache recursive hang -- similar to loopback NFS issues

NeilBrown <neilb@xxxxxxx> · Wed, 30 Jul 2014 12:19:35 +1000

On Tue, 29 Jul 2014 21:48:34 -0400 Milosz Tanski <milosz@xxxxxxxxx> wrote:

> I would vote on the lower end of the spectrum by default (closer to
> 100ms) since I imagine anybody deploying this in a production
> environment would likely be using SSD drives for the caching. And in
> my tests on spinning disks there was little to no benefit outside of
> reducing network traffic.

Maybe I'm confused...

I thought the whole point of this patch was to avoid deadlocks.
Now you seem to be talking about a performance benefit.
What did I miss?

NeilBrown

> 
> - Milosz
> 
> On Tue, Jul 29, 2014 at 5:17 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > On Tue, 29 Jul 2014 17:12:34 +0100 David Howells <dhowells@xxxxxxxxxx> wrote:
> >
> >> Milosz Tanski <milosz@xxxxxxxxx> wrote:
> >>
> >> > That's the exact same fix I started testing on Saturday. I found that
> >> > there already is a wait_event_timeout (even without your recent changes). The
> >> > thing I'm not quite sure about is what timeout it should use.
> >>
> >> That's probably something to make an external tuning knob for.
> >>
> >> David
> >
> > Ugg.  External tuning knobs should be avoided wherever possible, and always
> > come with detailed instructions on how to tune them  </rant>
> >
> > In this case I think it very nearly doesn't matter *at all* what value is
> > used.
> >
> > If you set it a bit too high, then on the very very rare occasion that it
> > would currently deadlock, you get a longer-than-necessary wait.  So just make
> > sure that it is short enough that by the time the sysadmin notices and starts
> > looking for the problem, it will be gone.
> >
> > And if you set it a bit too low, then it will loop around to find another
> > page to deal with before that one is finished being written out, and so maybe
> > do a little bit more work than is needed (though it'll be needed eventually).
> >
> > So the perfect number is somewhere between the typical response time for
> > storage, and the typical response time for the sys-admin.  Anywhere between
> > 100ms and 10sec would do.  1 second is the geo-mean.
> >
> > (sorry I didn't reply earlier - I missed your email somehow).
> >
> > NeilBrown
> 
> 
> 
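
For readers checking the "1 second is the geo-mean" figure quoted above: the
geometric mean of two bounds is the square root of their product, so for
100ms and 10sec (10000ms) it is sqrt(100 * 10000) = 1000ms. The following is
only a minimal userspace sketch of that arithmetic. The names isqrt_u64 and
geo_mean_ms are hypothetical and are not part of fscache or any kernel API,
and the code assumes the product of the two bounds fits in 64 bits.

```c
#include <stdint.h>

/*
 * Floor of the square root of n, found by binary search.
 * Hypothetical helper for illustration; it avoids depending on libm.
 */
static uint64_t isqrt_u64(uint64_t n)
{
    uint64_t lo = 0;
    /* The root of any 64-bit value fits in 32 bits, so cap the search. */
    uint64_t hi = n < 4294967295ULL ? n : 4294967295ULL;

    while (lo < hi) {
        /* Round the midpoint up so the loop always makes progress. */
        uint64_t mid = lo + (hi - lo + 1) / 2;

        if (mid * mid <= n)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * Geometric mean of two timeout bounds, both in milliseconds.
 * Assumes low_ms * high_ms does not overflow 64 bits.
 */
static uint64_t geo_mean_ms(uint64_t low_ms, uint64_t high_ms)
{
    return isqrt_u64(low_ms * high_ms);
}
```

Calling geo_mean_ms(100, 10000) returns 1000, which matches the 1 second
midpoint suggested in the quoted message.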
