On Thu, Aug 26, 2010 at 09:31:30PM +0100, Mel Gorman wrote: > On Thu, Aug 26, 2010 at 08:29:04PM +0200, Johannes Weiner wrote: > > On Thu, Aug 26, 2010 at 04:14:15PM +0100, Mel Gorman wrote: > > > If congestion_wait() is called when there is no congestion, the caller > > > will wait for the full timeout. This can cause unreasonable and > > > unnecessary stalls. There are a number of potential modifications that > > > could be made to wake sleepers but this patch measures how serious the > > > problem is. It keeps count of how many congested BDIs there are. If > > > congestion_wait() is called with no BDIs congested, the tracepoint will > > > record that the wait was unnecessary. > > > > I am not convinced that unnecessary is the right word. On a workload > > without any IO (i.e. no congestion_wait() necessary, ever), I noticed > > the VM regressing both in time and in reclaiming the right pages when > > simply removing congestion_wait() from the direct reclaim paths (the > > one in __alloc_pages_slowpath and the other one in > > do_try_to_free_pages). > > > > So just being stupid and waiting for the timeout in direct reclaim > > while kswapd can make progress seemed to do a better job for that > > load. > > > > I can not exactly pinpoint the reason for that behaviour, it would be > > nice if somebody had an idea. > > > > There is a possibility that the behaviour in that case was due to flusher > threads doing the writes rather than direct reclaim queueing pages for IO > in an inefficient manner. So the stall is stupid but happens to work out > well because flusher threads get the chance to do work. The workload was accessing a large sparse-file through mmap, so there wasn't much IO in the first place. And I experimented on the latest -mmotm where direct reclaim wouldn't do writeback by itself anymore, but kick the flushers. > > So personally I think it's a good idea to get an insight on the use of > > congestion_wait() [patch 1] but I don't agree with changing its > > behaviour just yet, or judging its usefulness solely on whether it > > correctly waits for bdi congestion. > > > > Unfortunately, I strongly suspect that some of the desktop stalls seen during > IO (one of which involved no writes) were due to calling congestion_wait > and waiting the full timeout where no writes are going on. Oh, I am in full agreement here! Removing those congestion_wait() as described above showed a reduction in peak latency. The dilemma is only that it increased the overall walltime of the load. And the scanning behaviour deteriorated, as in having increased scanning pressure on other zones than the unpatched kernel did. So I think very much that we need a fix. congestion_wait() causes stalls and relying on random sleeps for the current reclaim behaviour can not be the solution, at all. I just don't think we can remove it based on the argument that it doesn't do what it is supposed to do, when it does other things right that it is not supposed to do ;-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html