On Thu, Aug 26, 2010 at 08:29:04PM +0200, Johannes Weiner wrote:
> On Thu, Aug 26, 2010 at 04:14:15PM +0100, Mel Gorman wrote:
> > If congestion_wait() is called when there is no congestion, the caller
> > will wait for the full timeout. This can cause unreasonable and
> > unnecessary stalls. There are a number of potential modifications that
> > could be made to wake sleepers but this patch measures how serious the
> > problem is. It keeps count of how many congested BDIs there are. If
> > congestion_wait() is called with no BDIs congested, the tracepoint will
> > record that the wait was unnecessary.
>
> I am not convinced that unnecessary is the right word. On a workload
> without any IO (i.e. no congestion_wait() necessary, ever), I noticed
> the VM regressing both in time and in reclaiming the right pages when
> simply removing congestion_wait() from the direct reclaim paths (the
> one in __alloc_pages_slowpath and the other one in
> do_try_to_free_pages).
>
> So just being stupid and waiting for the timeout in direct reclaim
> while kswapd can make progress seemed to do a better job for that
> load.
>
> I can not exactly pinpoint the reason for that behaviour, it would be
> nice if somebody had an idea.
>

There is a possibility that the behaviour in that case was due to flusher
threads doing the writes rather than direct reclaim queueing pages for IO
in an inefficient manner. So the stall is stupid but happens to work out
well because flusher threads get the chance to do work.

> So personally I think it's a good idea to get an insight on the use of
> congestion_wait() [patch 1] but I don't agree with changing its
> behaviour just yet, or judging its usefulness solely on whether it
> correctly waits for bdi congestion.
>

Unfortunately, I strongly suspect that some of the desktop stalls seen
during IO (one of which involved no writes) were due to calling
congestion_wait() and waiting the full timeout when no writes were going
on.

It gets potentially worse too.
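A rough userspace model of the measurement in the quoted patch description may make the "unnecessary wait" classification concrete. It is a sketch under stated assumptions, not the patch itself: it assumes a single global count of congested devices, and fake_bdi, fake_set_congested(), fake_clear_congested() and fake_congestion_wait() are hypothetical names, not the kernel's backing_dev_info API. Instead of sleeping, the modelled wait only classifies itself and bumps counters standing in for the tracepoint.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model. fake_bdi and the fake_* helpers are
 * illustrative names only, not the kernel's backing_dev_info API.
 */
struct fake_bdi {
	atomic_bool congested;	/* per-device congestion flag */
};

/* Number of devices currently marked congested (assumed global count). */
static atomic_int nr_bdi_congested;

/* Counters standing in for what the tracepoint would record. */
static atomic_int nr_waits;
static atomic_int nr_unnecessary_waits;

static void fake_set_congested(struct fake_bdi *bdi)
{
	/* Only the false -> true transition bumps the global count. */
	if (!atomic_exchange(&bdi->congested, true))
		atomic_fetch_add(&nr_bdi_congested, 1);
}

static void fake_clear_congested(struct fake_bdi *bdi)
{
	/* Only the true -> false transition drops the global count. */
	if (atomic_exchange(&bdi->congested, false))
		atomic_fetch_sub(&nr_bdi_congested, 1);
}

/*
 * Classify a would-be congestion wait without sleeping. Returns true
 * when no device is congested, i.e. a real caller would have slept for
 * the full timeout with nothing to wait for.
 */
static bool fake_congestion_wait(void)
{
	bool unnecessary = atomic_load(&nr_bdi_congested) == 0;

	atomic_fetch_add(&nr_waits, 1);
	if (unnecessary)
		atomic_fetch_add(&nr_unnecessary_waits, 1);
	return unnecessary;
}
```

In this model a wait counts as "necessary" as soon as any one device is congested, regardless of which device backs the pages being reclaimed, which is the limitation the multi-BDI case below is about.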
Let's say we have a system with many BDIs of different speeds, e.g. an
SSD at one end of the spectrum and a USB flash drive at the other. The
write congestion could be on the USB flash drive, but due to low memory,
the allocator, direct reclaimers and kswapd periodically go to sleep in
congestion_wait() for the USB drive even though the bulk of the pages
that need reclaiming are backed by the SSD.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab