On Wed, Sep 08, 2010 at 12:25:33AM +0900, Minchan Kim wrote: > On Mon, Sep 06, 2010 at 11:47:26AM +0100, Mel Gorman wrote: > > If congestion_wait() is called with no BDIs congested, the caller will sleep > > for the full timeout and this may be an unnecessary sleep. This patch adds > > a wait_iff_congested() that checks congestion and only sleeps if a BDI is > > congested or if there is a significant amount of writeback going on in an > > interesting zone. Else, it calls cond_resched() to ensure the caller is > > not hogging the CPU longer than its quota but otherwise will not sleep. > > > > This is aimed at reducing some of the major desktop stalls reported during > > IO. For example, while kswapd is operating, it calls congestion_wait() > > but it could just have been reclaiming clean page cache pages with no > > congestion. Without this patch, it would sleep for a full timeout but after > > this patch, it'll just call schedule() if it has been on the CPU too long. > > Similar logic applies to direct reclaimers that are not making enough > > progress. > > > > Signed-off-by: Mel Gorman <mel@xxxxxxxxx> > > --- > > <SNIP> > > +/** > > + * congestion_wait - wait for a backing_dev to become uncongested > wait_iff_congested > Fixed, thanks. > > + * @zone: A zone to consider the number of being being written back from > > + * @sync: SYNC or ASYNC IO > > + * @timeout: timeout in jiffies > > + * > > + * Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit > > + * write congestion. If no backing_devs are congested then the number of > > + * writeback pages in the zone are checked and compared to the inactive > > + * list. If there is no sigificant writeback or congestion, there is no point > and > Why and? "or" makes sense because we avoid sleeping on either condition. > > + * in sleeping but cond_resched() is called in case the current process has > > + * consumed its CPU quota. > > + */ > > +long wait_iff_congested(struct zone *zone, int sync, long timeout) > > +{ > > + long ret; > > + unsigned long start = jiffies; > > + DEFINE_WAIT(wait); > > + wait_queue_head_t *wqh = &congestion_wqh[sync]; > > + > > + /* > > + * If there is no congestion, check the amount of writeback. If there > > + * is no significant writeback and no congestion, just cond_resched > > + */ > > + if (atomic_read(&nr_bdi_congested[sync]) == 0) { > > + unsigned long inactive, writeback; > > + > > + inactive = zone_page_state(zone, NR_INACTIVE_FILE) + > > + zone_page_state(zone, NR_INACTIVE_ANON); > > + writeback = zone_page_state(zone, NR_WRITEBACK); > > + > > + /* > > + * If less than half the inactive list is being written back, > > + * reclaim might as well continue > > + */ > > + if (writeback < inactive / 2) { > > I am not sure this is best. > I'm not saying it is. The objective is to identify a situation where sleeping until the next write or congestion clears is pointless. We have already identified that we are not congested so the question is "are we writing a lot at the moment?". The assumption is that if there is a lot of writing going on, we might as well sleep until one completes rather than reclaiming more. This is the first effort at identifying pointless sleeps. Better ones might be identified in the future but that shouldn't stop us making a semi-sensible decision now. > 1. Without considering various speed class storage, could we fix it as half of inactive? We don't really have a good means of identifying speed classes of storage. Worse, we are considering on a zone-basis here, not a BDI basis. The pages being written back in the zone could be backed by anything so we cannot make decisions based on BDI speed. > 2. Isn't there any writeback throttling on above layer? Do we care of it in here? > There are but congestion_wait() and now wait_iff_congested() are part of that. We can see from the figures in the leader that congestion_wait() is sleeping more than is necessary or smart. > Just out of curiosity. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html