Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty_pages()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2009-10-13 at 11:24 +0800, Wu Fengguang wrote:

> You are right too :) I followed you and Peter's advice to do the loop
> and the recheck of stats as follows:

> 	This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
>         with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh)
>         to avoid exceeding the dirty limit. Since the bdi dirty limit is mostly
>         accurate we don't need to do routinely clip. A simple dirty limit check
>         would be enough.
> 
>         The check is necessary because, in principle we should throttle
>         everything calling balance_dirty_pages() when we're over the total
>         limit, as said by Peter.
> 
>         We now set and clear dirty_exceeded not only based on bdi dirty limits,
>         but also on the global dirty limits. This is a bit counterintuitive, but
>         the global limits are the ultimate goal and shall be always imposed.
> 
>         We may now start background writeback work based on outdated conditions.
>         That's safe because the bdi flush thread will (and have to) double check
>         the states. It reduces overall overheads because the test based on old
>         states still have good chance to be right.

> static void balance_dirty_pages(struct address_space *mapping,
> 				unsigned long write_chunk)
> {
> 	long nr_reclaimable, bdi_nr_reclaimable;
> 	long nr_writeback, bdi_nr_writeback;
> 	unsigned long background_thresh;
> 	unsigned long dirty_thresh;
> 	unsigned long bdi_thresh;
> 	int dirty_exceeded;
> 	struct backing_dev_info *bdi = mapping->backing_dev_info;
> 
> 	/*
> 	 * If sync() is in progress, curb the to-be-synced inodes regardless
> 	 * of dirty limits, so that a fast dirtier won't livelock the sync.
> 	 */
> 	if (unlikely(bdi->sync_time &&
> 		     S_ISREG(mapping->host->i_mode) &&
> 		     time_after_eq(bdi->sync_time,
> 				   mapping->host->dirtied_when))) {
> 		write_chunk *= 2;
> 		bdi_writeback_wait(bdi, write_chunk);
> 	}
> 
> 	for (;;) {
> 		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> 				 global_page_state(NR_UNSTABLE_NFS);
> 		nr_writeback = global_page_state(NR_WRITEBACK) +
> 			       global_page_state(NR_WRITEBACK_TEMP);
> 
> 		global_dirty_thresh(&background_thresh, &dirty_thresh);
> 
> 		/*
> 		 * Throttle it only when the background writeback cannot
> 		 * catch-up. This avoids (excessively) small writeouts
> 		 * when the bdi limits are ramping up.
> 		 */
> 		if (nr_reclaimable + nr_writeback <
> 		    (background_thresh + dirty_thresh) / 2)
> 			break;
> 
> 		bdi_thresh = bdi_dirty_thresh(bdi, dirty_thresh);
> 
> 		/*
> 		 * In order to avoid the stacked BDI deadlock we need
> 		 * to ensure we accurately count the 'dirty' pages when
> 		 * the threshold is low.
> 		 *
> 		 * Otherwise it would be possible to get thresh+n pages
> 		 * reported dirty, even though there are thresh-m pages
> 		 * actually dirty; with m+n sitting in the percpu
> 		 * deltas.
> 		 */
> 		if (bdi_thresh < 2*bdi_stat_error(bdi)) {
> 			bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE);
> 			bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK);
> 		} else {
> 			bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
> 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> 		}
> 
> 		/*
> 		 * The bdi thresh is somehow "soft" limit derived from the
> 		 * global "hard" limit. The former helps to prevent heavy IO
> 		 * bdi or process from holding back light ones; The latter is
> 		 * the last resort safeguard.
> 		 */
> 		dirty_exceeded =
> 			(bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh)
> 			|| (nr_reclaimable + nr_writeback >= dirty_thresh);
> 
> 		if (!dirty_exceeded)
> 			break;
> 
> 		bdi->dirty_exceed_time = jiffies;
> 
> 		bdi_writeback_wait(bdi, write_chunk);
> 	}
> 
> 	/*
> 	 * In laptop mode, we wait until hitting the higher threshold before
> 	 * starting background writeout, and then write out all the way down
> 	 * to the lower threshold.  So slow writers cause minimal disk activity.
> 	 *
> 	 * In normal mode, we start background writeout at the lower
> 	 * background_thresh, to keep the amount of dirty memory low.
> 	 */
> 	if (!laptop_mode && (nr_reclaimable > background_thresh) &&
> 	    can_submit_background_writeback(bdi))
> 		bdi_start_writeback(bdi, NULL, WB_FOR_BACKGROUND);
> }

Looks good, Thanks Wu!

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux