The patch titled
     mm: balance_dirty_pages(): reduce calls to global_page_state to reduce cache references
has been added to the -mm tree.  Its filename is
     mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://userweb.kernel.org/~akpm/stuff/added-to-mm.txt to find
out what to do about this

The current -mm tree may be found at http://userweb.kernel.org/~akpm/mmotm/

------------------------------------------------------
Subject: mm: balance_dirty_pages(): reduce calls to global_page_state to reduce cache references
From: Richard Kennedy <richard@xxxxxxxxxxxxxxx>

Reducing the number of times balance_dirty_pages calls global_page_state
reduces the cache references and so improves write performance on a
variety of workloads.

'perf stat' of simple fio write tests shows the reduction in cache
access.  The test is fio 'write,mmap,600Mb,pre_read' on an AMD Athlon X2
with 3GB of memory (dirty_threshold approx. 600MB), running each test 10
times and taking the average & standard deviation:

		average (s.d.) in millions (10^6)
2.6.31-rc6	661 (9.88)
+patch		604 (4.19)

The reduction is achieved by dropping clip_bdi_dirty_limit, which rereads
the counters to apply the dirty_threshold, and moving this check up into
balance_dirty_pages, where the counters have already been read.

Also, rearranging the for loop to contain only one copy of the limit tests
allows the pdflush test after the loop to use the local copies of the
counters rather than rereading them.

In the common case with no throttling it now calls global_page_state 5
fewer times and bdi_stat 2 fewer.
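As a rough cross-check of the call counts quoted above, the sketch below
tallies counter reads on the non-throttled path before and after the change.
It is a userspace model, not kernel code: stub_global_page_state(),
stub_bdi_stat(), struct read_counts and the two *_unthrottled_path()
functions are hypothetical stand-ins. It ignores the reads inside
get_dirty_limits() that the patch leaves alone, and it assumes non-laptop
mode and that the bdi is under its threshold on the first pass.

```c
#include <assert.h>

/* Stand-ins for the kernel's counter identifiers (illustrative only). */
enum global_item { NR_FILE_DIRTY, NR_UNSTABLE_NFS, NR_WRITEBACK, NR_WRITEBACK_TEMP };
enum bdi_item { BDI_RECLAIMABLE, BDI_WRITEBACK };

struct read_counts {
	int global;	/* reads via the global_page_state() stand-in */
	int bdi;	/* reads via the bdi_stat() stand-in */
};

static struct read_counts counts;

/* Hypothetical stubs: they only count calls and return 0 pages. */
static unsigned long stub_global_page_state(enum global_item item)
{
	(void)item;
	counts.global++;
	return 0;
}

static unsigned long stub_bdi_stat(enum bdi_item item)
{
	(void)item;
	counts.bdi++;
	return 0;
}

/* Unpatched flow when no throttling is needed. */
static struct read_counts old_unthrottled_path(void)
{
	unsigned long sink = 0;

	counts.global = 0;
	counts.bdi = 0;

	/* clip_bdi_dirty_limit(), reached through get_dirty_limits() */
	sink += stub_global_page_state(NR_FILE_DIRTY);
	sink += stub_global_page_state(NR_WRITEBACK);
	sink += stub_global_page_state(NR_UNSTABLE_NFS);
	sink += stub_global_page_state(NR_WRITEBACK_TEMP);
	sink += stub_bdi_stat(BDI_RECLAIMABLE);
	sink += stub_bdi_stat(BDI_WRITEBACK);

	/* top of the loop in balance_dirty_pages(), then the first break */
	sink += stub_global_page_state(NR_FILE_DIRTY);
	sink += stub_global_page_state(NR_UNSTABLE_NFS);
	sink += stub_global_page_state(NR_WRITEBACK);
	sink += stub_bdi_stat(BDI_RECLAIMABLE);
	sink += stub_bdi_stat(BDI_WRITEBACK);

	/* the background test after the loop rereads two global counters */
	sink += stub_global_page_state(NR_FILE_DIRTY);
	sink += stub_global_page_state(NR_UNSTABLE_NFS);

	(void)sink;
	return counts;
}

/* Patched flow for the same case: one set of reads, reused afterwards. */
static struct read_counts new_unthrottled_path(void)
{
	unsigned long sink = 0;

	counts.global = 0;
	counts.bdi = 0;

	/* loop body: nr_reclaimable and nr_writeback, then the bdi counters */
	sink += stub_global_page_state(NR_FILE_DIRTY);
	sink += stub_global_page_state(NR_UNSTABLE_NFS);
	sink += stub_global_page_state(NR_WRITEBACK);
	sink += stub_global_page_state(NR_WRITEBACK_TEMP);
	sink += stub_bdi_stat(BDI_RECLAIMABLE);
	sink += stub_bdi_stat(BDI_WRITEBACK);

	/* the test after the loop uses the local nr_reclaimable: no reads */

	(void)sink;
	return counts;
}

/* Compares the two tallies against the figures in the changelog. */
static void check_reads_saved(void)
{
	struct read_counts before = old_unthrottled_path();
	struct read_counts after = new_unthrottled_path();

	assert(before.global - after.global == 5);
	assert(before.bdi - after.bdi == 2);
}
```

Calling check_reads_saved() from a small test harness exercises both paths.
Under the assumptions stated, the model agrees with the 5 and 2 given in the
changelog. For scale, the two averages reported above differ by
(661 - 604) / 661, or roughly 8.6%.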

I have tried to retain the existing behavior as much as possible, but have
added NR_WRITEBACK_TEMP to nr_writeback.  This counter was used in
clip_bdi_dirty_limit but not in balance_dirty_pages.  grep suggests it is
only used by FUSE, but I haven't done any testing on that.  It does seem
logical to count all the WRITEBACK pages when making the throttling
decisions, so this change should be more correct ;)

I have been running this patch for over a week and have had no problems
with it, and generally see improved disk write performance on a variety of
tests & workloads; even in the worst cases performance is the same as the
unpatched kernel.  I also tried this on an Intel Atom 330 twin-core system
and saw similar improvements.

Signed-off-by: Richard Kennedy <richard@xxxxxxxxxxxxxxx>
Cc: Chris Mason <chris.mason@xxxxxxxxxx>
Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Cc: Jens Axboe <jens.axboe@xxxxxxxxxx>
Cc: Wu Fengguang <fengguang.wu@xxxxxxxxx>
Cc: Martin Bligh <mbligh@xxxxxxxxxx>
Cc: Miklos Szeredi <miklos@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/page-writeback.c |  116 +++++++++++++++---------------------------
 1 file changed, 43 insertions(+), 73 deletions(-)

diff -puN mm/page-writeback.c~mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references mm/page-writeback.c
--- a/mm/page-writeback.c~mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references
+++ a/mm/page-writeback.c
@@ -249,32 +249,6 @@ static void bdi_writeout_fraction(struct
 	}
 }
 
-/*
- * Clip the earned share of dirty pages to that which is actually available.
- * This avoids exceeding the total dirty_limit when the floating averages
- * fluctuate too quickly.
- */
-static void clip_bdi_dirty_limit(struct backing_dev_info *bdi,
-		unsigned long dirty, unsigned long *pbdi_dirty)
-{
-	unsigned long avail_dirty;
-
-	avail_dirty = global_page_state(NR_FILE_DIRTY) +
-		 global_page_state(NR_WRITEBACK) +
-		 global_page_state(NR_UNSTABLE_NFS) +
-		 global_page_state(NR_WRITEBACK_TEMP);
-
-	if (avail_dirty < dirty)
-		avail_dirty = dirty - avail_dirty;
-	else
-		avail_dirty = 0;
-
-	avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) +
-		bdi_stat(bdi, BDI_WRITEBACK);
-
-	*pbdi_dirty = min(*pbdi_dirty, avail_dirty);
-}
-
 static inline void task_dirties_fraction(struct task_struct *tsk,
 		long *numerator, long *denominator)
 {
@@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro
 			bdi_dirty = dirty * bdi->max_ratio / 100;
 
 		*pbdi_dirty = bdi_dirty;
-		clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty);
 		task_dirty_limit(current, pbdi_dirty);
 	}
 }
@@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a
 		};
 
 		get_dirty_limits(&background_thresh, &dirty_thresh,
-				&bdi_thresh, bdi);
+				 &bdi_thresh, bdi);
 
 		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
-					global_page_state(NR_UNSTABLE_NFS);
-		nr_writeback = global_page_state(NR_WRITEBACK);
-
-		bdi_nr_reclaimable = bdi_stat(bdi, BDI_RECLAIMABLE);
-		bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
-
-		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
-			break;
-
-		/*
-		 * Throttle it only when the background writeback cannot
-		 * catch-up. This avoids (excessively) small writeouts
-		 * when the bdi limits are ramping up.
-		 */
-		if (nr_reclaimable + nr_writeback <
-				(background_thresh + dirty_thresh) / 2)
-			break;
-
-		if (!bdi->dirty_exceeded)
-			bdi->dirty_exceeded = 1;
-
-		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
-		 * Unstable writes are a feature of certain networked
-		 * filesystems (i.e. NFS) in which data may have been
-		 * written to the server's write cache, but has not yet
-		 * been flushed to permanent storage.
-		 * Only move pages to writeback if this bdi is over its
-		 * threshold otherwise wait until the disk writes catch
-		 * up.
-		 */
-		if (bdi_nr_reclaimable > bdi_thresh) {
-			generic_sync_bdi_inodes(NULL, &wbc);
-			pages_written += write_chunk - wbc.nr_to_write;
-			get_dirty_limits(&background_thresh, &dirty_thresh,
-				       &bdi_thresh, bdi);
-		}
+				 global_page_state(NR_UNSTABLE_NFS);
+		nr_writeback = global_page_state(NR_WRITEBACK) +
+			       global_page_state(NR_WRITEBACK_TEMP);
 
 		/*
 		 * In order to avoid the stacked BDI deadlock we need
@@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a
 			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
 		}
 
-		if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
-			break;
-		if (pages_written >= write_chunk)
-			break;		/* We've done our duty */
+		/* always throttle if over threshold */
+		if (nr_reclaimable + nr_writeback < dirty_thresh) {
+
+			if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
+				break;
+
+			/*
+			 * Throttle it only when the background writeback cannot
+			 * catch-up. This avoids (excessively) small writeouts
+			 * when the bdi limits are ramping up.
+			 */
+			if (nr_reclaimable + nr_writeback <
+			    (background_thresh + dirty_thresh) / 2)
+				break;
+
+			/* done enough? */
+			if (pages_written >= write_chunk)
+				break;
+		}
+		if (!bdi->dirty_exceeded)
+			bdi->dirty_exceeded = 1;
 
+		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
+		 * Unstable writes are a feature of certain networked
+		 * filesystems (i.e. NFS) in which data may have been
+		 * written to the server's write cache, but has not yet
+		 * been flushed to permanent storage.
+		 * Only move pages to writeback if this bdi is over its
+		 * threshold otherwise wait until the disk writes catch
+		 * up.
+		 */
+		if (bdi_nr_reclaimable > bdi_thresh) {
+			writeback_inodes(&wbc);
+			pages_written += write_chunk - wbc.nr_to_write;
+			if (wbc.nr_to_write == 0)
+				continue;
+		}
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
 	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
-			bdi->dirty_exceeded)
+	    bdi->dirty_exceeded)
 		bdi->dirty_exceeded = 0;
 
 	if (writeback_in_progress(bdi))
@@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a
 	 * In normal mode, we start background writeout at the lower
 	 * background_thresh, to keep the amount of dirty memory low.
 	 */
-	if ((laptop_mode && pages_written) ||
-	    (!laptop_mode && (global_page_state(NR_FILE_DIRTY)
-			       + global_page_state(NR_UNSTABLE_NFS)
-					  > background_thresh)))
+	if ((laptop_mode && pages_written) || (!laptop_mode &&
+	    (nr_reclaimable > background_thresh)))
 		bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE);
 }
 
_

Patches currently in -mm which might be from richard@xxxxxxxxxxxxxxx are

mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references.patch
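
The break conditions in the rearranged loop can be read as a single
predicate over values the loop already holds in local variables. The
following is a minimal userspace sketch under that reading, not the kernel
code itself: struct dirty_snapshot, may_stop_throttling() and
check_examples() are hypothetical names, the fields simply mirror the locals
visible in the diff, and the page counts in the self-check are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the locals used by the loop's limit tests. */
struct dirty_snapshot {
	unsigned long nr_reclaimable;		/* NR_FILE_DIRTY + NR_UNSTABLE_NFS */
	unsigned long nr_writeback;		/* NR_WRITEBACK + NR_WRITEBACK_TEMP */
	unsigned long bdi_nr_reclaimable;
	unsigned long bdi_nr_writeback;
	unsigned long background_thresh;
	unsigned long dirty_thresh;
	unsigned long bdi_thresh;
	unsigned long pages_written;
	unsigned long write_chunk;
};

/* Returns true when this pass of the loop would break out. */
static bool may_stop_throttling(const struct dirty_snapshot *s)
{
	unsigned long global_dirty = s->nr_reclaimable + s->nr_writeback;

	/* At or over the global threshold: always keep throttling. */
	if (global_dirty >= s->dirty_thresh)
		return false;

	/* This bdi is within its own share. */
	if (s->bdi_nr_reclaimable + s->bdi_nr_writeback <= s->bdi_thresh)
		return true;

	/* Background writeback can still catch up. */
	if (global_dirty < (s->background_thresh + s->dirty_thresh) / 2)
		return true;

	/* The caller has already written its chunk. */
	if (s->pages_written >= s->write_chunk)
		return true;

	return false;
}

/* Two illustrative cases with made-up page counts. */
static void check_examples(void)
{
	struct dirty_snapshot s = {
		.nr_reclaimable = 80, .nr_writeback = 20,
		.bdi_nr_reclaimable = 10, .bdi_nr_writeback = 5,
		.background_thresh = 300, .dirty_thresh = 600,
		.bdi_thresh = 50, .pages_written = 0, .write_chunk = 1536,
	};

	assert(may_stop_throttling(&s));	/* bdi under its threshold */

	s.nr_reclaimable = 700;			/* global total over dirty_thresh */
	assert(!may_stop_throttling(&s));
}
```

When the predicate is false, the loop in the patch goes on to set
bdi->dirty_exceeded, possibly start writeback for the bdi, and wait. The
outer global test is what takes over from the removed clip_bdi_dirty_limit(),
as the changelog describes.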