The max-pause limit helps to keep the sleep time inside
balance_dirty_pages() within 200ms. A 200ms max sleep means a per-task
rate limit of 8 pages/200ms = 160KB/s (with 4KB pages), which is normally
enough to stop dirtiers from continuing to push the dirty pages higher,
unless there is a sufficiently large number of slow dirtiers (e.g. 500
tasks each doing 160KB/s will still sum up to 80MB/s, reaching the write
bandwidth of a slow disk).

The pass-good limit helps to let go of the tasks dirtying good bdi's in
the presence of a blocked bdi (e.g. an NFS server that is not responding)
or a slow USB disk which, for some reason, has built up a large number of
initial dirty pages that refuse to go away anytime soon.

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 include/linux/writeback.h |   21 +++++++++++++++++++++
 mm/page-writeback.c       |   13 +++++++++++++
 2 files changed, 34 insertions(+)

--- linux-next.orig/include/linux/writeback.h	2011-06-19 22:59:29.000000000 +0800
+++ linux-next/include/linux/writeback.h	2011-06-19 22:59:47.000000000 +0800
@@ -7,6 +7,27 @@
 #include <linux/sched.h>
 #include <linux/fs.h>
 
+/*
+ * The 1/16 region above the global dirty limit will be put to maximum pauses:
+ *
+ *	(limit, limit + limit/DIRTY_MAXPAUSE)
+ *
+ * The 1/16 region above the max-pause region, dirty exceeded bdi's will be put
+ * to loops:
+ *
+ *	(limit + limit/DIRTY_MAXPAUSE, limit + limit/DIRTY_PASSGOOD)
+ *
+ * Further beyond, all dirtier tasks will enter a loop waiting (possibly long
+ * time) for the dirty pages to drop, unless written enough pages.
+ *
+ * The global dirty threshold is normally equal to the global dirty limit,
+ * except when the system suddenly allocates a lot of anonymous memory and
+ * knocks down the global dirty threshold quickly, in which case the global
+ * dirty limit will follow down slowly to prevent livelocking all dirtier tasks.
+ */
+#define DIRTY_MAXPAUSE	16
+#define DIRTY_PASSGOOD	8
+
 struct backing_dev_info;
 
 /*
--- linux-next.orig/mm/page-writeback.c	2011-06-19 22:59:29.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-06-19 22:59:47.000000000 +0800
@@ -399,6 +399,11 @@ unsigned long determine_dirtyable_memory
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
+static unsigned long hard_dirty_limit(unsigned long thresh)
+{
+	return max(thresh, global_dirty_limit);
+}
+
 /*
  * global_dirty_limits - background-writeback and dirty-throttling thresholds
  *
@@ -704,6 +709,14 @@ static void balance_dirty_pages(struct a
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		io_schedule_timeout(pause);
 
+		dirty_thresh = hard_dirty_limit(dirty_thresh);
+		if (nr_dirty < dirty_thresh + dirty_thresh / DIRTY_MAXPAUSE &&
+		    jiffies - start_time > MAX_PAUSE)
+			break;
+		if (nr_dirty < dirty_thresh + dirty_thresh / DIRTY_PASSGOOD &&
+		    bdi_dirty < bdi_thresh)
+			break;
+
 		/*
 		 * Increase the delay for each loop, up to our previous
 		 * default of taking a 100ms nap.
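For illustration only, here is a small user-space model in C of the rate arithmetic in the changelog and of the two early-exit checks the patch adds to the balance_dirty_pages() loop. It is a sketch, not kernel code: hard_dirty_limit() here takes global_dirty_limit as a parameter instead of reading the kernel global, and per_task_rate_kbps(), may_break(), waited and max_pause are hypothetical names standing in for the changelog arithmetic, jiffies - start_time and MAX_PAUSE (both in the same arbitrary time unit). 4KB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Same divisors as the patch: a smaller divisor means a wider region. */
#define DIRTY_MAXPAUSE 16
#define DIRTY_PASSGOOD 8

/* The pass-good region must reach beyond the max-pause region. */
static_assert(DIRTY_PASSGOOD < DIRTY_MAXPAUSE,
	      "pass-good region must extend past the max-pause region");

/*
 * Hypothetical helper: per-task dirtying rate in KB/s when a task may dirty
 * `pages` pages of `page_kb` KB each per `pause_ms` milliseconds of sleep.
 */
static unsigned long per_task_rate_kbps(unsigned long pages,
					unsigned long page_kb,
					unsigned long pause_ms)
{
	return pages * page_kb * 1000 / pause_ms;
}

/*
 * Model of the patch's hard_dirty_limit(): use the larger of the current
 * threshold and the slowly-following global dirty limit (passed in here).
 */
static unsigned long hard_dirty_limit(unsigned long thresh,
				      unsigned long global_limit)
{
	return thresh > global_limit ? thresh : global_limit;
}

/*
 * Hypothetical model of the two break conditions. `waited` stands in for
 * jiffies - start_time and `max_pause` for MAX_PAUSE.
 */
static bool may_break(unsigned long nr_dirty, unsigned long dirty_thresh,
		      unsigned long global_limit,
		      unsigned long bdi_dirty, unsigned long bdi_thresh,
		      unsigned long waited, unsigned long max_pause)
{
	unsigned long limit = hard_dirty_limit(dirty_thresh, global_limit);

	/* Max-pause region: leave once the task has slept long enough. */
	if (nr_dirty < limit + limit / DIRTY_MAXPAUSE && waited > max_pause)
		return true;

	/* Pass-good region: leave if this task's bdi is under its own limit. */
	if (nr_dirty < limit + limit / DIRTY_PASSGOOD && bdi_dirty < bdi_thresh)
		return true;

	/* Beyond both regions (or a dirty-exceeded bdi): keep looping. */
	return false;
}
```

In this model, per_task_rate_kbps(8, 4, 200) gives the 160KB/s from the changelog, and 500 such tasks sum to 80000KB/s. With dirty_thresh = 16000 pages and a global limit of 15000, the max-pause region ends at 17000 pages and the pass-good region at 18000; if the threshold is suddenly knocked down to 12000 while the global limit is still 16000, both regions are still computed from 16000.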