On Mon, Aug 08, 2011 at 09:47:14PM +0800, Peter Zijlstra wrote:
> On Sat, 2011-08-06 at 16:44 +0800, Wu Fengguang wrote:
> > Add two fields to task_struct.
> > 
> > 1) account dirtied pages in the individual tasks, for accuracy
> > 2) per-task balance_dirty_pages() call intervals, for flexibility
> > 
> > The balance_dirty_pages() call interval (ie. nr_dirtied_pause) will
> > scale near-sqrt to the safety gap between dirty pages and threshold.
> > 
> > XXX: The main problem of per-task nr_dirtied is, if 10k tasks start
> > dirtying pages at exactly the same time, each task will be assigned a
> > large initial nr_dirtied_pause, so that the dirty threshold will be
> > exceeded long before each task reached its nr_dirtied_pause and hence
> > call balance_dirty_pages().
> > 
> > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> > ---
> >  include/linux/sched.h |    7 ++
> >  mm/memory_hotplug.c   |    3 -
> >  mm/page-writeback.c   |  106 +++++++++-------------------------------
> >  3 files changed, 32 insertions(+), 84 deletions(-)
> 
> No fork() hooks? This way tasks inherit their parent's dirty count on
> clone().

btw, I do have another patch queued for improving the "leaked dirties on
exit" case :)

Thanks,
Fengguang
---
Subject: writeback: charge leaked page dirties to active tasks
Date: Tue Apr 05 13:21:19 CST 2011

It has been a years-long problem that a large number of short-lived
dirtiers (eg. gcc instances in a fast kernel build) may starve
long-running dirtiers (eg. dd) as well as push the dirty pages to the
global hard limit.

The solution is to charge the pages dirtied by the exited gcc to the
other random gcc/dd instances. It is not perfect, but it should behave
well enough in practice.
CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 include/linux/writeback.h |    2 ++
 kernel/exit.c             |    2 ++
 mm/page-writeback.c       |   11 +++++++++++
 3 files changed, 15 insertions(+)

--- linux-next.orig/include/linux/writeback.h	2011-08-08 21:45:58.000000000 +0800
+++ linux-next/include/linux/writeback.h	2011-08-08 21:45:58.000000000 +0800
@@ -7,6 +7,8 @@
 #include <linux/sched.h>
 #include <linux/fs.h>
 
+DECLARE_PER_CPU(int, dirty_leaks);
+
 /*
  * The 1/4 region under the global dirty thresh is for smooth dirty throttling:
  *
--- linux-next.orig/mm/page-writeback.c	2011-08-08 21:45:58.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-08-08 22:21:50.000000000 +0800
@@ -190,6 +190,7 @@ int dirty_ratio_handler(struct ctl_table
 	return ret;
 }
 
+DEFINE_PER_CPU(int, dirty_leaks) = 0;
 
 int dirty_bytes_handler(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp,
@@ -1150,6 +1151,7 @@ void balance_dirty_pages_ratelimited_nr(
 {
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
 	int ratelimit;
+	int *p;
 
 	if (!bdi_cap_account_dirty(bdi))
 		return;
@@ -1158,6 +1160,15 @@ void balance_dirty_pages_ratelimited_nr(
 	if (bdi->dirty_exceeded)
 		ratelimit = 8;
 
+	preempt_disable();
+	p = &__get_cpu_var(dirty_leaks);
+	if (*p > 0 && current->nr_dirtied < ratelimit) {
+		nr_pages_dirtied = min(*p, ratelimit - current->nr_dirtied);
+		*p -= nr_pages_dirtied;
+		current->nr_dirtied += nr_pages_dirtied;
+	}
+	preempt_enable();
+
 	if (unlikely(current->nr_dirtied >= ratelimit))
 		balance_dirty_pages(mapping, current->nr_dirtied);
 }
--- linux-next.orig/kernel/exit.c	2011-08-08 21:43:37.000000000 +0800
+++ linux-next/kernel/exit.c	2011-08-08 21:45:58.000000000 +0800
@@ -1039,6 +1039,8 @@ NORET_TYPE void do_exit(long code)
 	validate_creds_for_do_exit(tsk);
 
 	preempt_disable();
+	if (tsk->nr_dirtied)
+		__this_cpu_add(dirty_leaks, tsk->nr_dirtied);
 	exit_rcu();
 	/* causes final put_task_struct in finish_task_switch(). */
 	tsk->state = TASK_DEAD;
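For readers following the accounting, here is a rough userspace model of what the patch does, not kernel code. An exiting task parks its unbalanced nr_dirtied in a per-CPU pool, and the next task that passes through the ratelimit check on that CPU absorbs the pool, but only up to ratelimit. All names (NR_SIM_CPUS, dirty_pages(), task_exit(), ratelimited(), demo()) are hypothetical. Two behaviours are assumed rather than taken from the patch: nr_dirtied is bumped once per dirtied page, and balance_dirty_pages() resets it to 0. The model also ignores preemption and CPU migration, which the real code handles with preempt_disable() and the this_cpu accessors.

```c
/*
 * Userspace model of "charge leaked page dirties to active tasks".
 * Hypothetical names throughout; this only mimics the accounting.
 */
#include <assert.h>
#include <stdio.h>

#define NR_SIM_CPUS 4

/* One leak pool per simulated CPU, standing in for the per-CPU dirty_leaks. */
static int dirty_leaks[NR_SIM_CPUS];

struct task {
	int cpu;        /* CPU the task runs on; fixed here for simplicity */
	int nr_dirtied; /* pages dirtied since the last throttle */
};

/* Assumed per-page accounting: every dirtied page bumps nr_dirtied. */
static void dirty_pages(struct task *t, int nr)
{
	t->nr_dirtied += nr;
}

/* Mirrors the do_exit() hunk: park unbalanced dirties in the local pool. */
static void task_exit(struct task *t)
{
	if (t->nr_dirtied)
		dirty_leaks[t->cpu] += t->nr_dirtied;
	t->nr_dirtied = 0;
}

/*
 * Mirrors the balance_dirty_pages_ratelimited_nr() hunk: top up the
 * caller from the local pool, never past ratelimit. Returns 1 when the
 * task would call balance_dirty_pages(); resetting nr_dirtied to 0
 * models what balance_dirty_pages() is assumed to do afterwards.
 */
static int ratelimited(struct task *t, int ratelimit)
{
	int *p = &dirty_leaks[t->cpu];

	if (*p > 0 && t->nr_dirtied < ratelimit) {
		int room = ratelimit - t->nr_dirtied;
		int n = *p < room ? *p : room;

		*p -= n;
		t->nr_dirtied += n;
	}
	assert(*p >= 0);

	if (t->nr_dirtied >= ratelimit) {
		t->nr_dirtied = 0;
		return 1;
	}
	return 0;
}

/* A short-lived "gcc" leaks 30 pages; a later "dd" on the same CPU absorbs them. */
static void demo(void)
{
	struct task gcc = { .cpu = 0 }, dd = { .cpu = 0 };

	dirty_pages(&gcc, 30);
	printf("gcc throttled: %d\n", ratelimited(&gcc, 32));
	task_exit(&gcc);
	printf("pool after gcc exits: %d\n", dirty_leaks[0]);

	dirty_pages(&dd, 5);
	printf("dd throttled: %d\n", ratelimited(&dd, 32));
	printf("pool left: %d\n", dirty_leaks[0]);
}
```

Calling demo() from a main() prints "gcc throttled: 0", "pool after gcc exits: 30", "dd throttled: 1" and "pool left: 3". The gcc instance exits below the ratelimit of 32 and is never throttled. Its 30 pages then push dd, which only dirtied 5 pages itself, over the limit. Capping the transfer at ratelimit - nr_dirtied (27 pages here) leaves the remaining 3 in the pool for the next dirtier on that CPU instead of overcharging one task.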