Take the task's think time into account when computing the final pause
time. This makes the throttle bandwidth accurate. In the rare case that
the task slept longer than the period, the extra sleep time will be
compensated for in the next period, provided it is not too big (< 100ms).
Accumulated errors are carefully avoided as long as the task doesn't
sleep for too long.

case 1: period > think

	pause = period - think
	paused_when += pause

	                     period time
	      |======================================>|
	          think time
	      |===============>|
	------|----------------|----------------------|-----------
	 paused_when        jiffies

case 2: period <= think

	don't pause, and reduce future pause time by:
	paused_when += period

	              period time
	      |=========================>|
	                     think time
	      |======================================>|
	------|--------------------------+------------|-----------
	 paused_when                               jiffies

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 include/linux/sched.h |    1 +
 mm/page-writeback.c   |   22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

--- linux-next.orig/include/linux/sched.h	2010-12-09 11:50:59.000000000 +0800
+++ linux-next/include/linux/sched.h	2010-12-09 11:54:28.000000000 +0800
@@ -1477,6 +1477,7 @@ struct task_struct {
 	 */
 	int nr_dirtied;
 	int nr_dirtied_pause;
+	unsigned long paused_when;	/* start of a write-and-pause period */
 
 #ifdef CONFIG_LATENCYTOP
 	int latency_record_count;
--- linux-next.orig/mm/page-writeback.c	2010-12-09 11:54:10.000000000 +0800
+++ linux-next/mm/page-writeback.c	2010-12-09 12:00:53.000000000 +0800
@@ -597,6 +597,7 @@ static void balance_dirty_pages(struct a
 	unsigned long bdi_thresh;
 	unsigned long task_thresh;
 	unsigned long long bw;
+	unsigned long period;
 	unsigned long pause = 0;
 	bool dirty_exceeded = false;
 	struct backing_dev_info *bdi = mapping->backing_dev_info;
@@ -667,7 +668,7 @@ static void balance_dirty_pages(struct a
 		bdi_update_bandwidth(bdi, start_time, bdi_dirty, bdi_thresh);
 
-		if (bdi_dirty >= task_thresh) {
+		if (bdi_dirty >= task_thresh || nr_dirty > dirty_thresh) {
 			pause = MAX_PAUSE;
 			goto pause;
 		}
@@ -686,7 +687,22 @@ static void balance_dirty_pages(struct a
 		bw = bw * (task_thresh - bdi_dirty);
 		do_div(bw, bdi_thresh / TASK_SOFT_DIRTY_LIMIT + 1);
 
-		pause = HZ * pages_dirtied / ((unsigned long)bw + 1);
+		period = HZ * pages_dirtied / ((unsigned long)bw + 1) + 1;
+		pause = current->paused_when + period - jiffies;
+		/*
+		 * Take it as long think time if pause falls into (-10s, 0).
+		 * If it's less than 100ms, try to compensate it in future by
+		 * updating the virtual time; otherwise just reset the time, as
+		 * it may be a light dirtier.
+		 */
+		if (unlikely(-pause < HZ * 10)) {
+			if (-pause <= HZ / 10)
+				current->paused_when += period;
+			else
+				current->paused_when = jiffies;
+			pause = 1;
+			break;
+		}
 		pause = clamp_val(pause, 1, MAX_PAUSE);
 
 pause:
@@ -696,8 +712,10 @@ pause:
 					  task_thresh,
 					  pages_dirtied,
 					  pause);
+		current->paused_when = jiffies;
 		__set_current_state(TASK_UNINTERRUPTIBLE);
 		io_schedule_timeout(pause);
+		current->paused_when += pause;
 
 		/*
 		 * The bdi thresh is somehow "soft" limit derived from the