On Tue, Mar 08, 2011 at 11:31:13PM +0100, Jan Kara wrote:
> This patch changes balance_dirty_pages() throttling so that the function does
> not submit writes on its own but rather waits for flusher thread to do enough
> writes. This has an advantage that we have a single source of IO allowing for
> better writeback locality. Also we do not have to reenter filesystems from a
> non-trivial context.
>
> The waiting is implemented as follows: Whenever we decide to throttle a task in
> balance_dirty_pages(), task adds itself to a list of tasks that are throttled
> against that bdi and goes to sleep waiting to receive specified amount of page
> IO completions. Once in a while (currently HZ/10, later the interval should be
> autotuned based on observed IO completion rate), accumulated page IO
> completions are distributed equally among waiting tasks.
>
> This waiting scheme has been chosen so that waiting time in
> balance_dirty_pages() is proportional to
>   number_waited_pages * number_of_waiters.
> In particular it does not depend on the total number of pages being waited for,
> thus providing possibly a fairer results. Note that the dependency on the
> number of waiters is inevitable, since all the waiters compete for a common
> resource so their number has to be somehow reflected in waiting time.
>
> CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> CC: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> CC: Dave Chinner <david@xxxxxxxxxxxxx>
> CC: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Signed-off-by: Jan Kara <jack@xxxxxxx>
> ---
>  include/linux/backing-dev.h      |    7 +
>  include/linux/writeback.h        |    1 +
>  include/trace/events/writeback.h |   65 +++++++-
>  mm/backing-dev.c                 |    8 +
>  mm/page-writeback.c              |  345 +++++++++++++++++++++++++-------------
>  5 files changed, 310 insertions(+), 116 deletions(-)
>

[..]

> +/*
> + * balance_dirty_pages() must be called by processes which are generating dirty
> + * data.
> + * It looks at the number of dirty pages in the machine and will force
> + * the caller to perform writeback if the system is over `vm_dirty_ratio'.
> + * If we're over `background_thresh' then the writeback threads are woken to
> + * perform some writeout.
> + */
> +static void balance_dirty_pages(struct address_space *mapping,
> +				unsigned long write_chunk)
> +{
> +	struct backing_dev_info *bdi = mapping->backing_dev_info;
> +	struct balance_waiter bw;
> +	struct dirty_limit_state st;
> +	int dirty_exceeded = check_dirty_limits(bdi, &st);
> +
> +	if (dirty_exceeded < DIRTY_MAY_EXCEED_LIMIT ||
> +	    (dirty_exceeded == DIRTY_MAY_EXCEED_LIMIT &&
> +	     !bdi_task_limit_exceeded(&st, current))) {
> +		if (bdi->dirty_exceeded &&
> +		    dirty_exceeded < DIRTY_MAY_EXCEED_LIMIT)
> +			bdi->dirty_exceeded = 0;
>  		/*
> -		 * Increase the delay for each loop, up to our previous
> -		 * default of taking a 100ms nap.
> +		 * In laptop mode, we wait until hitting the higher threshold
> +		 * before starting background writeout, and then write out all
> +		 * the way down to the lower threshold. So slow writers cause
> +		 * minimal disk activity.
> +		 *
> +		 * In normal mode, we start background writeout at the lower
> +		 * background_thresh, to keep the amount of dirty memory low.
>  		 */
> -		pause <<= 1;
> -		if (pause > HZ / 10)
> -			pause = HZ / 10;
> +		if (!laptop_mode && dirty_exceeded == DIRTY_EXCEED_BACKGROUND)
> +			bdi_start_background_writeback(bdi);
> +		return;
>  	}
>
> -	/* Clear dirty_exceeded flag only when no task can exceed the limit */
> -	if (!min_dirty_exceeded && bdi->dirty_exceeded)
> -		bdi->dirty_exceeded = 0;
> +	if (!bdi->dirty_exceeded)
> +		bdi->dirty_exceeded = 1;

Would it make sense to move the bdi_task_limit_exceeded() check out into a
separate if statement, as follows? It may be a little easier to read.

	if (dirty_exceeded < DIRTY_MAY_EXCEED_LIMIT) {
		if (bdi->dirty_exceeded)
			bdi->dirty_exceeded = 0;

		if (!laptop_mode && dirty_exceeded == DIRTY_EXCEED_BACKGROUND)
			bdi_start_background_writeback(bdi);

		return;
	}

	if (dirty_exceeded == DIRTY_MAY_EXCEED_LIMIT &&
	    !bdi_task_limit_exceeded(&st, current))
		return;

	/* Either task is throttled or we crossed global dirty ratio */
	if (!bdi->dirty_exceeded)
		bdi->dirty_exceeded = 1;

Thanks
Vivek
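
P.S. As a sanity check on the restructuring suggested above, here is a small
user-space C sketch (not kernel code) that models both control flows and
compares their observable effects for every input combination. The enum
ordering, struct outcome, and the function names patch_version(),
suggested_version() and count_mismatches() are assumptions made purely for
illustration. In particular, the sketch assumes DIRTY_EXCEED_BACKGROUND sorts
below DIRTY_MAY_EXCEED_LIMIT, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering of check_dirty_limits() results (not shown in the hunk). */
enum dirty_state {
	DIRTY_OK,
	DIRTY_EXCEED_BACKGROUND,
	DIRTY_MAY_EXCEED_LIMIT,
	DIRTY_EXCEED_LIMIT,
};

/* Hypothetical record of what the early part of the function did. */
struct outcome {
	bool returned_early;      /* returned before reaching the throttling code */
	bool started_background;  /* would have called bdi_start_background_writeback() */
	int bdi_dirty_exceeded;   /* final value of bdi->dirty_exceeded */
};

/* Control flow as written in the quoted patch. */
static struct outcome patch_version(enum dirty_state state, bool task_over,
				    bool laptop_mode, int bdi_flag)
{
	struct outcome o = { false, false, bdi_flag };

	if (state < DIRTY_MAY_EXCEED_LIMIT ||
	    (state == DIRTY_MAY_EXCEED_LIMIT && !task_over)) {
		if (o.bdi_dirty_exceeded && state < DIRTY_MAY_EXCEED_LIMIT)
			o.bdi_dirty_exceeded = 0;
		if (!laptop_mode && state == DIRTY_EXCEED_BACKGROUND)
			o.started_background = true;
		o.returned_early = true;
		return o;
	}
	if (!o.bdi_dirty_exceeded)
		o.bdi_dirty_exceeded = 1;
	return o;
}

/* Control flow with the task-limit check split into its own if statement. */
static struct outcome suggested_version(enum dirty_state state, bool task_over,
					bool laptop_mode, int bdi_flag)
{
	struct outcome o = { false, false, bdi_flag };

	if (state < DIRTY_MAY_EXCEED_LIMIT) {
		if (o.bdi_dirty_exceeded)
			o.bdi_dirty_exceeded = 0;
		if (!laptop_mode && state == DIRTY_EXCEED_BACKGROUND)
			o.started_background = true;
		o.returned_early = true;
		return o;
	}
	if (state == DIRTY_MAY_EXCEED_LIMIT && !task_over) {
		o.returned_early = true;
		return o;
	}
	if (!o.bdi_dirty_exceeded)
		o.bdi_dirty_exceeded = 1;
	return o;
}

static bool same_outcome(struct outcome a, struct outcome b)
{
	return a.returned_early == b.returned_early &&
	       a.started_background == b.started_background &&
	       a.bdi_dirty_exceeded == b.bdi_dirty_exceeded;
}

/* Enumerate all 4 * 2 * 2 * 2 inputs and count where the two forms differ. */
int count_mismatches(void)
{
	int mismatches = 0;

	/* The comparison below relies on this assumed ordering. */
	assert(DIRTY_EXCEED_BACKGROUND < DIRTY_MAY_EXCEED_LIMIT);

	for (int s = DIRTY_OK; s <= DIRTY_EXCEED_LIMIT; s++)
		for (int task = 0; task <= 1; task++)
			for (int laptop = 0; laptop <= 1; laptop++)
				for (int flag = 0; flag <= 1; flag++) {
					struct outcome a = patch_version(
						(enum dirty_state)s, task, laptop, flag);
					struct outcome b = suggested_version(
						(enum dirty_state)s, task, laptop, flag);
					if (!same_outcome(a, b))
						mismatches++;
				}
	return mismatches;
}
```

Under those assumptions the two forms agree on every input, so the split
should be a readability change only. If the enum were ordered differently,
the background-writeback branch would need to be rechecked.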