On Tue, Mar 08, 2011 at 11:31:12PM +0100, Jan Kara wrote: > We set bdi->dirty_exceeded (and thus ratelimiting code starts to > call balance_dirty_pages() every 8 pages) when a per-bdi limit is > exceeded or global limit is exceeded. But per-bdi limit also depends > on the task. Thus different tasks reach the limit on that bdi at > different levels of dirty pages. The result is that with current code > bdi->dirty_exceeded ping-ponged between 1 and 0 depending on which task > just got into balance_dirty_pages(). > > We fix the issue by clearing bdi->dirty_exceeded only when per-bdi amount > of dirty pages drops below the threshold (7/8 * bdi_dirty_limit) where task > limits already do not have any influence. > > CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > CC: Christoph Hellwig <hch@xxxxxxxxxxxxx> > CC: Dave Chinner <david@xxxxxxxxxxxxx> > CC: Wu Fengguang <fengguang.wu@xxxxxxxxx> > CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > Signed-off-by: Jan Kara <jack@xxxxxxx> > --- > mm/page-writeback.c | 18 ++++++++++++++++-- > 1 files changed, 16 insertions(+), 2 deletions(-) > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index c472c1c..f388f70 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -275,12 +275,13 @@ static inline void task_dirties_fraction(struct task_struct *tsk, > * effectively curb the growth of dirty pages. Light dirtiers with high enough > * dirty threshold may never get throttled. > */ > +#define TASK_LIMIT_FRACTION 8 > static unsigned long task_dirty_limit(struct task_struct *tsk, > unsigned long bdi_dirty) > { > long numerator, denominator; > unsigned long dirty = bdi_dirty; > - u64 inv = dirty >> 3; > + u64 inv = dirty / TASK_LIMIT_FRACTION; > > task_dirties_fraction(tsk, &numerator, &denominator); > inv *= numerator; > @@ -291,6 +292,12 @@ static unsigned long task_dirty_limit(struct task_struct *tsk, > return max(dirty, bdi_dirty/2); > } > > +/* Minimum limit for any task */ > +static unsigned long task_min_dirty_limit(unsigned long bdi_dirty) > +{ > + return bdi_dirty - bdi_dirty / TASK_LIMIT_FRACTION; > +} > + Hi Jan, Should the above be called bdi_min_dirty_limit()? In essense we seem to be setting bdi->bdi_exceeded when dirty pages on bdi cross bdi_thresh and clear it when dirty pages on bdi are below 7/8*bdi_thresh. So there does not seem to be any dependency on task dirty limit here hence string "task" sounds confusing to me. In fact, would bdi_dirty_exceeded_clear_thresh() be a better name? > /* > * > */ > @@ -484,9 +491,11 @@ static void balance_dirty_pages(struct address_space *mapping, > unsigned long background_thresh; > unsigned long dirty_thresh; > unsigned long bdi_thresh; > + unsigned long min_bdi_thresh = ULONG_MAX; > unsigned long pages_written = 0; > unsigned long pause = 1; > bool dirty_exceeded = false; > + bool min_dirty_exceeded = false; > struct backing_dev_info *bdi = mapping->backing_dev_info; > > for (;;) { > @@ -513,6 +522,7 @@ static void balance_dirty_pages(struct address_space *mapping, > break; > > bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh); > + min_bdi_thresh = task_min_dirty_limit(bdi_thresh); > bdi_thresh = task_dirty_limit(current, bdi_thresh); ^^^^^ This patch aside, we use bdi_thresh name both for bdi threshold as well as per task per bdi threshold. will task_bdi_thresh be a better name here. > > /* > @@ -542,6 +552,9 @@ static void balance_dirty_pages(struct address_space *mapping, > dirty_exceeded = > (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh) > || (nr_reclaimable + nr_writeback > dirty_thresh); > + min_dirty_exceeded = > + (bdi_nr_reclaimable + bdi_nr_writeback > min_bdi_thresh) > + || (nr_reclaimable + nr_writeback > dirty_thresh); Would following be easier to understand. clear_dirty_exceeded = (bdi_nr_reclaimable + bdi_nr_writeback < dirty_exceeded_reset_thresh) && (nr_reclaimable + nr_writeback < dirty_thresh); > > if (!dirty_exceeded) > break; > @@ -579,7 +592,8 @@ static void balance_dirty_pages(struct address_space *mapping, > pause = HZ / 10; > } > > - if (!dirty_exceeded && bdi->dirty_exceeded) > + /* Clear dirty_exceeded flag only when no task can exceed the limit */ > + if (!min_dirty_exceeded && bdi->dirty_exceeded) > bdi->dirty_exceeded = 0; similiarly... if (bdi->dirty_exceeded && clear_dirty_exceeded) bdi->dirty_exceeded = 0; I was confused with the term min_dirty_limit and task_min_dirty_limit() for sometime as patch said that we are trying to move away from dependence on task specific bdi_thres for clearing bdi->bdi_thresh. May be it is just me... Thanks Vivek -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html