Hi Jan, On Fri, Jul 01, 2011 at 02:32:44AM +0800, Jan Kara wrote: > We set bdi->dirty_exceeded (and thus ratelimiting code starts to > call balance_dirty_pages() every 8 pages) when a per-bdi limit is > exceeded or global limit is exceeded. But per-bdi limit also depends > on the task. Thus different tasks reach the limit on that bdi at > different levels of dirty pages. The result is that with current code > bdi->dirty_exceeded ping-ponged between 1 and 0 depending on which task > just got into balance_dirty_pages(). > > We fix the issue by clearing bdi->dirty_exceeded only when per-bdi amount > of dirty pages drops below the threshold (7/8 * bdi_dirty_limit) where task > limits already do not have any influence. The end result, I think, is that the dirty pages are kept more tightly under control, with the average number a slightly lowered than before. This reduces the risk to throttle light dirtiers and hence more responsive. However it does introduce more overheads by enforcing balance_dirty_pages() calls on every 8 pages. > CC: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > CC: Christoph Hellwig <hch@xxxxxxxxxxxxx> > CC: Dave Chinner <david@xxxxxxxxxxxxx> > CC: Wu Fengguang <fengguang.wu@xxxxxxxxx> > CC: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > Signed-off-by: Jan Kara <jack@xxxxxxx> > --- > mm/page-writeback.c | 29 ++++++++++++++++++++++------- > 1 files changed, 22 insertions(+), 7 deletions(-) > > This is the patch fixing dirty_exceeded logic I promised you last week. > I based it on current Linus's tree as it is a relatively direct fix so I > expect it can be somewhere in the beginning of the patch series and merged > relatively quickly. Can you please add it to your tree? Thanks. OK. I noticed that bdi_thresh is no longer used. What if we just rename it? But I admit that the patch in its current form looks a bit more clear in concept. Thanks, Fengguang > Honza > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c > index 31f6988..d8b395f 100644 > --- a/mm/page-writeback.c > +++ b/mm/page-writeback.c > @@ -274,12 +274,13 @@ static inline void task_dirties_fraction(struct task_struct *tsk, > * effectively curb the growth of dirty pages. Light dirtiers with high enough > * dirty threshold may never get throttled. > */ > +#define TASK_LIMIT_FRACTION 8 > static unsigned long task_dirty_limit(struct task_struct *tsk, > unsigned long bdi_dirty) > { > long numerator, denominator; > unsigned long dirty = bdi_dirty; > - u64 inv = dirty >> 3; > + u64 inv = dirty / TASK_LIMIT_FRACTION; > > task_dirties_fraction(tsk, &numerator, &denominator); > inv *= numerator; > @@ -290,6 +291,12 @@ static unsigned long task_dirty_limit(struct task_struct *tsk, > return max(dirty, bdi_dirty/2); > } > > +/* Minimum limit for any task */ > +static unsigned long task_min_dirty_limit(unsigned long bdi_dirty) > +{ > + return bdi_dirty - bdi_dirty / TASK_LIMIT_FRACTION; > +} > + > /* > * > */ > @@ -483,9 +490,12 @@ static void balance_dirty_pages(struct address_space *mapping, > unsigned long background_thresh; > unsigned long dirty_thresh; > unsigned long bdi_thresh; > + unsigned long task_bdi_thresh; > + unsigned long min_task_bdi_thresh; > unsigned long pages_written = 0; > unsigned long pause = 1; > bool dirty_exceeded = false; > + bool clear_dirty_exceeded = true; > struct backing_dev_info *bdi = mapping->backing_dev_info; > > for (;;) { > @@ -512,7 +522,8 @@ static void balance_dirty_pages(struct address_space *mapping, > break; > > bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh); > - bdi_thresh = task_dirty_limit(current, bdi_thresh); > + min_task_bdi_thresh = task_min_dirty_limit(bdi_thresh); > + task_bdi_thresh = task_dirty_limit(current, bdi_thresh); > > /* > * In order to avoid the stacked BDI deadlock we need > @@ -524,7 +535,7 @@ static void balance_dirty_pages(struct address_space *mapping, > * actually dirty; with m+n sitting in the percpu > * deltas. > */ > - if (bdi_thresh < 2*bdi_stat_error(bdi)) { > + if (task_bdi_thresh < 2 * bdi_stat_error(bdi)) { > bdi_nr_reclaimable = bdi_stat_sum(bdi, BDI_RECLAIMABLE); > bdi_nr_writeback = bdi_stat_sum(bdi, BDI_WRITEBACK); > } else { > @@ -539,8 +550,11 @@ static void balance_dirty_pages(struct address_space *mapping, > * the last resort safeguard. > */ > dirty_exceeded = > - (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh) > - || (nr_reclaimable + nr_writeback > dirty_thresh); > + (bdi_nr_reclaimable + bdi_nr_writeback > task_bdi_thresh) > + || (nr_reclaimable + nr_writeback > dirty_thresh); > + clear_dirty_exceeded = > + (bdi_nr_reclaimable + bdi_nr_writeback <= min_task_bdi_thresh) > + && (nr_reclaimable + nr_writeback <= dirty_thresh); > > if (!dirty_exceeded) > break; > @@ -558,7 +572,7 @@ static void balance_dirty_pages(struct address_space *mapping, > * up. > */ > trace_wbc_balance_dirty_start(&wbc, bdi); > - if (bdi_nr_reclaimable > bdi_thresh) { > + if (bdi_nr_reclaimable > task_bdi_thresh) { > writeback_inodes_wb(&bdi->wb, &wbc); > pages_written += write_chunk - wbc.nr_to_write; > trace_wbc_balance_dirty_written(&wbc, bdi); > @@ -578,7 +592,8 @@ static void balance_dirty_pages(struct address_space *mapping, > pause = HZ / 10; > } > > - if (!dirty_exceeded && bdi->dirty_exceeded) > + /* Clear dirty_exceeded flag only when no task can exceed the limit */ > + if (clear_dirty_exceeded && bdi->dirty_exceeded) > bdi->dirty_exceeded = 0; > > if (writeback_in_progress(bdi)) > -- > 1.7.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>