On Sun, Aug 23, 2009 at 02:11:41AM +0800, Peter Zijlstra wrote: > On Sat, 2009-08-22 at 10:51 +0800, Wu Fengguang wrote: > > > > +++ a/mm/page-writeback.c > > > > @@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro > > > bdi_dirty = dirty * bdi->max_ratio / 100; > > > > > > *pbdi_dirty = bdi_dirty; > > > task_dirty_limit(current, pbdi_dirty); > > > } > > > } > > > @@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a > > > }; > > > > > > get_dirty_limits(&background_thresh, &dirty_thresh, > > > + &bdi_thresh, bdi); > > > > > > nr_reclaimable = global_page_state(NR_FILE_DIRTY) + > > > + global_page_state(NR_UNSTABLE_NFS); > > > + nr_writeback = global_page_state(NR_WRITEBACK) + > > > + global_page_state(NR_WRITEBACK_TEMP); > > > > > > /* > > > * In order to avoid the stacked BDI deadlock we need > > > @@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a > > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > > } > > > > > > > > + /* always throttle if over threshold */ > > > + if (nr_reclaimable + nr_writeback < dirty_thresh) { > > > > That 'if' is a big behavior change. It effectively blocks every one > > and canceled Peter's proportional throttling work: the less a process > > dirtied, the less it should be throttled. > > Hmm, I think you're right, I had not considered that, thanks for > catching that. Thank you :) > > I'd propose to remove the above 'if' and liberate the following three 'if's. > > That might work, but it looses the total dirty_thresh constraint. The > sum of per-bdi dirties _should_ not be larger than that, but I'm not > sure it won't ever be. > > The clip code Richard removed ensured that, and I think I wrote that out > of more than sheer paranoia, but I'm not sure anymore :/ Oh I assumed that your per-bdi throttling is not too permissive to exceed the global dirty_thresh. In theory the per-bdi throttling should be able to quickly stop the growing of (nr_reclaimable + nr_writeback). Once dirty_thresh is reached we already lose. > I'll go over the numeric stuff again to see where it could go wrong. > > If we can bound the error (I'm suspecting it was some numerical error > bound) we should be good and can indeed do this. Yes, that error bound should be smaller than (dirty_thresh - background_thresh) / 2, unless the user set the two thresholds insanely close (for that we may add some sanity checks in dirty_bytes_handler() and dirty_background_bytes_handler() etc.). Anyway we may do some thing like this for now? if (dirty_thresh exceeded) { WARN_ONCE block write more } Thanks, Fengguang > > > + > > > + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > > + break; > > > + > > > + /* > > > + * Throttle it only when the background writeback cannot > > > + * catch-up. This avoids (excessively) small writeouts > > > + * when the bdi limits are ramping up. > > > + */ > > > + if (nr_reclaimable + nr_writeback < > > > + (background_thresh + dirty_thresh) / 2) > > > + break; > > > + > > > + /* done enough? */ > > > + if (pages_written >= write_chunk) > > > + break; > > > + } > > > + if (!bdi->dirty_exceeded) > > > + bdi->dirty_exceeded = 1; > > > > > > + /* Note: nr_reclaimable denotes nr_dirty + nr_unstable. > > > + * Unstable writes are a feature of certain networked > > > + * filesystems (i.e. NFS) in which data may have been > > > + * written to the server's write cache, but has not yet > > > + * been flushed to permanent storage. > > > + * Only move pages to writeback if this bdi is over its > > > + * threshold otherwise wait until the disk writes catch > > > + * up. > > > + */ > > > + if (bdi_nr_reclaimable > bdi_thresh) { > > > + writeback_inodes(&wbc); > > > + pages_written += write_chunk - wbc.nr_to_write; > > > > > + if (wbc.nr_to_write == 0) > > > + continue; > > > > What's the purpose of the above 2 lines? > > I think I should slow down, I seem to have totally missed these two > lines when I read the patch :/ > > > > + } > > > congestion_wait(BLK_RW_ASYNC, HZ/10); > > > } > > > > > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh && > > > + bdi->dirty_exceeded) > > > bdi->dirty_exceeded = 0; > > > > > > if (writeback_in_progress(bdi)) > > > @@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a > > > * In normal mode, we start background writeout at the lower > > > * background_thresh, to keep the amount of dirty memory low. > > > */ > > > + if ((laptop_mode && pages_written) || (!laptop_mode && > > > + (nr_reclaimable > background_thresh))) > > > bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE); > > > } > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html