On Sat, 2009-08-22 at 10:51 +0800, Wu Fengguang wrote: > > +++ a/mm/page-writeback.c > > @@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro > > bdi_dirty = dirty * bdi->max_ratio / 100; > > > > *pbdi_dirty = bdi_dirty; > > task_dirty_limit(current, pbdi_dirty); > > } > > } > > @@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a > > }; > > > > get_dirty_limits(&background_thresh, &dirty_thresh, > > + &bdi_thresh, bdi); > > > > nr_reclaimable = global_page_state(NR_FILE_DIRTY) + > > + global_page_state(NR_UNSTABLE_NFS); > > + nr_writeback = global_page_state(NR_WRITEBACK) + > > + global_page_state(NR_WRITEBACK_TEMP); > > > > /* > > * In order to avoid the stacked BDI deadlock we need > > @@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a > > bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK); > > } > > > > > + /* always throttle if over threshold */ > > + if (nr_reclaimable + nr_writeback < dirty_thresh) { > > That 'if' is a big behavior change. It effectively blocks every one > and canceled Peter's proportional throttling work: the less a process > dirtied, the less it should be throttled. Hmm, I think you're right, I had not considered that, thanks for catching that. > I'd propose to remove the above 'if' and liberate the following three 'if's. That might work, but it looses the total dirty_thresh constraint. The sum of per-bdi dirties _should_ not be larger than that, but I'm not sure it won't ever be. The clip code Richard removed ensured that, and I think I wrote that out of more than sheer paranoia, but I'm not sure anymore :/ I'll go over the numeric stuff again to see where it could go wrong. If we can bound the error (I'm suspecting it was some numerical error bound) we should be good and can indeed do this. > > + > > + if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh) > > + break; > > + > > + /* > > + * Throttle it only when the background writeback cannot > > + * catch-up. This avoids (excessively) small writeouts > > + * when the bdi limits are ramping up. > > + */ > > + if (nr_reclaimable + nr_writeback < > > + (background_thresh + dirty_thresh) / 2) > > + break; > > + > > + /* done enough? */ > > + if (pages_written >= write_chunk) > > + break; > > + } > > + if (!bdi->dirty_exceeded) > > + bdi->dirty_exceeded = 1; > > > > + /* Note: nr_reclaimable denotes nr_dirty + nr_unstable. > > + * Unstable writes are a feature of certain networked > > + * filesystems (i.e. NFS) in which data may have been > > + * written to the server's write cache, but has not yet > > + * been flushed to permanent storage. > > + * Only move pages to writeback if this bdi is over its > > + * threshold otherwise wait until the disk writes catch > > + * up. > > + */ > > + if (bdi_nr_reclaimable > bdi_thresh) { > > + writeback_inodes(&wbc); > > + pages_written += write_chunk - wbc.nr_to_write; > > > + if (wbc.nr_to_write == 0) > > + continue; > > What's the purpose of the above 2 lines? I think I should slow down, I seem to have totally missed these two lines when I read the patch :/ > > + } > > congestion_wait(BLK_RW_ASYNC, HZ/10); > > } > > > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh && > > + bdi->dirty_exceeded) > > bdi->dirty_exceeded = 0; > > > > if (writeback_in_progress(bdi)) > > @@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a > > * In normal mode, we start background writeout at the lower > > * background_thresh, to keep the amount of dirty memory low. > > */ > > + if ((laptop_mode && pages_written) || (!laptop_mode && > > + (nr_reclaimable > background_thresh))) > > bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE); > > } -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html