On Fri, Oct 09, 2009 at 11:47:59PM +0800, Jan Kara wrote:
> On Fri 09-10-09 17:18:32, Peter Zijlstra wrote:
> > On Fri, 2009-10-09 at 17:12 +0200, Jan Kara wrote:
> > >   Ugh, but this is not equivalent! We would block the writer on some BDI
> > > without any dirty data if we are over global dirty limit. That didn't
> > > happen before.
> >
> > It should have, we should throttle everything calling
> > balance_dirty_pages() when we're over the total limit.
>   OK :) I agree it's reasonable. But Wu, please note this in the
> changelog because it might be a substantial change for some loads.

Thanks, I added the note by Peter :)

Note that the total limit check by itself may not be sufficient. For
example, there is no nr_writeback limit for NFS (and maybe btrfs) after
removing the congestion waits. Therefore it is very possible that

        nr_writeback => dirty_thresh
        nr_dirty     => 0

which is obviously undesirable: everything newly dirtied is soon put to
writeback. It violates the 30s expire time and the background threshold
rules, and will hurt write-and-truncate operations (i.e. temp files).

So the better solution would be to impose a nr_writeback limit on every
filesystem that doesn't already have one (the block io queue). NFS used
to have that limit with congestion_wait, but now we need a wait queue
for it.

With the nr_writeback wait queue, it can be guaranteed that once
balance_dirty_pages() asks for writing 1500 pages, it will be done with
the necessary sleeping in the bdi flush thread. So we can safely remove
the loop and the double checking of the global dirty limit in
balance_dirty_pages().

However, there is still one problem: there is no general coordination
between max nr_writeback and the background/dirty limits.

It is possible (and very likely for some small memory systems) that

        nr_writeback  >  dirty_thresh - background_thresh
              10,000           20,000              15,000

In this case, an application that should be throttled because

        nr_reclaimable + nr_writeback > dirty_thresh
                12,000         10,000         20,000

starts a background writeback work to do the job for it, but that work
quits immediately because

        nr_reclaimable < background_thresh
                12,000              15,000

In the end, the application does not get throttled at dirty_thresh at
all. Instead, it will be throttled at (background_thresh +
max_nr_writeback).

One solution (aka. the old behavior) is to respect the dirty_thresh by
not quitting background writeback when there are throttled tasks (this
patch). It has the drawback that background writeback does not do its
job _actively_. Instead, it will frequently be started and quit as
applications enter and leave balance_dirty_pages(). In the above scheme,
the background_thresh is disregarded.

The other ways would be to disregard dirty_thresh (may be undesirable)
or to limit max_nr_writeback (not as easy).

It is still very possible to hit nr_dirty all the way down to 0 if
max_nr_writeback > background_thresh.

This is a bit twisted. Any ideas?

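For illustration only, the arithmetic of the example above can be checked with a small user-space sketch. This is not kernel code: struct wb_state, the function names and the have_throttled_tasks flag are all hypothetical stand-ins, and the second quit rule follows the wording of the description (do not quit while there are throttled tasks).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the global counters used in the example. */
struct wb_state {
	long nr_reclaimable;       /* dirty pages not yet under writeback */
	long nr_writeback;         /* pages currently under writeback */
	long background_thresh;
	long dirty_thresh;
	bool have_throttled_tasks; /* stand-in for "some task is throttled" */
};

/* A task should be throttled once dirty + writeback exceeds dirty_thresh. */
bool task_over_dirty_limit(const struct wb_state *s)
{
	return s->nr_reclaimable + s->nr_writeback > s->dirty_thresh;
}

/* Plain rule: background writeback quits once below background_thresh. */
bool bg_quits_plain(const struct wb_state *s)
{
	return s->nr_reclaimable < s->background_thresh;
}

/* Rule as worded in the description: never quit while tasks are throttled. */
bool bg_quits_respecting_throttled(const struct wb_state *s)
{
	return bg_quits_plain(s) && !s->have_throttled_tasks;
}

/* The numbers from the example, with one task assumed to be throttled. */
struct wb_state example_state(void)
{
	struct wb_state s = {
		.nr_reclaimable = 12000,
		.nr_writeback = 10000,
		.background_thresh = 15000,
		.dirty_thresh = 20000,
		.have_throttled_tasks = true,
	};

	/* The problematic case: nr_writeback > dirty_thresh - background_thresh */
	assert(s.nr_writeback > s.dirty_thresh - s.background_thresh);
	return s;
}
```

With these numbers the task is over the dirty limit (22,000 > 20,000), the plain rule lets background writeback quit at once (12,000 < 15,000), and the second rule keeps it running until no task is throttled any more.
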
Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 fs/fs-writeback.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- linux.orig/fs/fs-writeback.c	2009-10-11 09:19:49.000000000 +0800
+++ linux/fs/fs-writeback.c	2009-10-11 09:21:50.000000000 +0800
@@ -781,7 +781,8 @@ static long wb_writeback(struct bdi_writ
 		 * For background writeout, stop when we are below the
 		 * background dirty threshold
 		 */
-		if (args->for_background && !over_bground_thresh())
+		if (args->for_background && !over_bground_thresh() &&
+		    !list_empty(&wb->bdi->throttle_list))
 			break;
 
 		wbc.more_io = 0;