On Sun, Oct 11, 2009 at 03:44:40PM +0800, Peter Zijlstra wrote: > On Sun, 2009-10-11 at 10:28 +0800, Wu Fengguang wrote: > > > > Note that the total limit check itself may not be sufficient. For > > example, there are no nr_writeback limit for NFS (and maybe btrfs) > > after removing the congestion waits. Therefore it is very possible > > > > nr_writeback => dirty_thresh > > nr_dirty => 0 > > > > which is obviously undesirable: everything newly dirtied are soon put > > to writeback. It violates the 30s expire time and the background > > threshold rules, and will hurt write-and-truncate operations (ie. temp > > files). > > > > So the better solution would be to impose a nr_writeback limit for > > every filesystem that didn't already have one (the block io queue). > > NFS used to have that limit with congestion_wait, but now we need > > to do a wait queue for it. > > > > With the nr_writeback wait queue, it can be guaranteed that once > > balance_dirty_pages() asks for writing 1500 pages, it will be done > > with necessary sleeping in the bdi flush thread. So we can safely > > remove the loop and double checking of global dirty limit in > > balance_dirty_pages(). > > nr_reclaim = nr_dirty + nr_writeback + nr_unstable, so anything calling > into balance_dirty_pages() would still block on seeing such large > amounts of nr_writeback. Our terms are a bit different. In my previous mail, nr_reclaim = nr_dirty + nr_unstable nr_writeback is added separated when comparing with dirty_thresh, just as the code in balance_dirty_pages(). But that's fine. You are right that the application will be blocked and dirty limit be guaranteed, if we do while (over dirty limit) { bdi_writeback_wait(pages to write); } But it has a problem: as long as the bdi-flush thread for NFS don't limit nr_writeback, its nr_writeback will grow to near (dirty_thresh-nr_unstable), and its nr_dirty will approach 0. That's not desirable. So I did this: - while (over dirty limit) { + if (over dirty limit) { bdi_writeback_wait(pages to write); } _after_ adding the NFS nr_writeback wait queue ([PATCH 20/45] NFS: introduce writeback wait queue). With that it's safe to remove the loop. > Having the constraint nr_dirty + nr_writeback + nr_unstable < > dirty_thresh should ensure we never have nr_writeback > dirty_thresh, > simply because you cannot dirty more, which then cannot be converted to > more writeback. > > Or am I missing something? You are right with the assumption that the loop is still there. Sorry for the confusion, but I mean, filesystems have to limit nr_writeback (directly or indirectly via the block io queue), otherwise it either hit nr_dirty to 0 (with the loop), or let nr_writeback grow out of control (without the loop). Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html