Re: [PATCH 01/45] writeback: reduce calls to global_page_state in balance_dirty_pages()

Wu Fengguang <fengguang.wu@xxxxxxxxx> · Sun, 11 Oct 2009 18:50:12 +0800

On Sun, Oct 11, 2009 at 03:44:40PM +0800, Peter Zijlstra wrote:
> On Sun, 2009-10-11 at 10:28 +0800, Wu Fengguang wrote:
> > 
> > Note that the total limit check itself may not be sufficient. For
> > example, there are no nr_writeback limit for NFS (and maybe btrfs)
> > after removing the congestion waits.  Therefore it is very possible
> > 
> >         nr_writeback => dirty_thresh
> >         nr_dirty     => 0
> > 
> > which is obviously undesirable: everything newly dirtied are soon put
> > to writeback. It violates the 30s expire time and the background
> > threshold rules, and will hurt write-and-truncate operations (ie. temp
> > files).
> > 
> > So the better solution would be to impose a nr_writeback limit for
> > every filesystem that didn't already have one (the block io queue).
> > NFS used to have that limit with congestion_wait, but now we need
> > to do a wait queue for it.
> > 
> > With the nr_writeback wait queue, it can be guaranteed that once
> > balance_dirty_pages() asks for writing 1500 pages, it will be done
> > with necessary sleeping in the bdi flush thread. So we can safely
> > remove the loop and double checking of global dirty limit in
> > balance_dirty_pages().
> 
> nr_reclaim = nr_dirty + nr_writeback + nr_unstable, so anything calling
> into balance_dirty_pages() would still block on seeing such large
> amounts of nr_writeback.

Our terms are a bit different. In my previous mail,
        nr_reclaim = nr_dirty + nr_unstable
nr_writeback is added separated when comparing with dirty_thresh, just
as the code in balance_dirty_pages().

But that's fine. You are right that the application will be blocked
and dirty limit be guaranteed, if we do
        while (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }

But it has a problem: as long as the bdi-flush thread for NFS don't
limit nr_writeback, its nr_writeback will grow to near
(dirty_thresh-nr_unstable), and its nr_dirty will approach 0.
That's not desirable.

So I did this:
-       while (over dirty limit) {
+       if (over dirty limit) {
                bdi_writeback_wait(pages to write);
        }
_after_ adding the NFS nr_writeback wait queue ([PATCH 20/45] NFS:
introduce writeback wait queue). With that it's safe to remove the
loop.

> Having the constraint nr_dirty + nr_writeback + nr_unstable <
> dirty_thresh should ensure we never have nr_writeback > dirty_thresh,
> simply because you cannot dirty more, which then cannot be converted to
> more writeback.
> 
> Or am I missing something?

You are right with the assumption that the loop is still there.

Sorry for the confusion, but I mean, filesystems have to limit
nr_writeback (directly or indirectly via the block io queue),
otherwise it either hit nr_dirty to 0 (with the loop), or let
nr_writeback grow out of control (without the loop).

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html