Re: [PATCH 1/6] VFS: Ensure that writeback_single_inode() commits unstable writes

Wu Fengguang <fengguang.wu@xxxxxxxxx> · Thu, 7 Jan 2010 12:48:17 +0800

On Thu, Jan 07, 2010 at 12:38:02PM +0800, Myklebust, Trond wrote:
> On Thu, 2010-01-07 at 10:18 +0800, Wu Fengguang wrote:
> > On Thu, Jan 07, 2010 at 04:51:10AM +0800, Trond Myklebust wrote:
> > > @@ -533,6 +556,12 @@ select_queue:
> > >                             inode->i_state |= I_DIRTY_PAGES;
> > >                             redirty_tail(inode);
> > >                     }
> > > +           } else if (inode->i_state & I_UNSTABLE_PAGES) {
> > > +                   /*
> > > +                    * The inode has got yet more unstable pages to
> > > +                    * commit. Requeue on b_more_io
> > > +                    */
> > > +                   requeue_io(inode);
> >
> > This risks "busy retrying" inodes with unstable pages, when
> >
> > - nfs_commit_unstable_pages() don't think it's time to commit
> > - NFS server somehow response slowly
> >
> > The workaround is to use redirty_tail() for now. But that risks delay
> > the COMMIT for up to 30s, which obviously might stuck applications in
> > balance_dirty_pages() for too long.
> >
> > I have a patch to shorten the retry time to 1s (or other constant)
> > by introducing b_more_io_wait. It currently sits in my writeback queue
> > series whose main blocking issue is the constantly broken NFS pipeline..
> 
> 
> OK. Should I use redirty_tail() for the moment then, and assume you will
> fix when you introduce the new state?

OK, I'll change your redirty_tail() then :)

> > > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > > index d171696..910be28 100644
> > > --- a/fs/nfs/write.c
> > > +++ b/fs/nfs/write.c
> > > @@ -441,7 +441,7 @@ nfs_mark_request_commit(struct nfs_page *req)
> > >     spin_unlock(&inode->i_lock);
> > >     inc_zone_page_state(req->wb_page, NR_UNSTABLE_NFS);
> > >     inc_bdi_stat(req->wb_page->mapping->backing_dev_info, BDI_RECLAIMABLE);
> > > -   __mark_inode_dirty(inode, I_DIRTY_DATASYNC);
> > > +   mark_inode_unstable_pages(inode);
> >
> > Then we shall mark I_DIRTY_DATASYNC on other places that extend i_size.
> 
> Why? The NFS client itself shouldn't ever set I_DIRTY_DATASYNC after
> this patch is applied. We won't ever need it.
> 
> If the VM or VFS is doing it, then they ought to be fixed: there is no
> reason to assume that all filesystems need to sync their inodes on
> i_size changes.

Ah OK, I took it for certain..

> > > +   /* Don't commit yet if this is a non-blocking flush and there are
> > > +    * outstanding writes for this mapping.
> > > +    */
> > > +   if (wbc->sync_mode != WB_SYNC_ALL &&
> > > +       radix_tree_tagged(&NFS_I(inode)->nfs_page_tree,
> > > +               NFS_PAGE_TAG_LOCKED)) {
> > > +           mark_inode_unstable_pages(inode);
> > > +           return 0;
> > > +   }
> >
> > A dumb question: does NFS_PAGE_TAG_LOCKED means either flying COMMITs
> > or WRITEs? As an NFS newbie, I'm only confident on the COMMIT part :)
> 
> Both writebacks and commits will cause NFS_PAGE_TAG_LOCKED to be set, as
> will attempts to change the page contents. See the calls to
> nfs_set_page_tag_locked()...

Thanks for the tip!

> IOW: the above code will fail to trigger if there are outstanding WRITE
> RPC calls, or if there is a second process that happens to be writing to
> this inode's page cache...

IOW: for a busy "cp", the commit of inode may be delayed until
"nr_unstable > nr_dirty / 2"?  OK that's what we want.

Thanks,
Fengguang
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html