On Tue, 27 Apr 2010 18:35:56 -0400
Trond Myklebust <Trond.Myklebust@xxxxxxxxxx> wrote:

> On Tue, 2010-04-27 at 18:21 -0400, Trond Myklebust wrote:
> > On Tue, 2010-04-27 at 08:00 -0400, Trond Myklebust wrote:
> > > On Tue, 2010-04-27 at 14:35 +1000, Neil Brown wrote:
> > > > Hi Trond,
> > > > I think the above-mentioned commit might have added a new race to replace
> > > > the old ....
> > > >
> > > > I have a report of a BUG in nfs_page_async_flush.
> > > >
> > > > It isn't a vanilla upstream kernel - there are a bunch of SUSE patches
> > > > in there - so quoting the line number won't help you, but it is the
> > > >    BUG_ON(ret != 0);
> > > > after the call to nfs_set_page_writeback.
> > > > (https://bugzilla.novell.com/show_bug.cgi?id=599628)
> > > >
> > > > This implies that nfs_find_and_lock_request got a new lock on the page,
> > > > and then we found that it was already flagged for writeback.
> > >
> > > That's odd. Callers such as write_cache_pages() should normally be doing
> > > a wait_on_page_writeback() after taking the page lock but prior to
> > > calling the filesystem.
> >
> > The following patch ought to fix it. I suspect the same race exists in
> > the ->readpage() path, so it makes sense to fix nfs_wb_page() rather
> > than putting the wait_on_page_writeback call in
> > nfs_try_to_update_request().
>
> Actually, this patch is even better since it cleans up nfs_wb_page()
> too.

Thanks Trond!
I won't pretend to completely understand it, but it certainly looks credible
and removes some code, which is always nice!

The problem doesn't seem to be easily reproducible, so I cannot readily test
whether this fixes it. I'll assume it does and let you know if I hear
otherwise.
Thanks,
NeilBrown

>
> Cheers
>   Trond
> ------------------------------------------------------------------------------------------
> NFS: Ensure that nfs_wb_page() waits for PG_writeback to clear
>
> From: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
>
> Neil Brown reports that he is seeing the BUG_ON(ret != 0) trigger in
> nfs_page_async_flush. According to the trace in
> https://bugzilla.novell.com/show_bug.cgi?id=599628
> the problem appears to be due to nfs_wb_page() not waiting for the
> PG_writeback flag to clear.
>
> There is a similar problem in nfs_wb_page_cancel().
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@xxxxxxxxxx>
> ---
>
>  fs/nfs/write.c | 19 ++++---------------
>  1 files changed, 4 insertions(+), 15 deletions(-)
>
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index ccde2ae..3aea3ca 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1472,6 +1472,7 @@ int nfs_wb_page_cancel(struct inode *inode, struct page *page)
>
>  	BUG_ON(!PageLocked(page));
>  	for (;;) {
> +		wait_on_page_writeback(page);
>  		req = nfs_page_find_request(page);
>  		if (req == NULL)
>  			break;
> @@ -1506,30 +1507,18 @@ int nfs_wb_page(struct inode *inode, struct page *page)
>  		.range_start = range_start,
>  		.range_end = range_end,
>  	};
> -	struct nfs_page *req;
> -	int need_commit;
>  	int ret;
>
>  	while(PagePrivate(page)) {
> +		wait_on_page_writeback(page);
>  		if (clear_page_dirty_for_io(page)) {
>  			ret = nfs_writepage_locked(page, &wbc);
>  			if (ret < 0)
>  				goto out_error;
>  		}
> -		req = nfs_find_and_lock_request(page);
> -		if (!req)
> -			break;
> -		if (IS_ERR(req)) {
> -			ret = PTR_ERR(req);
> +		ret = sync_inode(inode, &wbc);
> +		if (ret < 0)
>  			goto out_error;
> -		}
> -		need_commit = test_bit(PG_CLEAN, &req->wb_flags);
> -		nfs_clear_page_tag_locked(req);
> -		if (need_commit) {
> -			ret = nfs_commit_inode(inode, FLUSH_SYNC);
> -			if (ret < 0)
> -				goto out_error;
> -		}
>  	}
>  	return 0;
>  out_error:
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html