Re: [PATCH 4/4] NFS: Fix fscache read from NFS after cache error

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Tue, 29 Jun 2021 14:54:07 +0000

On Tue, 2021-06-29 at 09:20 -0400, David Wysochanski wrote:
> On Tue, Jun 29, 2021 at 8:46 AM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Tue, 2021-06-29 at 05:17 -0400, David Wysochanski wrote:
> > > On Mon, Jun 28, 2021 at 8:39 PM Trond Myklebust
> > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Mon, 2021-06-28 at 19:46 -0400, David Wysochanski wrote:
> > > > > On Mon, Jun 28, 2021 at 5:59 PM Trond Myklebust
> > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > 
> > > > > > On Mon, 2021-06-28 at 17:12 -0400, David Wysochanski wrote:
> > > > > > > On Mon, Jun 28, 2021 at 3:09 PM Trond Myklebust
> > > > > > > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > > > > > > > 
> > > > > > > > On Mon, 2021-06-28 at 13:39 -0400, Dave Wysochanski wrote:
> > > > > > > > > Earlier commits refactored some NFS read code and removed
> > > > > > > > > nfs_readpage_async(), but neglected to properly fixup
> > > > > > > > > nfs_readpage_from_fscache_complete().  The code path is
> > > > > > > > > only hit when something unusual occurs with the cachefiles
> > > > > > > > > backing filesystem, such as an IO error or while a cookie
> > > > > > > > > is being invalidated.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@xxxxxxxxxx>
> > > > > > > > > ---
> > > > > > > > >  fs/nfs/fscache.c | 14 ++++++++++++--
> > > > > > > > >  1 file changed, 12 insertions(+), 2 deletions(-)
> > > > > > > > > 
> > > > > > > > > diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c
> > > > > > > > > index c4c021c6ebbd..d308cb7e1dd4 100644
> > > > > > > > > --- a/fs/nfs/fscache.c
> > > > > > > > > +++ b/fs/nfs/fscache.c
> > > > > > > > > @@ -381,15 +381,25 @@ static void nfs_readpage_from_fscache_complete(struct page *page,
> > > > > > > > >                                                void *context,
> > > > > > > > >                                                int error)
> > > > > > > > >  {
> > > > > > > > > +       struct nfs_readdesc desc;
> > > > > > > > > +       struct inode *inode = page->mapping->host;
> > > > > > > > > +
> > > > > > > > >         dfprintk(FSCACHE,
> > > > > > > > >                  "NFS: readpage_from_fscache_complete (0x%p/0x%p/%d)\n",
> > > > > > > > >                  page, context, error);
> > > > > > > > > 
> > > > > > > > > -       /* if the read completes with an error, we just unlock the page and let
> > > > > > > > > -        * the VM reissue the readpage */
> > > > > > > > >         if (!error) {
> > > > > > > > >                 SetPageUptodate(page);
> > > > > > > > >                 unlock_page(page);
> > > > > > > > > +       } else {
> > > > > > > > > +               desc.ctx = context;
> > > > > > > > > +               nfs_pageio_init_read(&desc.pgio, inode, false,
> > > > > > > > > +                                    &nfs_async_read_completion_ops);
> > > > > > > > > +               error = readpage_async_filler(&desc, page);
> > > > > > > > > +               if (error)
> > > > > > > > > +                       return;
> > > > > > > > 
> > > > > > > > This code path can clearly fail too. Why can we not fix
> > > > > > > > this code to allow it to return that reported error so
> > > > > > > > that we can handle the failure case in nfs_readpage()
> > > > > > > > instead of dead-ending here?
> > > > > > > > 
> > > > > > > 
> > > > > > > Maybe the below patch is what you had in mind?  That way
> > > > > > > if fscache is enabled, nfs_readpage() should behave the
> > > > > > > same way as if it's not, for the case where an IO error
> > > > > > > occurs in the NFS read completion path.
> > > > > > > 
> > > > > > > If we call into fscache and we get back that the IO has
> > > > > > > been submitted, wait until it is completed, so we'll
> > > > > > > catch any IO errors in the read completion path.  This
> > > > > > > does not solve the "catch the internal errors" case, IOW,
> > > > > > > the ones that show up as pg_error; that will probably
> > > > > > > require copying pg_error into the nfs_open_context.error
> > > > > > > field.
> > > > > > > 
> > > > > > > diff --git a/fs/nfs/read.c b/fs/nfs/read.c
> > > > > > > index 78b9181e94ba..28e3318080e0 100644
> > > > > > > --- a/fs/nfs/read.c
> > > > > > > +++ b/fs/nfs/read.c
> > > > > > > @@ -357,13 +357,13 @@ int nfs_readpage(struct file *file, struct page *page)
> > > > > > >         } else
> > > > > > >                 desc.ctx = get_nfs_open_context(nfs_file_open_context(file));
> > > > > > > 
> > > > > > > +       xchg(&desc.ctx->error, 0);
> > > > > > >         if (!IS_SYNC(inode)) {
> > > > > > >                 ret = nfs_readpage_from_fscache(desc.ctx, inode, page);
> > > > > > >                 if (ret == 0)
> > > > > > > -                       goto out;
> > > > > > > +                       goto out_wait;
> > > > > > >         }
> > > > > > > 
> > > > > > > -       xchg(&desc.ctx->error, 0);
> > > > > > >         nfs_pageio_init_read(&desc.pgio, inode, false,
> > > > > > >                              &nfs_async_read_completion_ops);
> > > > > > > 
> > > > > > > @@ -373,6 +373,7 @@ int nfs_readpage(struct file *file, struct page *page)
> > > > > > > 
> > > > > > >         nfs_pageio_complete_read(&desc.pgio);
> > > > > > >         ret = desc.pgio.pg_error < 0 ? desc.pgio.pg_error : 0;
> > > > > > > +out_wait:
> > > > > > >         if (!ret) {
> > > > > > >                 ret = wait_on_page_locked_killable(page);
> > > > > > >                 if (!PageUptodate(page) && !ret)
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > > > +
> > > > > > > > > +               nfs_pageio_complete_read(&desc.pgio);
> > > > > > > > >         }
> > > > > > > > >  }
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > --
> > > > > > > > Trond Myklebust
> > > > > > > > Linux NFS client maintainer, Hammerspace
> > > > > > > > trond.myklebust@xxxxxxxxxxxxxxx
> > > > > > > > 
> > > > > > > > 
> > > > > > > 
> > > > > > 
> > > > > > Yes, please. This avoids that duplication of NFS read code in
> > > > > > the fscache layer.
> > > > > > 
> > > > > 
> > > > > If you mean patch 4 we still need that - I don't see any way
> > > > > to avoid it.  The above will just make the fscache-enabled
> > > > > path wait for the IO to complete, same as the non-fscache
> > > > > case.
> > > > > 
> > > > 
> > > > With the above, you can simplify patch 4/4 to just make the page
> > > > unlock unconditional on the error, no?
> > > > 
> > > > i.e.
> > > >         if (!error)
> > > >                 SetPageUptodate(page);
> > > >         unlock_page(page);
> > > > 
> > > > End result: the client just does the same check as before and
> > > > lets the vfs/mm decide based on the status of the PG_uptodate
> > > > flag what to do next. I'm assuming that a retry won't cause
> > > > fscache to do another bio attempt?
> > > > 
> > > 
> > > Yes I think you're right and I'm following - let me test it and
> > > I'll send a v2.
> > > Then we can drop patch #3 right?
> > > 
> > Sounds good. Thanks Dave!
> > 
> 
> This approach works but it differs from the original when an fscache
> error occurs.
> The original (see below) would call back into NFS to read from the
> server, but now we just let the VM handle it.  The VM will re-issue
> the read, but will go back into fscache again (because it's enabled),
> which may fail again.

How about marking the page on failure, then? I don't believe we
currently use PG_owner_priv_1 (a.k.a. PageOwnerPriv1, PageChecked,
PagePinned, PageForeign, PageSwapCache, PageXenRemapped) for anything
and according to legend it is supposed to be usable by the fs for page
cache pages.

So what say we use SetPageChecked() to mark the page as having failed
retrieval from fscache?
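
Roughly, the idea could look like the userspace sketch below. This is
not kernel code: struct mock_page, the MOCK_PG_* bits and both helpers
are hypothetical stand-ins for struct page, PG_uptodate/PG_checked and
the NFS read path. Clearing the mark once the read has been redirected
to the server is also just an assumption about how a retry might be
handled.

```c
/*
 * Userspace mock only. struct mock_page and the helpers below are
 * hypothetical stand-ins, not the kernel's struct page or page-flags API.
 */
#include <stdbool.h>

enum {
	MOCK_PG_UPTODATE = 1 << 0,	/* stands in for PG_uptodate */
	MOCK_PG_CHECKED  = 1 << 1,	/* stands in for PG_checked */
	MOCK_PG_LOCKED   = 1 << 2,	/* stands in for PG_locked */
};

struct mock_page {
	unsigned long flags;
};

enum mock_source { MOCK_FROM_CACHE, MOCK_FROM_SERVER };

/*
 * Completion of a cache read: on success mark the page up to date,
 * on failure mark it as having failed retrieval from the cache.
 * The page is unlocked in both cases.
 */
static void mock_fscache_read_complete(struct mock_page *page, int error)
{
	if (!error)
		page->flags |= MOCK_PG_UPTODATE;
	else
		page->flags |= MOCK_PG_CHECKED;
	page->flags &= ~MOCK_PG_LOCKED;
}

/*
 * Pick where a (re)issued read should go. A marked page skips the
 * cache; the mark is cleared here (an assumption) so that a later,
 * unrelated read may try the cache again.
 */
static enum mock_source mock_pick_read_source(struct mock_page *page,
					      bool cache_enabled)
{
	if (cache_enabled && !(page->flags & MOCK_PG_CHECKED))
		return MOCK_FROM_CACHE;
	page->flags &= ~MOCK_PG_CHECKED;
	return MOCK_FROM_SERVER;
}
```

With something like this, the retry issued by the VM after a failed
cache read would go to the server instead of looping back into fscache.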

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx