Re: [RFC PATCH] nfs: don't skip CTO on v2/3 mounts, regardless of order of reference puts

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Thu, 28 Jul 2022 15:15:32 +0000

On Thu, 2022-07-28 at 10:59 -0400, Jeff Layton wrote:
> On Thu, 2022-07-28 at 14:53 +0000, Trond Myklebust wrote:
> > On Thu, 2022-07-28 at 10:24 -0400, Jeff Layton wrote:
> > > Olga reported a case of data corruption in a situation where two
> > > clients
> > > (v3 and v4) were alternately doing open/write/close the same
> > > file.
> > > 
> > > Looking at captures, the v3 client failed to perform any of the
> > > GETATTR
> > > calls for CTO during one of the events, leading it to overwrite
> > > some
> > > data that had been written by the v4 client.
> > > 
> > > We have two calls that work similarly: put_nfs_open_context and
> > > put_nfs_open_context_sync. The only difference is the is_sync
> > > parameter
> > > that gets passed to close_context. The only caller of the _sync
> > > variant
> > > is in the close codepath.
> > > 
> > > On a v2/3 mount, if the last reference is not put by
> > > put_nfs_open_context_sync, then CTO revalidation never happens.
> > > Fix
> > > this
> > > by adding a new flag to nfs_open_context and set that in
> > > put_nfs_open_context_sync. In nfs_close_context, check for that
> > > flag
> > > when we're checking is_sync and treat them as equivalent.
> > > 
> > > Cc: Scott Mayhew <smayhew@xxxxxxxxxx>
> > > Cc: Ben Coddington <bcodding@xxxxxxxxxx>
> > > Reported-by: Olga Kornieskaia <kolga@xxxxxxxxxx>
> > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > ---
> > >  fs/nfs/inode.c         | 3 ++-
> > >  include/linux/nfs_fs.h | 3 ++-
> > >  2 files changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > I'm not sure this is the right fix. Maybe we'd be better off just
> > > ignoring the is_sync parameter in nfs_close_context? It's
> > > probably
> > > functionally equivalent anyway.
> > > 
> > > Thoughts?
> > > 
> > > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > > index b4e46b0ffa2d..58b506a0a2f2 100644
> > > --- a/fs/nfs/inode.c
> > > +++ b/fs/nfs/inode.c
> > > @@ -1005,7 +1005,7 @@ void nfs_close_context(struct
> > > nfs_open_context
> > > *ctx, int is_sync)
> > >  
> > >         if (!(ctx->mode & FMODE_WRITE))
> > >                 return;
> > > -       if (!is_sync)
> > > +       if (!is_sync && !test_bit(NFS_CONTEXT_DO_CLOSE, &ctx-
> > > >flags))
> > >                 return;
> > >         inode = d_inode(ctx->dentry);
> > >         if (NFS_PROTO(inode)->have_delegation(inode, FMODE_READ))
> > 
> > NACK. If the 'is_sync' flag is not set, then it is because the
> > function
> > is being called from a context where it is not safe to do a
> > synchronous
> > RPC call.
> > 
> 
> Ok. Any thoughts on the right way to fix this then? It seems like
> skipping revalidation because the last put wasn't in the close
> codepath
> is the wrong thing to do. Should we schedule it to a workqueue?

If the attributes are marked as up to date, because they were already
revalidated after the I/O was performed, then we're fine. If they are
not up to date, then that is expected to trigger a cache invalidation
on the next open when we compare the client's cached values to the
server-retrieved values for the file attributes.

i.e. this ought to be just a performance problem, and not a correctness
problem, if attribute cache revalidation is working as designed.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx