Re: [PATCH] NFS: Handle missing attributes in OPEN reply

Trond Myklebust <trondmy@xxxxxxxxxx> · Tue, 03 Jan 2023 21:17:35 -0500

On Tue, 2023-01-03 at 21:02 -0500, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@xxxxxxxxxx>
> wrote:
> > 
> > On Wed, 2023-01-04 at 11:27 +1100, NeilBrown wrote:
> > > 
> > > If a NFSv4 OPEN reply reports that the file was successfully
> > > opened
> > > but
> > > the subsequent GETATTR fails, Linux-NFS will attempt a stand-
> > > alone
> > > GETATTR request.  If that also fails, handling of the reply is
> > > aborted
> > > with error -EAGAIN and the open is attempted again from the
> > > start.
> > > 
> > > This leaves the server with an active state (because the OPEN
> > > operation
> > > succeeded) which the client doesn't know about.  If the open-
> > > owner
> > > (local user) did not have the file already open, this has minimal
> > > consequences for the client and only causes the server to spend
> > > resources on an open state that will never be used or explicitly
> > > closed.
> > > 
> > > If the open-owner DID already have the file open, then it will
> > > hold a
> > > reference to the open-state for that file, but the seq-id in the
> > > state-id will now be out-of-sync with the server.  The server
> > > will
> > > have
> > > incremented the seq-id, but the client will not have noticed.  So
> > > when
> > > the client next attempts to access the file using that state
> > > (READ,
> > > WRITE, SETATTR), the attempt will fail with NFS4ERR_OLD_STATEID.
> > > 
> > > The Linux-client assumes this error is due to a race and simply
> > > retries
> > > on the basis that the local state-id information should have been
> > > updated by another thread.  This basis is invalid in this case
> > > and
> > > the
> > > result is an infinite loop attempting IO and getting OLD_STATEID.
> > > 
> > > This has been observed with a NetApp Filer as the server (ONTAP
> > > 9.8
> > > p5,
> > > using NFSv4.0).  The client is creating, writing, and unlinking a
> > > particular file from multiple clients (.bash_history).  If a new
> > > OPEN
> > > from one client races with a REMOVE from another client while the
> > > first
> > > client already has the file open, the Filer can report success
> > > for
> > > the
> > > OPEN op, but NFS4ERR_STALE for the ACCESS or GETATTR ops in the
> > > OPEN
> > > request.  This gets the seq-id out-of-sync and a subsequent write
> > > to
> > > the
> > > other open on the first client causes the infinite loop to occur.
> > > 
> > > The reason that the client returns -EAGAIN is that it needs to
> > > find
> > > the
> > > inode so it can find the associated state to update the seq-id,
> > > but
> > > the
> > > inode lookup requires the file-id which is provided in the
> > > GETATTR
> > > reply.  Without the file-id normal inode lookup cannot be used.
> > > 
> > > This patch changes the lookup so that when the file-id is not
> > > available
> > > the list of states owned by the open-owner is examined to find
> > > the
> > > state
> > > with the correct state-id (ignoring the seq-id part of the state-
> > > id).
> > > If this is found it is used just as when a normal inode lookup
> > > finds
> > > an
> > > inode.  If it isn't found, -EAGAIN is returned as before.
> > > 
> > > This bug can be demonstrated by modifying the Linux NFS server as
> > > follows:
> > > 
> > > 1/ The second time a file is opened, unlink it.  This simulates
> > >    a race with another client, without needing to have a race:
> > > 
> > >     --- a/fs/nfsd/nfs4proc.c
> > >     +++ b/fs/nfsd/nfs4proc.c
> > >     @@ -594,6 +594,12 @@ nfsd4_open(struct svc_rqst *rqstp,
> > > struct
> > > nfsd4_compound_state *cstate,
> > >         if (reclaim && !status)
> > >                 nn->somebody_reclaimed = true;
> > >      out:
> > >     +   if (!status && open->op_stateid.si_generation > 1) {
> > >     +           printk("Opening gen %d\n", (int)open-
> > > > op_stateid.si_generation);
> > >     +           vfs_unlink(mnt_user_ns(resfh->fh_export-
> > > > ex_path.mnt),
> > >     +                      resfh->fh_dentry->d_parent->d_inode,
> > >     +                      resfh->fh_dentry, NULL);
> > >     +   }
> > >         if (open->op_filp) {
> > >                 fput(open->op_filp);
> > >                 open->op_filp = NULL;
> > > 
> > > 2/ When a GETATTR op is attempted on an unlinked file, return
> > > ESTALE
> > > 
> > >     @@ -852,6 +858,11 @@ nfsd4_getattr(struct svc_rqst *rqstp,
> > > struct
> > > nfsd4_compound_state *cstate,
> > >         if (status)
> > >                 return status;
> > > 
> > >     +   if (cstate->current_fh.fh_dentry->d_inode->i_nlink == 0)
> > > {
> > >     +           printk("Return Estale for unlinked file\n");
> > >     +           return nfserr_stale;
> > >     +   }
> > >     +
> > >         if (getattr->ga_bmval[1] & NFSD_WRITEONLY_ATTRS_WORD1)
> > >                 return nfserr_inval;
> > > 
> > > Then mount the filesystem and
> > > 
> > >   Thread 1            Thread 2
> > >   open a file
> > >                       open the same file (will fail)
> > >   write to that file
> > > 
> > > I use this shell fragment, using 'sleep' for synchronisation.
> > > The use of /bin/echo ensures the write is flushed when /bin/echo
> > > closes
> > > the fd on exit.
> > > 
> > >     (
> > >         /bin/echo hello
> > >         sleep 3
> > >         /bin/echo there
> > >     ) > /import/A/afile &
> > >     sleep 3
> > >     cat /import/A/afile
> > > 
> > > Probably when the OPEN succeeds, the GETATTR fails, and we don't
> > > already
> > > have the state open, we should explicitly close the state. 
> > > Leaving
> > > it
> > > open could cause problems if, for example, the server revoked it
> > > and
> > > signalled the client that there was a revoked state.  The client
> > > would
> > > not be able to find the state that needed to be relinquished.  I
> > > haven't
> > > attempted to implement this.
> > 
> > 
> > If the server starts to reply NFS4ERR_STALE to GETATTR requests,
> > why do
> > we care about stateid values?
> 
> It is acceptable for the server to return ESTALE to the GETATTR after
> the processing the open (due to a REMOVE that comes in) and that open
> generating a valid stateid which client should care about when there
> are pre-existing opens. The server will keep the state of an existing
> opens valid even if the file is removed. Which is what's happening,
> the previous open is being used for IO but the stateid is updated on
> the server but not on the client.
> 
> > Just mark the inode as stale and drop it
> > on the floor.
> 
> Why would that be correct? Any pre-existing opens should continue
> operating, thus the inode can't be marked stale. We don't do it now
> (silly rename allows preexisting IO to continue).
> 
> > If the server tries to declare the state as revoked, then it is
> > clearly
> > borken, so let NetApp fix their own bug.
> 
> The server does not declare the state revoked. The open succeeded and
> its state's seqid was updated but the client is using an old stateid.
> Server isn't at fault here.

If the client can't send a GETATTR, then it can't revalidate
attributes, do I/O, and it can't even close the file. So we're not
going to waste time and effort trying to support this, whether or not
NetApp thinks it is a good idea.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx