Re: [PATCH] NFS: Handle missing attributes in OPEN reply

"NeilBrown" <neilb@xxxxxxx> · Fri, 06 Jan 2023 14:48:46 +1100

On Fri, 06 Jan 2023, Trond Myklebust wrote:
> On Fri, 2023-01-06 at 09:56 +1100, NeilBrown wrote:
> > On Wed, 04 Jan 2023, NeilBrown wrote:
> > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > On Wed, 2023-01-04 at 12:01 +1100, NeilBrown wrote:
> > > > > On Wed, 04 Jan 2023, Trond Myklebust wrote:
> > > > > > 
> > > > > > 
> > > > > > If the server starts to reply NFS4ERR_STALE to GETATTR
> > > > > > requests,
> > > > > > why do
> > > > > > we care about stateid values? Just mark the inode as stale
> > > > > > and drop
> > > > > > it
> > > > > > on the floor.
> > > > > 
> > > > > We have a valid state from the server - we really shouldn't
> > > > > just
> > > > > ignore
> > > > > it.
> > > > > 
> > > > > Maybe it would be OK to mark the inode stale.  I don't know if
> > > > > various
> > > > > retry loops abort properly when the inode is stale.
> > > > 
> > > > Yes, they are all supposed to do that. Otherwise we would end up
> > > > looping forever in close(), for instance, since the PUTFH,
> > > > GETATTR and
> > > > CLOSE can all return NFS4ERR_STALE as well.
> > > 
> > > To mark the inode as STALE we still need to find the inode, and
> > > that is
> > > the key bit missing in the current code.  Once we find the inode,
> > > we
> > > could mark it stale, but maybe some other error resulted in the
> > > missing
> > > GETATTR...
> > > 
> > > It might make sense to put the new code in _nfs4_proc_open() after
> > > the
> > > explicit nfs4_proc_getattr() fails.  We would need to find the
> > > inode
> > > given only the filehandle.  Is there any easy way to do that?
> > > 
> > > Thanks,
> > > NeilBrown
> > > 
> > 
> > I couldn't see a consistent pattern to follow for when to mark an
> > inode
> > as stale.  Do this, on top of the previous patch, seem reasonable?
> > 
> > Thanks,
> > NeilBrown
> > 
> > 
> > 
> > diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> > index b441b1d14a50..04497cb42154 100644
> > --- a/fs/nfs/nfs4proc.c
> > +++ b/fs/nfs/nfs4proc.c
> > @@ -489,6 +489,8 @@ static int nfs4_do_handle_exception(struct
> > nfs_server *server,
> >                 case -ESTALE:
> >                         if (inode != NULL && S_ISREG(inode->i_mode))
> >                                 pnfs_destroy_layout(NFS_I(inode));
> > +                       if (inode)
> > +                               nfs_set_inode_stale(inode);
> 
> This is normally dealt with in the generic code inside
> nfs_revalidate_inode(). There should be no need to duplicate it here.
> 
> >                         break;
> >                 case -NFS4ERR_DELEG_REVOKED:
> >                 case -NFS4ERR_ADMIN_REVOKED:
> > @@ -2713,8 +2715,12 @@ static int _nfs4_proc_open(struct
> > nfs4_opendata *data,
> >                         return status;
> >         }
> >         if (!(o_res->f_attr->valid & NFS_ATTR_FATTR)) {
> > +               struct inode *inode = nfs4_get_inode_by_stateid(
> > +                       &data->o_res.stateid,
> > +                       data->owner);
> 
> There shouldn't be a need to go looking for open descriptors here,
> because they will hit ESTALE at some point later anyway.

The problem is that they don't hit ESTALE later.  Unless we update our
stored stateid with the result of the OPEN, we can use the old stateid,
and get the corresponding error.

The only way to avoid the infinite loop is to either mark the inode as
stale, or update the state-id.  For either of those we need to find the
inode.  We don't have fileid so we cannot use iget.  We do have file
handle and state-id.

Maybe you are saying this is a server bug that the client cannot be
expect to cope with at all, and that an infinite loop is a valid client
response to this particular server behaviour.  In that case, I guess no
patch is needed.

NeilBrown

> 
> If we're going to change anything, I'd rather see us return -EACCES and
> -ESTALE from the decode_access() and decode_getfattr() calls in
> nfs4_xdr_dec_open() (and only those errors from those two calls!) so
> that we can skip the unnecessary getattr call here.
> 
> In fact, the only case that this extra getattr should be trying to
> address is the one where the server returns NFS4ERR_DELAY to either the
> decode_access() or the decode_getfattr() calls specifically, and where
> we therefore don't want to replay the entire open call.
> 
> >                 nfs4_sequence_free_slot(&o_res->seq_res);
> > -               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > NULL);
> > +               nfs4_proc_getattr(server, &o_res->fh, o_res->f_attr,
> > inode);
> > +               iput(inode);
> >         }
> >         return 0;
> >  }
> > @@ -4335,6 +4341,7 @@ int nfs4_proc_getattr(struct nfs_server
> > *server, struct nfs_fh *fhandle,
> >  {
> >         struct nfs4_exception exception = {
> >                 .interruptible = true,
> > +               .inode = inode,
> >         };
> >         int err;
> >         do {
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
> 
> 
>