Re: [PATCH] NFS: Handle missing attributes in OPEN reply

"NeilBrown" <neilb@xxxxxxx> · Wed, 04 Jan 2023 13:34:01 +1100

On Wed, 04 Jan 2023, NeilBrown wrote:
> On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
> > On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@xxxxxxxxxx> wrote:
> > >
> > >
> > > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > > we care about stateid values?
> > 
> > It is acceptable for the server to return ESTALE to the GETATTR after
> > processing the open (due to a REMOVE that comes in) and for that open
> > to generate a valid stateid, which the client should care about when
> > there are pre-existing opens. The server will keep the state of
> > existing opens valid even if the file is removed. That is what's
> > happening: the previous open is being used for IO, and the stateid is
> > updated on the server but not on the client.
> 
> I agree that it is acceptable to return ESTALE to the GETATTR, but
> having done that I don't think it is acceptable for a PUTFH of the same
> filehandle to succeed.  Certainly any attempt to again use the
> filehandle after the PUTFH should fail with NFS4ERR_STALE.
> 
> RFC7530 says
> 
> 13.1.2.7.  NFS4ERR_STALE (Error Code 70)
> 
>    The current or saved filehandle value designating an argument to the
>    current operation is invalid.  The file system object referred to by
>    that filehandle no longer exists, or access to it has been revoked.
> 
> So the file doesn't exist or access has been revoked.  So any writes
> should fail.  Failing with OLD_STATEID is weird - and having writes
> succeed if we use the correct stateid is also odd.  Failing with STALE
> would be perfectly sensible and I suspect the Linux client would handle
> that just fine.

I checked a recent tcpdump (with a patched SLE kernel talking to a NetApp
server) and I see that the writes don't succeed after the first
NFS4ERR_STALE.

If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.

So it seems that the server just has these two checks in the wrong order.
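
To make the ordering concrete, here is a rough sketch in C.  It is only a
model of what the trace suggests, not anyone's actual server code: the
struct, its fields, and both function names are hypothetical, and seqid
wraparound is ignored.  Only the two error values are real (they are the
ones RFC7530 assigns).

```c
#include <stdbool.h>
#include <stdint.h>

/* Status values as assigned by RFC 7530. */
enum status {
	NFS4_OK             = 0,
	NFS4ERR_STALE       = 70,
	NFS4ERR_OLD_STATEID = 10024,
};

/* Hypothetical model of one WRITE as seen by the server. */
struct write_req {
	bool     fh_stale;   /* file was removed, filehandle no longer valid */
	uint32_t seqid;      /* stateid seqid sent by the client */
	uint32_t cur_seqid;  /* seqid the server currently holds */
};

/* Order that matches the trace: stateid is checked before the filehandle. */
static enum status write_observed(const struct write_req *r)
{
	if (r->seqid < r->cur_seqid)
		return NFS4ERR_OLD_STATEID;
	if (r->fh_stale)
		return NFS4ERR_STALE;
	return NFS4_OK;
}

/* Order I would expect: a stale filehandle fails first, whatever the stateid. */
static enum status write_expected(const struct write_req *r)
{
	if (r->fh_stale)
		return NFS4ERR_STALE;
	if (r->seqid < r->cur_seqid)
		return NFS4ERR_OLD_STATEID;
	return NFS4_OK;
}
```

With a stale filehandle and an out-of-date seqid, write_observed() returns
NFS4ERR_OLD_STATEID while write_expected() returns NFS4ERR_STALE.  With the
current seqid both return NFS4ERR_STALE, which is what the tcpdump shows.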

NeilBrown