Re: [PATCH] NFS: Handle missing attributes in OPEN reply

"NeilBrown" <neilb@xxxxxxx> · Fri, 06 Jan 2023 09:02:39 +1100

On Thu, 05 Jan 2023, Olga Kornievskaia wrote:
> On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@xxxxxxx> wrote:
> >
> > On Wed, 04 Jan 2023, NeilBrown wrote:
> > > On Wed, 04 Jan 2023, Olga Kornievskaia wrote:
> > > > On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@xxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do
> > > > > we care about stateid values?
> > > >
> > > > It is acceptable for the server to return ESTALE to the GETATTR after
> > > > the processing the open (due to a REMOVE that comes in) and that open
> > > > generating a valid stateid which client should care about when there
> > > > are pre-existing opens. The server will keep the state of an existing
> > > > opens valid even if the file is removed. Which is what's happening,
> > > > the previous open is being used for IO but the stateid is updated on
> > > > the server but not on the client.
> > >
> > > I agree that it is acceptable to return ESTALE to the GETATTR, but
> > > having done that I don't think it is acceptable for a PUTFH of the same
> > > filehandle to succeed.  Certainly any attempt to again use the
> > > filehandle after the PUTFH should fail with NFS4ERR_STALE.
> > >
> > > RFC7530 says
> > >
> > > 13.1.2.7.  NFS4ERR_STALE (Error Code 70)
> > >
> > >    The current or saved filehandle value designating an argument to the
> > >    current operation is invalid.  The file system object referred to by
> > >    that filehandle no longer exists, or access to it has been revoked.
> > >
> > > So the file doesn't exist or access has been revoked.  So any writes
> > > should fail.  Failing with OLD_STATEID is weird - and having writes
> > > succeed if we use the correct stateid is also odd.  Failing with STALE
> > > would be perfectly sensible and I suspect the Linux client would handle
> > > that just fine.
> >
> > I checked a recent tcpdump (with patched SLE kernel talking to Netapp)
> > and I see that the writes don't succeed after the first NFS4ERR_STALE.
> >
> > If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE.
> > If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID.
> >
> > So it seems that it just has these two checks in the wrong order.
> 
> Does Netapp return ERR_STALE on the PUTFH if the stateid is correct?

In the traces I have, Netapp never returns ERR_STALE on PUTFH.  Of
course the PUTFH operation doesn't have a stateid, so the "if the
stateid is correct" is not meaningful.

ACCESS, READ, WRITE, SETATTR, and GETATTR are the only ops that I have
seen to result in ERR_STALE.

ACCESS and GETATTR don't have a stateid.
READ, WRITE, and SETATTR do.  These get OLD_STATEID if the stateid is
old, and only get STALE if the stateid is current.

> If it's the WRITE operation that returns an error and if the server
> has multiple errors (ie bad statid and bad file handle)` then I don't
> think the spec says which error is more important. In this case, I
> think the server should fail PUTFH with ERR_STALE.

I agree.  If the PUTFH returned STALE there would not be a problem.
Even if a race resulted in the PUTFH succeeding and the WRITE getting
OLD_STATEID, Linux-NFS would retry an the new PUTFH could then fail
correctly.

NeilBrown

> 
> >
> > NeilBrown
>