On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@xxxxxxx> wrote: > > On Wed, 04 Jan 2023, NeilBrown wrote: > > On Wed, 04 Jan 2023, Olga Kornievskaia wrote: > > > On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@xxxxxxxxxx> wrote: > > > > > > > > > > > > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do > > > > we care about stateid values? > > > > > > It is acceptable for the server to return ESTALE to the GETATTR after > > > the processing the open (due to a REMOVE that comes in) and that open > > > generating a valid stateid which client should care about when there > > > are pre-existing opens. The server will keep the state of an existing > > > opens valid even if the file is removed. Which is what's happening, > > > the previous open is being used for IO but the stateid is updated on > > > the server but not on the client. > > > > I agree that it is acceptable to return ESTALE to the GETATTR, but > > having done that I don't think it is acceptable for a PUTFH of the same > > filehandle to succeed. Certainly any attempt to again use the > > filehandle after the PUTFH should fail with NFS4ERR_STALE. > > > > RFC7530 says > > > > 13.1.2.7. NFS4ERR_STALE (Error Code 70) > > > > The current or saved filehandle value designating an argument to the > > current operation is invalid. The file system object referred to by > > that filehandle no longer exists, or access to it has been revoked. > > > > So the file doesn't exist or access has been revoked. So any writes > > should fail. Failing with OLD_STATEID is weird - and having writes > > succeed if we use the correct stateid is also odd. Failing with STALE > > would be perfectly sensible and I suspect the Linux client would handle > > that just fine. > > I checked a recent tcpdump (with patched SLE kernel talking to Netapp) > and I see that the writes don't succeed after the first NFS4ERR_STALE. > > If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE. > If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID. > > So it seems that it just has these two checks in the wrong order. Does Netapp return ERR_STALE on the PUTFH if the stateid is correct? If it's the WRITE operation that returns an error and if the server has multiple errors (ie bad statid and bad file handle)` then I don't think the spec says which error is more important. In this case, I think the server should fail PUTFH with ERR_STALE. > > NeilBrown