On Thu, 05 Jan 2023, Olga Kornievskaia wrote: > On Tue, Jan 3, 2023 at 9:34 PM NeilBrown <neilb@xxxxxxx> wrote: > > > > On Wed, 04 Jan 2023, NeilBrown wrote: > > > On Wed, 04 Jan 2023, Olga Kornievskaia wrote: > > > > On Tue, Jan 3, 2023 at 7:46 PM Trond Myklebust <trondmy@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > If the server starts to reply NFS4ERR_STALE to GETATTR requests, why do > > > > > we care about stateid values? > > > > > > > > It is acceptable for the server to return ESTALE to the GETATTR after > > > > the processing the open (due to a REMOVE that comes in) and that open > > > > generating a valid stateid which client should care about when there > > > > are pre-existing opens. The server will keep the state of an existing > > > > opens valid even if the file is removed. Which is what's happening, > > > > the previous open is being used for IO but the stateid is updated on > > > > the server but not on the client. > > > > > > I agree that it is acceptable to return ESTALE to the GETATTR, but > > > having done that I don't think it is acceptable for a PUTFH of the same > > > filehandle to succeed. Certainly any attempt to again use the > > > filehandle after the PUTFH should fail with NFS4ERR_STALE. > > > > > > RFC7530 says > > > > > > 13.1.2.7. NFS4ERR_STALE (Error Code 70) > > > > > > The current or saved filehandle value designating an argument to the > > > current operation is invalid. The file system object referred to by > > > that filehandle no longer exists, or access to it has been revoked. > > > > > > So the file doesn't exist or access has been revoked. So any writes > > > should fail. Failing with OLD_STATEID is weird - and having writes > > > succeed if we use the correct stateid is also odd. Failing with STALE > > > would be perfectly sensible and I suspect the Linux client would handle > > > that just fine. > > > > I checked a recent tcpdump (with patched SLE kernel talking to Netapp) > > and I see that the writes don't succeed after the first NFS4ERR_STALE. > > > > If the "correct" stateid is given to WRITE, it returns NFS4ERR_STALE. > > If the older stateid is given to WRITE, it returns NFS4ERR_OLD_STATEID. > > > > So it seems that it just has these two checks in the wrong order. > > Does Netapp return ERR_STALE on the PUTFH if the stateid is correct? In the traces I have, Netapp never returns ERR_STALE on PUTFH. Of course the PUTFH operation doesn't have a stateid, so the "if the stateid is correct" is not meaningful. ACCESS, READ, WRITE, SETATTR, and GETATTR are the only ops that I have seen to result in ERR_STALE. ACCESS and GETATTR don't have a stateid. READ, WRITE, and SETATTR do. These get OLD_STATEID if the stateid is old, and only get STALE if the stateid is current. > If it's the WRITE operation that returns an error and if the server > has multiple errors (ie bad statid and bad file handle)` then I don't > think the spec says which error is more important. In this case, I > think the server should fail PUTFH with ERR_STALE. I agree. If the PUTFH returned STALE there would not be a problem. Even if a race resulted in the PUTFH succeeding and the WRITE getting OLD_STATEID, Linux-NFS would retry an the new PUTFH could then fail correctly. NeilBrown > > > > > NeilBrown >