I chatted with Trond about this and he says it's a server bug if an unlinked
file keeps stateids around - the client doesn't need to issue a close in this
case.

What version of ONTAP are you running?

-dros

> On Sep 1, 2017, at 2:44 PM, Weston Andros Adamson <dros@xxxxxxxxxx> wrote:
>
> Nice analysis! I think post d8d849835eb2082ea17655538a83fa467633927f, we
> need to retry with a [PUTFH, CLOSE] if the GETATTR fails.
>
> The problem as I see it is the GETATTR is tied to the CURRENT_FH, which is
> stale for new operations since the file was unlinked, but the CLOSE is
> tied to the (CURRENT_FH, open stateid) pair and is not stale because the
> stateid is still valid.
>
> Trond is out on PTO, should be back on or before next Tuesday. The recent
> change was his and he might have a better idea how to handle this.
>
> -dros
>
>
>> On Aug 31, 2017, at 1:34 PM, Kjetil Joergensen <kjetil@xxxxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> (Now - I do not actually know the specification(s) all that well, so
>> it may be that I've by accident cherry-picked the bits that partially
>> turn this into a linux-nfs-client bug, and I'd be more than happy
>> with responses that'd be useful to yell at netapp with).
>>
>> After d8d849835eb2082ea17655538a83fa467633927f (NFSv4: Place the
>> GETATTR operation before the CLOSE), if GETATTR actually fails, CLOSE
>> will never be processed by the server, and it seems the linux nfs
>> client never tries to re-issue CLOSE.
>>
>> We have client A holding file F open, client B goes ahead and unlinks
>> F, at some point client A does PUTFH,GETATTR, for which the server
>> responds NFS4ERR_STALE.
>>
>> Now, client A goes ahead and tries to clean up its internal state,
>> and sends the server compound PUTFH,GETATTR,CLOSE, for which the
>> server responds with PUTFH(NFS4_OK),GETATTR(NFS4ERR_STALE).
>>
>> Which seems correct in the eyes of RFC 7530 section 14.2, which says
>> the server should stop processing the compound when a subop fails.
>>
>> The server has not processed the CLOSE op, and in the case of netapp
>> it appears it keeps holding on to the stateid, waiting for the client
>> to CLOSE it.
>>
>> Judging from tcpdump, the client never attempts to re-issue the CLOSE
>> op that wasn't processed.
>>
>> On the server side, the stateid sticks around until we tear down the
>> client completely (umount or re-boot). Over time, this leads the
>> netapp to bleed stateids.
>>
>> Compare this to pre-d8d849835eb2082ea17655538a83fa467633927f, where the
>> client issues PUTFH,CLOSE,GETATTR. Both PUTFH & CLOSE succeed,
>> GETATTR as expected still gets NFS4ERR_STALE. The server did however
>> process CLOSE, and retired its stateid.
>>
>> Cheers,
>>
>> --
>> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>> Phone: +1 (650) 739-6580
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
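
For anyone following along, the ordering issue discussed in this thread can
be sketched with a toy model. This is only an illustration under stated
assumptions: ToyServer, run_compound and close_with_retry are hypothetical
names, not kernel or ONTAP code. The server evaluates a compound in order
and stops at the first failing op, GETATTR is assumed to return
NFS4ERR_STALE for the unlinked file, and CLOSE is assumed to succeed while
the open stateid is still held. The retry helper mirrors the [PUTFH, CLOSE]
fallback suggested earlier in the thread; it is not a description of what
the client actually does.

```python
from dataclasses import dataclass, field

NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"


@dataclass
class ToyServer:
    """Toy stand-in for an NFSv4 server; hypothetical, not a real implementation."""
    unlinked: bool = True  # assumption: client B already unlinked file F
    open_stateids: set = field(default_factory=lambda: {"stateid-1"})

    def run_compound(self, ops):
        """Evaluate ops in order, stopping at the first op that fails."""
        results = []
        for op in ops:
            status = self._run_op(op)
            results.append((op, status))
            if status != NFS4_OK:
                break  # later ops in the compound are never processed
        return results

    def _run_op(self, op):
        if op == "PUTFH":
            return NFS4_OK  # matches the trace: PUTFH itself succeeds
        if op == "GETATTR":
            return NFS4ERR_STALE if self.unlinked else NFS4_OK
        if op == "CLOSE":
            self.open_stateids.discard("stateid-1")  # stateid is retired
            return NFS4_OK
        raise ValueError(f"unknown op: {op}")


def close_with_retry(server):
    """Hypothetical client fallback: resend [PUTFH, CLOSE] if CLOSE never ran."""
    results = server.run_compound(["PUTFH", "GETATTR", "CLOSE"])
    if not any(op == "CLOSE" for op, _ in results):
        results = server.run_compound(["PUTFH", "CLOSE"])
    return results


if __name__ == "__main__":
    new_order = ToyServer()
    print(new_order.run_compound(["PUTFH", "GETATTR", "CLOSE"]))
    print("stateids left (GETATTR first):", new_order.open_stateids)

    old_order = ToyServer()
    print(old_order.run_compound(["PUTFH", "CLOSE", "GETATTR"]))
    print("stateids left (CLOSE first):", old_order.open_stateids)
```

In this model the GETATTR-first compound leaves the stateid behind, while
the CLOSE-first compound and the retry path both retire it. Whether the
client should retry at all is a separate question, given the view above
that a server keeping stateids for an unlinked file is a server bug.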