Nice analysis!

I think post d8d849835eb2082ea17655538a83fa467633927f, we need to retry
with a [PUTFH, CLOSE] if the GETATTR fails.

The problem as I see it is that the GETATTR is tied to the CURRENT_FH,
which is stale for new operations since the file was unlinked, but the
CLOSE is tied to the (CURRENT_FH, open stateid) pair and is not stale
because the stateid is still valid.

Trond is out on PTO and should be back on or before next Tuesday. The
recent change was his, and he might have a better idea of how to handle
this.

-dros

> On Aug 31, 2017, at 1:34 PM, Kjetil Joergensen <kjetil@xxxxxxxxxxxx> wrote:
>
> Hi,
>
> (Now - I do not actually know the specification(s) all that well, so
> it may be that I've by accident cherry picked the bits that partially
> turn this into a linux-nfs-client bug, and I'd be more than happy
> with responses that'd be useful to yell at netapp with).
>
> After d8d849835eb2082ea17655538a83fa467633927f (NFSv4: Place the
> GETATTR operation before the CLOSE), if GETATTR actually fails, CLOSE
> will never be processed by the server, and it seems the linux nfs
> client never tries to re-issue CLOSE.
>
> We have client A holding file F open, client B goes ahead and unlinks
> F, at some point client A does PUTFH,GETATTR, for which the server
> responds NFS4ERR_STALE.
>
> Now, client A goes ahead and tries to clean up its internal state,
> and sends the server compound PUTFH,GETATTR,CLOSE, for which the
> server responds with PUTFH(NFS4_OK),GETATTR(NFS4ERR_STALE).
>
> This seems correct in the eyes of RFC7530 section 14.2., which says
> the server should stop processing the compound when a subop fails.
>
> The server has not processed the CLOSE op, and in the case of netapp
> it appears it keeps holding on to the stateid, waiting for the client
> to CLOSE it.
>
> Judging from tcpdump, the client never attempts to re-issue the CLOSE
> op that wasn't processed.
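
The sequence described so far can be reproduced in a toy model. The
Python sketch below is not kernel code and is only one reading of the
reported trace: ToyServer, compound() and close_with_retry() are made-up
names, the server tracks a single hypothetical stateid, and the per-op
results are hard-coded to match what was reported (PUTFH succeeds,
GETATTR returns NFS4ERR_STALE, CLOSE succeeds while the stateid is still
valid). The retry is just one way the [PUTFH, CLOSE] fallback suggested
at the top of this mail might look.

```python
# Toy model of NFSv4 COMPOUND evaluation; NOT kernel code.
# ToyServer, compound and close_with_retry are hypothetical names.
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"


class ToyServer:
    """Models one unlinked file that a client still holds open."""

    def __init__(self):
        # Open state the server is still tracking (assumed single stateid).
        self.open_stateids = {"stateid-1"}

    def run_op(self, op):
        if op == "PUTFH":
            return NFS4_OK          # matches the reported trace
        if op == "GETATTR":
            return NFS4ERR_STALE    # file was unlinked by another client
        if op == "CLOSE":
            self.open_stateids.discard("stateid-1")
            return NFS4_OK          # the stateid itself is still valid
        raise ValueError(f"unknown op {op!r}")

    def compound(self, ops):
        """Evaluate ops in order and stop at the first failing op."""
        results = []
        for op in ops:
            status = self.run_op(op)
            results.append((op, status))
            if status != NFS4_OK:
                break
        return results


def close_with_retry(server):
    """Send PUTFH,GETATTR,CLOSE; if CLOSE was never reached, retry
    with PUTFH,CLOSE so the server can release the stateid."""
    results = server.compound(["PUTFH", "GETATTR", "CLOSE"])
    if ("CLOSE", NFS4_OK) not in results:
        results = server.compound(["PUTFH", "CLOSE"])
    return results
```

In this model, compound(["PUTFH", "GETATTR", "CLOSE"]) stops after
GETATTR and leaves the stateid on the server, while close_with_retry()
ends with an empty open_stateids set.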
>
> On the server side, the stateid sticks around until we tear down the
> client completely (umount or re-boot). Over time, this leads the
> netapp to bleed stateids.
>
> Compare this to pre d8d849835eb2082ea17655538a83fa467633927f, where the
> client issues PUTFH,CLOSE,GETATTR. Both PUTFH & CLOSE succeed;
> GETATTR as expected still gets NFS4ERR_STALE. The server did however
> process CLOSE, and retired its stateid.
>
> Cheers,
>
> --
> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
> Phone: +1 (650) 739-6580
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html