Re: linux>=4.10: PUTFH|GETATTR|CLOSE, GETATTR fails, CLOSE not re-issued

Weston Andros Adamson <dros@xxxxxxxxxx> · Tue, 5 Sep 2017 13:51:05 -0400

I chatted with Trond about this and he says it's a server bug if an unlinked file
keeps stateids around - the client doesn't need to issue a close in this case.

What version of ONTAP are you running?

-dros

> On Sep 1, 2017, at 2:44 PM, Weston Andros Adamson <dros@xxxxxxxxxx> wrote:
> 
> Nice analysis! I think post d8d849835eb2082ea17655538a83fa467633927f, we
> need to retry with a [PUTFH, CLOSE] if the GETATTR fails.
> 
> The problem as I see it is the GETATTR is tied to the CURRENT_FH, which is
> stale for new operations since the file was unlinked, but the CLOSE is tied to the
> (CURRENT_FH, open stateid) pair and is not stale because the stateid is still
> valid.
> 
> Trond is out on PTO, should be back on or before next Tuesday. The recent change
> was his and he might have a better idea how to handle this.
> 
> -dros
> 
> 
>> On Aug 31, 2017, at 1:34 PM, Kjetil Joergensen <kjetil@xxxxxxxxxxxx> wrote:
>> 
>> Hi,
>> 
>> (Now - I do not actually know the specification(s) all that well, so
>> it may be that I've by accident cherry-picked the bits that partially
>> turn this into a linux-nfs-client bug, and I'd be more than happy
>> with responses that'd be useful to yell at netapp with).
>> 
>> After d8d849835eb2082ea17655538a83fa467633927f (NFSv4: Place the
>> GETATTR operation before the CLOSE), if GETATTR actually fails, CLOSE
>> will never be processed by the server, and it seems the linux nfs
>> client never tries to re-issue CLOSE.
>> 
>> We have client A holding file F open. Client B goes ahead and unlinks
>> F. At some point client A does PUTFH,GETATTR, to which the server
>> responds NFS4ERR_STALE.
>> 
>> Now, client A goes ahead and tries to clean up its internal state,
>> and sends the server compound PUTFH,GETATTR,CLOSE, to which the
>> server responds with PUTFH(NFS4_OK),GETATTR(NFS4ERR_STALE).
>> 
>> Which seems correct in the eyes of RFC 7530, section 14.2, which says
>> the server should stop processing the compound when a subop fails.
>> 
>> The server has not processed the CLOSE op, and in the case of netapp
>> it appears it keeps holding on to the stateid, waiting for the client
>> to CLOSE it.
>> 
>> Judging from tcpdump, the client never attempts to re-issue the CLOSE
>> op that wasn't processed.
>> 
>> On the server side, the stateid sticks around until we tear down the
>> client completely (umount or re-boot). Over time, this leads the
>> netapp to bleed stateids.
>> 
>> Compare this to pre d8d849835eb2082ea17655538a83fa467633927f, where the
>> client issues PUTFH,CLOSE,GETATTR. Both PUTFH & CLOSE succeed;
>> GETATTR as expected still gets NFS4ERR_STALE. The server did however
>> process CLOSE, and retired its stateid.
>> 
>> Cheers,
>> 
>> -- 
>> Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
>> Phone: +1 (650) 739-6580
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
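
The compound behaviour discussed in the quoted messages can be sketched with a toy model. This is only an illustration, not kernel or server code: ToyServer, close_with_retry and the string status names are hypothetical, and the model simply assumes what the report observed (PUTFH succeeds on the unlinked handle, GETATTR returns NFS4ERR_STALE, CLOSE succeeds while the stateid is held). It shows why the order of GETATTR and CLOSE matters when the server stops at the first failed op, and what the suggested [PUTFH, CLOSE] retry would look like.

```python
NFS4_OK = "NFS4_OK"
NFS4ERR_STALE = "NFS4ERR_STALE"


class ToyServer:
    """Hypothetical stand-in for an NFSv4 server, just enough for this case."""

    def __init__(self):
        self.unlinked = set()        # filehandles whose file was unlinked
        self.open_stateids = set()   # (filehandle, stateid) pairs still held

    def compound(self, ops):
        """Process ops in order and stop at the first one that fails."""
        results = []
        current_fh = None
        for op, arg in ops:
            if op == "PUTFH":
                # Assumption taken from the report: PUTFH itself succeeds.
                current_fh = arg
                status = NFS4_OK
            elif op == "GETATTR":
                stale = current_fh in self.unlinked
                status = NFS4ERR_STALE if stale else NFS4_OK
            elif op == "CLOSE":
                # Assumption: CLOSE succeeds and retires the stateid.
                self.open_stateids.discard((current_fh, arg))
                status = NFS4_OK
            else:
                raise ValueError("unsupported op in this sketch: " + op)
            results.append((op, status))
            if status != NFS4_OK:
                break  # remaining ops are never processed
        return results


def close_with_retry(server, fh, stateid):
    """Send PUTFH,GETATTR,CLOSE; if GETATTR fails, retry with PUTFH,CLOSE."""
    results = server.compound(
        [("PUTFH", fh), ("GETATTR", None), ("CLOSE", stateid)]
    )
    last_op, last_status = results[-1]
    if last_op == "GETATTR" and last_status != NFS4_OK:
        # CLOSE was skipped by the server, so issue it without GETATTR.
        results = server.compound([("PUTFH", fh), ("CLOSE", stateid)])
    return results


if __name__ == "__main__":
    server = ToyServer()
    server.open_stateids.add(("F", "sid1"))  # client A holds F open
    server.unlinked.add("F")                 # client B unlinks F

    # Post-d8d84983 ordering: CLOSE is never reached, stateid is kept.
    print(server.compound(
        [("PUTFH", "F"), ("GETATTR", None), ("CLOSE", "sid1")]
    ))
    print("stateids held:", server.open_stateids)

    # With the retry, the stateid is retired.
    print(close_with_retry(server, "F", "sid1"))
    print("stateids held:", server.open_stateids)
```

Whether the client should actually retry like this is the open question in the thread; per the top reply, the alternative view is that the server should drop the stateid itself once the file is unlinked.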
