On 22 Feb 2023, at 7:22, Jeff Layton wrote:
> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
>> On 22 Feb 2023, at 3:19, Florian Möller wrote:
>>
>>> On 21.02.23 at 19:58, Benjamin Coddington wrote:
>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote:
>>>>
>>>>> Hi Benjamin,
>>>>>
>>>>> Here are the trace and a listing of the corresponding network
>>>>> packets. If the listing is not detailed enough, I can send you a
>>>>> full packet dump tomorrow.
>>>>>
>>>>> The command I used was
>>>>>
>>>>> touch test.txt && sleep 2 && touch test.txt
>>>>>
>>>>> test.txt did not exist previously, so you have an example of a
>>>>> touch both without and with the delay.
>>>>
>>>> Thanks! These are great - I can see from them that the client is
>>>> indeed waiting in the stateid update mechanism because the server
>>>> has returned NFS4ERR_STALE to the client's first CLOSE.
>>>>
>>>> That is unusual. The server is signaling that the open file's
>>>> stateid is old, so I am interested to see if the first CLOSE is
>>>> sent with the stateid's sequence that was returned from the server.
>>>> I could probably see this if I had the server-side tracepoint data.
>>>
>>> Hi Benjamin,
>>>
>>> The server-side tracepoints
>>>
>>> nfsd:nfsd_preprocess
>>> sunrpc:svc_process
>>>
>>> were enabled. It seems they did not produce any output.
>>>
>>> What I did now was:
>>> - enable all nfsd tracepoints,
>>> - enable all nfs4 tracepoints,
>>> - enable all sunrpc tracepoints.
>>>
>>> The command I used was
>>>
>>> touch somefile && sleep 2 && touch somefile
>>>
>>> Then I unmounted the NFS share - this also causes a delay.
>>>
>>> I changed the security type to krb5 and captured trace and network
>>> output for a version 4.0 and a version 4.2 mount. The delay does not
>>> occur when using version 4.0.
>>
>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I
>> also see:
>>
>> nfsd-1693 [000] ..... 1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
>> nfsd-1693 [000] ..... 1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
>> nfsd-1693 [000] ..... 1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70
>>
>
> This just means that the kernel called into the "cache" infrastructure
> to find an export entry, and there wasn't one.
>
> Looking back at the original email here, I'd say this is expected since
> the export wasn't set up for krb5i.
>
> Output of exportfs -v:
> /export
>     gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
> /export
>     gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

Jeff and I were chatting on #linux-nfs, and it's back to looking like a
client issue: as he pointed out, the client starts out with krb5, then in
frame 8 (the CLOSE that receives STALE) switches to krb5i, then back to
krb5 in frame 10.

This should be reproducible.  Let me dust off my old KDC.

Ben
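
For anyone following along, enabling those tracepoint groups can be done
through tracefs. A minimal sketch, assuming tracefs is mounted at
/sys/kernel/tracing and the commands are run as root (the output file
name is just a placeholder):

    # older kernels may have tracefs at /sys/kernel/debug/tracing instead
    echo 1 > /sys/kernel/tracing/events/nfsd/enable     # all nfsd events (server side)
    echo 1 > /sys/kernel/tracing/events/sunrpc/enable   # all sunrpc events
    echo 1 > /sys/kernel/tracing/events/nfs4/enable     # all nfs4 events (client side)
    cat /sys/kernel/tracing/trace_pipe > nfs-trace.txt  # stream events to a file

If trace-cmd is available, "trace-cmd record -e nfsd -e sunrpc" captures
the same server-side events with less typing.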
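
On the export side: the exportfs -v output above shows only the legacy
gss/krb5 and gss/krb5p pseudo-client entries, both carrying sec=sys and
with no krb5i entry, which matches Jeff's reading. For comparison only, a
hypothetical /etc/exports line using the sec= option to list the Kerberos
flavors explicitly (the wildcard client and option set here are
illustrative, not the reporter's configuration) might look like:

    # hypothetical example: export to any client, offering all three krb5 flavors
    /export  *(rw,sec=krb5:krb5i:krb5p,crossmnt,no_subtree_check,fsid=0)

followed by "exportfs -ra" to reload the export table.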