On 22 Feb 2023, at 7:48, Florian Möller wrote: > Am 22.02.23 um 13:22 schrieb Jeff Layton: >> On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote: >>> On 22 Feb 2023, at 3:19, Florian Möller wrote: >>> >>>> Am 21.02.23 um 19:58 schrieb Benjamin Coddington: >>>>> On 21 Feb 2023, at 11:52, Florian Möller wrote: >>>>> >>>>>> Hi Benjamin, >>>>>> >>>>>> here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow. >>>>>> >>>>>> The command I used was >>>>>> >>>>>> touch test.txt && sleep 2 && touch test.txt >>>>>> >>>>>> test.txt did not exist previously. So you have an example of a touch without and with delay. >>>>> >>>>> Thanks! These are great - I can see from them that the client is indeed >>>>> waiting in the stateid update mechanism because the server has returned >>>>> NFS4ERR_STALE to the client's first CLOSE. >>>>> >>>>> That is unusual. The server is signaling that the open file's stateid is old, >>>>> so I am interested to see if the first CLOSE is sent with the stateid's >>>>> sequence that was returned from the server. I could probably see this if I >>>>> had the server-side tracepoint data. >>>> >>>> Hi Benjamin, >>>> >>>> the server-side tracepoints >>>> >>>> nfsd:nfsd_preprocess >>>> sunrpc:svc_process >>>> >>>> were enabled. It seems they did not produce any output. >>>> >>>> What I did now was: >>>> - enable all nfsd tracepoints, >>>> - enable all nfs4 tracepoints, >>>> - enable all sunrpc tracepoints. >>>> >>>> The command I used was >>>> >>>> touch somefile && sleep 2 && touch somefile. >>>> >>>> Then I unmounted the NFS share - this also causes a delay. >>>> >>>> I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0. >>> >>> >>> In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with >>> NFS4ERR_STALE, so nothing to do with the open stateid sequencing. I also >>> see: >>> >>> nfsd-1693 [000] ..... 1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2 >>> nfsd-1693 [000] ..... 1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2 >>> nfsd-1693 [000] ..... 1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70 >>> >> >> This just means that the kernel called into the "cache" infrastructure >> to find an export entry, and there wasn't one. >> >> >> Looking back at the original email here, I'd say this is expected since >> the export wasn't set up for krb5i. >> >> Output of exportfs -v: >> /export >> gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) >> /export >> gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) > > I changed the export definitions to > > /export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) > /export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) > /export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash) > > such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again. > Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay. After getting the ol' KDC back in shape and trying to reproduce this, I noticed you're using a client specifier that's been deprecated (the gss/krb5), along with sec=sys in the same line. That's a little weird, not sure what the results of that might be. I can't even get any mount to work with that: [root@fedora ~]# exportfs -v /exports gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash) /exports gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash) [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory However, if I use the "sec=" method, everything seems to work fine: [root@fedora ~]# exportfs -v /exports 10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash) [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora [root@fedora ~]# touch /mnt/fedora/krbfoo [root@fedora ~]# touch /mnt/fedora/krbfoo [root@fedora ~]# umount /mnt/fedora [root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora [root@fedora ~]# touch /mnt/fedora/krbfoo [root@fedora ~]# touch /mnt/fedora/krbfoo [root@fedora ~]# umount /mnt/fedora .. so I can't seem to reproduce this because I can't even mount. Did you perhaps have a previous mount with that client with a different security flavor? I think your NFS client ended up having a machine credential with krb5i, but then the export with that flavor was removed. When the client tried to do a CLOSE, it picked the machine cred, which caused the server to fail in the lookup of the export, which caused the client to think its stateid was wrong. Ben