Re: Reoccurring 5 second delays during NFS calls

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 22.02.23 um 20:43 schrieb Benjamin Coddington:
On 22 Feb 2023, at 7:48, Florian Möller wrote:

Am 22.02.23 um 13:22 schrieb Jeff Layton:
On Wed, 2023-02-22 at 06:54 -0500, Benjamin Coddington wrote:
On 22 Feb 2023, at 3:19, Florian Möller wrote:

Am 21.02.23 um 19:58 schrieb Benjamin Coddington:
On 21 Feb 2023, at 11:52, Florian Möller wrote:

Hi Benjamin,

here are the trace and a listing of the corresponding network packages. If the listing is not detailed enough, I can send you a full package dump tomorrow.

The command I used was

touch test.txt && sleep 2 && touch test.txt

test.txt did not exist previously. So you have an example of a touch without and with delay.

Thanks!  These are great - I can see from them that the client is indeed
waiting in the stateid update mechanism because the server has returned
NFS4ERR_STALE to the client's first CLOSE.

That is unusual.  The server is signaling that the open file's stateid is old,
so I am interested to see if the first CLOSE is sent with the stateid's
sequence that was returned from the server.  I could probably see this if I
had the server-side tracepoint data.

Hi Benjamin,

the server-side tracepoints

nfsd:nfsd_preprocess
sunrpc:svc_process

were enabled. It seems they did not produce any output.

What I did now was:
- enable all nfsd tracepoints,
- enable all nfs4 tracepoints,
- enable all sunrpc tracepoints.

The command I used was

touch somefile && sleep 2 && touch somefile.

Then I unmounted the NFS share - this also causes a delay.

I changed the security type to krb5 and captured trace and network output for a version 4.0 and a version 4.2 mount. The delay does not occur when using version 4.0.


In frame 9 of nfs-v4.2-krb5.pcap, the server responds to PUTFH with
NFS4ERR_STALE, so nothing to do with the open stateid sequencing.  I also
see:

nfsd-1693    [000] .....  1951.353889: nfsd_exp_find_key: fsid=1::{0x0,0xe5fcf0,0xffffc900,0x811e87a3,0xffffffff,0xe5fd00} domain=gss/krb5i status=-2
nfsd-1693    [000] .....  1951.353889: nfsd_set_fh_dentry_badexport: xid=0xe1511810 fh_hash=0x3f9e713a status=-2
nfsd-1693    [000] .....  1951.353890: nfsd_compound_status: op=2/4 OP_PUTFH status=70


This just means that the kernel called into the "cache" infrastructure
to find an export entry, and there wasn't one.


Looking back at the original email here, I'd say this is expected since
the export wasn't set up for krb5i.

Output of exportfs -v:
/export
gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export
gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

I changed the export definitions to

/export gss/krb5(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export gss/krb5i(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)
/export gss/krb5p(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=sys,rw,secure,root_squash,no_all_squash)

such that all three kerberos security types are exported. With this setup the delay is gone. Without the krb5i export the delay occurs again.
Mounting the NFS share with different kerberos security types does not change this behaviour - without the krb5i export there is a delay.

After getting the ol' KDC back in shape and trying to reproduce this, I
noticed you're using a client specifier that's been deprecated (the
gss/krb5), along with sec=sys in the same line.  That's a little weird, not
sure what the results of that might be.

Our original /etc/exports entries were:

/export		gss/krb5p(rw,crossmnt,async,fsid=0)
/export		gss/krb5(rw,crossmnt,async,fsid=0)

These were based on a template shipping with Ubuntu (file /usr/share/nfs-kernel-server/conffiles/etc.exports from package nfs-kernel-server) using the deprecated gss/krb syntax.
We didn't explicitly add any sec= option.


I can't even get any mount to work with that:

[root@fedora ~]# exportfs -v
/exports      	gss/krb5(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
/exports      	gss/krb5p(sync,wdelay,hide,no_subtree_check,nordirplus,fsid=0,sec=sys,rw,insecure,no_root_squash,no_all_squash)
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
mount.nfs: mounting fedora.fh.bcodding.com:/exports failed, reason given by server: No such file or directory

For us, mounting was never really a problem; we tried several distros, each with several kernels. Except for the delay, everything worked fine, no errors or deprecation warnings were given.

However, if I use the "sec=" method, everything seems to work fine:

[root@fedora ~]# exportfs -v
/exports      	10.0.1.0/24(sync,wdelay,hide,no_subtree_check,nordirplus,sec=sys:krb5:krb5p,rw,insecure,no_root_squash,no_all_squash)
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5 fedora.fh.bcodding.com:/exports /mnt/fedora
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# umount /mnt/fedora
[root@fedora ~]# mount -t nfs -ov4.1,sec=krb5p fedora.fh.bcodding.com:/exports /mnt/fedora
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# touch /mnt/fedora/krbfoo
[root@fedora ~]# umount /mnt/fedora

.. so I can't seem to reproduce this because I can't even mount.  Did you
perhaps have a previous mount with that client with a different security
flavor?

I can confirm that the delay doesn't occur with

/export		*(rw,crossmnt,async,fsid=0,sec=krb5:krb5p)

exportfs -v now gives

/export <world>(async,wdelay,hide,crossmnt,no_subtree_check,fsid=0,sec=krb5:krb5p,rw,secure,root_squash,no_all_squash)

I think your NFS client ended up having a machine credential with krb5i, but
then the export with that flavor was removed.  When the client tried to do a
CLOSE, it picked the machine cred, which caused the server to fail in the
lookup of the export, which caused the client to think its stateid was
wrong.

Thank you for your efforts! Things are working now.

Perhaps someone could rewrite the "RPCSEC_GSS security" section of exports(5). It starts with

You may use the special strings "gss/krb5", "gss/krb5i", or "gss/krb5p" to restrict access to clients using rpcsec_gss security. However, this syntax is deprecated; on linux kernels since 2.6.23, you should instead use the "sec=" export option

After all this hassle, the first sentence (together with the Ubuntu template) is a bit misleading.

Best,
Florian

--
Dr. Florian Möller
Universität Würzburg
Institut für Mathematik
Emil-Fischer-Straße 30
97074 Würzburg, Germany
Tel. +49 931 3185596




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux