> On Sep 25, 2019, at 11:44 AM, Kevin Vasko <kvasko@xxxxxxxxx> wrote: > > Appreciate the information. I have tried different clients. The Centos > and Ubuntu clients exhibit the same issue. I also did packet tracing. > On small files the client will go through its "WRITE..." packets... > and then will issue COMMIT and then do the CLOSE. On larger files it > will do all of its WRITE... packets and then will issue COMMIT and > then it will never do a CLOSE. I'm assuming whats happening is the > client is waiting for the NFS server to authenticate with the Kerberos > server to make sure the write is okay, however I assume it never does. > I would like to try and figure out how to see that "authentication" > packet to kerberos from the server (if there is one). > > Just for giggles, I just tried it with NFSv3 and there is a slight > change and gives a little more information (albeit, what I kind of > already assumed). Mounted the shared with krb5p, and vers=3. > > I write a file with > strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1 > > This is successful. No issues. Just like NFSv4 > > Then I change nothing but the size of the file being written (just > like with NFSv4)…however with NFSv3 it will halt for ~1 minute and > then spit out a permission denied even though I just wrote out the > SAME exact file in the SAME exact location (literally overwriting the > file) a few seconds prior. The only difference in NFSv3 and NFSv4 is > that NFSv3 fails with a permission issue, whereas NFSv4 never returns. Sounds like the NFS server is dropping the connection. With GSS enabled, that's usually a sign that the GSS window has overflowed. > strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1 > > This command with NFSv3, takes ~1min and will return file permissions > error (even though it just wrote it and can run the above command > again and it will finish with no errors). > > This makes sense in that my initial suspicion is that the Dell Unity > 300 isn’t authenticating properly with larger writes (for whatever > reason) with Kerberos. This is also why I don’t see the problem when I > disable Kerberos on the Unity 300 system. > > When should the NFS server be sending a packet to the Kerberos server > to validate the write? Or should it be? I did do packet capture on the > Unity box but I don’t see anything really useful regarding Kerberos > authentication. What should I be looking for in the packet traces to > look for the authentication packets? > > Debugging commands: > > $mount > > ...snip… > > 10.75.37.2:/ds1 on /data type nfs > (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,mountaddr=10.75.37.2,mountvers=3,mountport=1234,mountproto=udp,local_lock=none,addr=10.75.37.2) > > > > $ time dd if=/dev/zero of=/data/test.img bs=10M count=1 > 1+0 records in > 1+0 records out > 10485760 bytes (10 MB, 10 MiB) copied, 0.112561 s, 93.2 MB/s > real 0m0.188s > user 0m0.000s > sys 0m0.087s > > $ time dd if=/dev/zero of=/data/test.img bs=1000M count=1 > dd: closing output file '/data/test.img': Permission denied > real 1m9.093s > user 0m0.000s > sys 0m1.124s > > $ strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1 > execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img", > "bs=10M", "count=1"], 0x7ffec9914328 /* 30 vars */) = 0 > brk(NULL) = 0x5595514e6000 > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0 > mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4e5d2000 > close(3) = 0 > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., > 832) = 832 > fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x7f5e4e5d0000 > mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7f5e4dfd9000 > mprotect(0x7f5e4e1c0000, 2097152, PROT_NONE) = 0 > mmap(0x7f5e4e3c0000, 24576, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f5e4e3c0000 > mmap(0x7f5e4e3c6000, 15072, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e4e3c6000 > close(3) = 0 > arch_prctl(ARCH_SET_FS, 0x7f5e4e5d1540) = 0 > mprotect(0x7f5e4e3c0000, 16384, PROT_READ) = 0 > mprotect(0x55954fd03000, 4096, PROT_READ) = 0 > mprotect(0x7f5e4e5f1000, 4096, PROT_READ) = 0 > munmap(0x7f5e4e5d2000, 125550) = 0 > rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > rt_sigaction(SIGUSR1, {sa_handler=0x55954faf60e0, sa_mask=[INT USR1], > sa_flags=SA_RESTORER, sa_restorer=0x7f5e4e017f20}, NULL, 8) = 0 > rt_sigaction(SIGINT, {sa_handler=0x55954faf60d0, sa_mask=[INT USR1], > sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, > a_restorer=0x7f5e4e017f20}, NULL, 8) = 0 > brk(NULL) = 0x5595514e6000 > brk(0x559551507000) = 0x559551507000 > openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0 > mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4d4c8000 > close(3) = 0 > openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 > dup2(3, 0) = 0 > close(3) = 0 > lseek(0, 0, SEEK_CUR) = 0 > openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 > dup2(3, 1) = 1 > close(3) = 0 > mmap(NULL, 10498048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, > -1, 0) = 0x7f5e4cac5000 > read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 10485760) = 10485760 > write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 10485760) = 10485760 > close(0) = 0 > close(1) = 0 > openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0 > fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0 > read(0, "# Locale name alias data base.\n#"..., 4096) = 2995 > read(0, "", 4096) = 0 > close(0) = 0 > openat(AT_FDCWD, > "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 > ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) > = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = > 0 > fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0 > mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f5e4e5f0000 > close(0) = 0 > write(2, "1+0 records in\n1+0 records out\n", 311+0 records in > 1+0 records out > ) = 31 > write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 6010485760 bytes (10 > MB, 10 MiB) copied, 0.105735 s, 99.2 MB/s) = 60 > write(2, "\n", 1 > ) = 1 > close(2) = 0 > exit_group(0) = ? > +++ exited with 0 +++ > > > > $ strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1 > execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img", > "bs=1000M", "count=1"], 0x7fff46397c58 /* 30 vars */) = 0 > brk(NULL) = 0x55a768f09000 > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0 > mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604f357000 > close(3) = 0 > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., > 832) = 832 > fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x7f604f355000 > mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7f604ed5e000 > mprotect(0x7f604ef45000, 2097152, PROT_NONE) = 0 > mmap(0x7f604f145000, 24576, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f604f145000 > mmap(0x7f604f14b000, 15072, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f604f14b000 > close(3) = 0 > arch_prctl(ARCH_SET_FS, 0x7f604f356540) = 0 > mprotect(0x7f604f145000, 16384, PROT_READ) = 0 > mprotect(0x55a767990000, 4096, PROT_READ) = 0 > mprotect(0x7f604f376000, 4096, PROT_READ) = 0 > munmap(0x7f604f357000, 125550) = 0 > rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > rt_sigaction(SIGUSR1, {sa_handler=0x55a7677830e0, sa_mask=[INT USR1], > sa_flags=SA_RESTORER, sa_restorer=0x7f604ed9cf20}, NULL, 8) = 0 > rt_sigaction(SIGINT, {sa_handler=0x55a7677830d0, sa_mask=[INT USR1], > sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, > a_restorer=0x7f604ed9cf20}, NULL, 8) = 0 > brk(NULL) = 0x55a768f09000 > brk(0x55a768f2a000) = 0x55a768f2a000 > openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 > fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0 > mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604e24d000 > close(3) = 0 > openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 > dup2(3, 0) = 0 > close(3) = 0 > lseek(0, 0, SEEK_CUR) = 0 > openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 > dup2(3, 1) = 1 > close(3) = 0 > mmap(NULL, 1048588288, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f600fa4a000 > read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 1048576000) = 1048576000 > write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > 1048576000) = 1048576000 > close(0) = 0 > close(1) = -1 EACCES (Permission denied) > openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0 > fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0 > read(0, "# Locale name alias data base.\n#"..., 4096) = 2995 > read(0, "", 4096) = 0 > close(0) = 0 > openat(AT_FDCWD, > "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 > ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) > = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = > 0 > fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0 > mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f604f375000 > close(0) = 0 > write(2, "dd: ", 4dd: ) = 4 > write(2, "closing output file '/data/test."..., 36closing output file > '/data/test.img') = 36 > openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) > = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) > = -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 > ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = > -1 ENOENT (No such file or directory) > openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", > O_RDONLY) = -1 ENOENT (No such file or directory) > write(2, ": Permission denied", 19: Permission denied) = 19 > write(2, "\n", 1 > ) = 1 > close(2) = 0 > exit_group(1) = ? > +++ exited with 1 +++ > > On Wed, Sep 25, 2019 at 12:06 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >> >> >> >>> On Sep 25, 2019, at 9:48 AM, bfields@xxxxxxxxxxxx wrote: >>> >>> On Wed, Sep 18, 2019 at 06:36:13PM -0500, Kevin Vasko wrote: >>>> We have a new Dell EMC Unity 300 acting as NAS Server that is >>>> presenting a NFSv4 NFS Share. Our clients are mostly Ubuntu 18.04.3 >>>> but issue is also present on CentOS 7.6 systems. We have been >>>> struggling with this issue for over a week now and not sure how to >>>> resolve it. >>>> >>>> >>>> >>>> We are having trouble with NFS Clients completing their writes to the >>>> Dell EMC Unity 300 NFS Server when Kerberos is enabled on the NFS >>>> Share. I created the NFS Share on the U300, associated it with our >>>> FreeIPA (Kerberos/LDAP server) and everything shows successful. >>> >>> Troubleshooting ideas off the top of my head: >>> >>> It might be worth trying some other client versions if it's not hard. >>> >>> It'd be interesting to know what's happening on the network.... >>> Unfortunately big krb5p writes won't be fun to try to capture and >>> examine. >> >> Wireshark is supposed to have a mechanism for giving it the keys >> so that captured GSS data can be decrypted. I've never gotten it >> to work, but I didn't try hard. Should be appropriately documented. >> >> >>> Maybe some network or rpc-level statistics would help show if >>> there are an unusual number of retries or failures. >> >> >> -- >> Chuck Lever >> >> >> -- Chuck Lever