Forgive my ignorance but I assume you mean GSS (Generic Security Services) of which Kerberos is an implementation of? So effectively in my environment with Kerberos being enabled, thats usually a sign that the GSS window has overflowed? I assume that would be the same meaning for my context? "Sounds like the NFS server is dropping the connection." This seems certainly likely that this could be the issue. Is there a way of proving that the GSS window is overflowing via the packets? If I could get to the rpc-gssd service on the NFS server would that show me what I need to know? Or would there be some other service that would help narrow it down? Thanks again, -Kevin On Wed, Sep 25, 2019 at 1:49 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > > > > > On Sep 25, 2019, at 11:44 AM, Kevin Vasko <kvasko@xxxxxxxxx> wrote: > > > > Appreciate the information. I have tried different clients. The Centos > > and Ubuntu clients exhibit the same issue. I also did packet tracing. > > On small files the client will go through its "WRITE..." packets... > > and then will issue COMMIT and then do the CLOSE. On larger files it > > will do all of its WRITE... packets and then will issue COMMIT and > > then it will never do a CLOSE. I'm assuming whats happening is the > > client is waiting for the NFS server to authenticate with the Kerberos > > server to make sure the write is okay, however I assume it never does. > > I would like to try and figure out how to see that "authentication" > > packet to kerberos from the server (if there is one). > > > > Just for giggles, I just tried it with NFSv3 and there is a slight > > change and gives a little more information (albeit, what I kind of > > already assumed). Mounted the shared with krb5p, and vers=3. > > > > I write a file with > > strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1 > > > > This is successful. No issues. Just like NFSv4 > > > > Then I change nothing but the size of the file being written (just > > like with NFSv4)…however with NFSv3 it will halt for ~1 minute and > > then spit out a permission denied even though I just wrote out the > > SAME exact file in the SAME exact location (literally overwriting the > > file) a few seconds prior. The only difference in NFSv3 and NFSv4 is > > that NFSv3 fails with a permission issue, whereas NFSv4 never returns. > > Sounds like the NFS server is dropping the connection. With > GSS enabled, that's usually a sign that the GSS window has > overflowed. > > > > strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1 > > > > This command with NFSv3, takes ~1min and will return file permissions > > error (even though it just wrote it and can run the above command > > again and it will finish with no errors). > > > > This makes sense in that my initial suspicion is that the Dell Unity > > 300 isn’t authenticating properly with larger writes (for whatever > > reason) with Kerberos. This is also why I don’t see the problem when I > > disable Kerberos on the Unity 300 system. > > > > When should the NFS server be sending a packet to the Kerberos server > > to validate the write? Or should it be? I did do packet capture on the > > Unity box but I don’t see anything really useful regarding Kerberos > > authentication. What should I be looking for in the packet traces to > > look for the authentication packets? > > > > Debugging commands: > > > > $mount > > > > ...snip… > > > > 10.75.37.2:/ds1 on /data type nfs > > (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,mountaddr=10.75.37.2,mountvers=3,mountport=1234,mountproto=udp,local_lock=none,addr=10.75.37.2) > > > > > > > > $ time dd if=/dev/zero of=/data/test.img bs=10M count=1 > > 1+0 records in > > 1+0 records out > > 10485760 bytes (10 MB, 10 MiB) copied, 0.112561 s, 93.2 MB/s > > real 0m0.188s > > user 0m0.000s > > sys 0m0.087s > > > > $ time dd if=/dev/zero of=/data/test.img bs=1000M count=1 > > dd: closing output file '/data/test.img': Permission denied > > real 1m9.093s > > user 0m0.000s > > sys 0m1.124s > > > > $ strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1 > > execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img", > > "bs=10M", "count=1"], 0x7ffec9914328 /* 30 vars */) = 0 > > brk(NULL) = 0x5595514e6000 > > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0 > > mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4e5d2000 > > close(3) = 0 > > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > > read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., > > 832) = 832 > > fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > > 0) = 0x7f5e4e5d0000 > > mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > > 0) = 0x7f5e4dfd9000 > > mprotect(0x7f5e4e1c0000, 2097152, PROT_NONE) = 0 > > mmap(0x7f5e4e3c0000, 24576, PROT_READ|PROT_WRITE, > > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f5e4e3c0000 > > mmap(0x7f5e4e3c6000, 15072, PROT_READ|PROT_WRITE, > > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e4e3c6000 > > close(3) = 0 > > arch_prctl(ARCH_SET_FS, 0x7f5e4e5d1540) = 0 > > mprotect(0x7f5e4e3c0000, 16384, PROT_READ) = 0 > > mprotect(0x55954fd03000, 4096, PROT_READ) = 0 > > mprotect(0x7f5e4e5f1000, 4096, PROT_READ) = 0 > > munmap(0x7f5e4e5d2000, 125550) = 0 > > rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > > rt_sigaction(SIGUSR1, {sa_handler=0x55954faf60e0, sa_mask=[INT USR1], > > sa_flags=SA_RESTORER, sa_restorer=0x7f5e4e017f20}, NULL, 8) = 0 > > rt_sigaction(SIGINT, {sa_handler=0x55954faf60d0, sa_mask=[INT USR1], > > sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, > > a_restorer=0x7f5e4e017f20}, NULL, 8) = 0 > > brk(NULL) = 0x5595514e6000 > > brk(0x559551507000) = 0x559551507000 > > openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0 > > mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4d4c8000 > > close(3) = 0 > > openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 > > dup2(3, 0) = 0 > > close(3) = 0 > > lseek(0, 0, SEEK_CUR) = 0 > > openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 > > dup2(3, 1) = 1 > > close(3) = 0 > > mmap(NULL, 10498048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, > > -1, 0) = 0x7f5e4cac5000 > > read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 10485760) = 10485760 > > write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 10485760) = 10485760 > > close(0) = 0 > > close(1) = 0 > > openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0 > > fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0 > > read(0, "# Locale name alias data base.\n#"..., 4096) = 2995 > > read(0, "", 4096) = 0 > > close(0) = 0 > > openat(AT_FDCWD, > > "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 > > ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) > > = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > 0 > > fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0 > > mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f5e4e5f0000 > > close(0) = 0 > > write(2, "1+0 records in\n1+0 records out\n", 311+0 records in > > 1+0 records out > > ) = 31 > > write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 6010485760 bytes (10 > > MB, 10 MiB) copied, 0.105735 s, 99.2 MB/s) = 60 > > write(2, "\n", 1 > > ) = 1 > > close(2) = 0 > > exit_group(0) = ? > > +++ exited with 0 +++ > > > > > > > > $ strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1 > > execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img", > > "bs=1000M", "count=1"], 0x7fff46397c58 /* 30 vars */) = 0 > > brk(NULL) = 0x55a768f09000 > > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > > access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0 > > mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604f357000 > > close(3) = 0 > > access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > > read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"..., > > 832) = 832 > > fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0 > > mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > > 0) = 0x7f604f355000 > > mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > > 0) = 0x7f604ed5e000 > > mprotect(0x7f604ef45000, 2097152, PROT_NONE) = 0 > > mmap(0x7f604f145000, 24576, PROT_READ|PROT_WRITE, > > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f604f145000 > > mmap(0x7f604f14b000, 15072, PROT_READ|PROT_WRITE, > > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f604f14b000 > > close(3) = 0 > > arch_prctl(ARCH_SET_FS, 0x7f604f356540) = 0 > > mprotect(0x7f604f145000, 16384, PROT_READ) = 0 > > mprotect(0x55a767990000, 4096, PROT_READ) = 0 > > mprotect(0x7f604f376000, 4096, PROT_READ) = 0 > > munmap(0x7f604f357000, 125550) = 0 > > rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0 > > rt_sigaction(SIGUSR1, {sa_handler=0x55a7677830e0, sa_mask=[INT USR1], > > sa_flags=SA_RESTORER, sa_restorer=0x7f604ed9cf20}, NULL, 8) = 0 > > rt_sigaction(SIGINT, {sa_handler=0x55a7677830d0, sa_mask=[INT USR1], > > sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND, > > a_restorer=0x7f604ed9cf20}, NULL, 8) = 0 > > brk(NULL) = 0x55a768f09000 > > brk(0x55a768f2a000) = 0x55a768f2a000 > > openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 > > fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0 > > mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604e24d000 > > close(3) = 0 > > openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3 > > dup2(3, 0) = 0 > > close(3) = 0 > > lseek(0, 0, SEEK_CUR) = 0 > > openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3 > > dup2(3, 1) = 1 > > close(3) = 0 > > mmap(NULL, 1048588288, PROT_READ|PROT_WRITE, > > MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f600fa4a000 > > read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 1048576000) = 1048576000 > > write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., > > 1048576000) = 1048576000 > > close(0) = 0 > > close(1) = -1 EACCES (Permission denied) > > openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0 > > fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0 > > read(0, "# Locale name alias data base.\n#"..., 4096) = 2995 > > read(0, "", 4096) = 0 > > close(0) = 0 > > openat(AT_FDCWD, > > "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1 > > ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY) > > = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) = > > 0 > > fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0 > > mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f604f375000 > > close(0) = 0 > > write(2, "dd: ", 4dd: ) = 4 > > write(2, "closing output file '/data/test."..., 36closing output file > > '/data/test.img') = 36 > > openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) > > = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) > > = -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 > > ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, > > "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = > > -1 ENOENT (No such file or directory) > > openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", > > O_RDONLY) = -1 ENOENT (No such file or directory) > > write(2, ": Permission denied", 19: Permission denied) = 19 > > write(2, "\n", 1 > > ) = 1 > > close(2) = 0 > > exit_group(1) = ? > > +++ exited with 1 +++ > > > > On Wed, Sep 25, 2019 at 12:06 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: > >> > >> > >> > >>> On Sep 25, 2019, at 9:48 AM, bfields@xxxxxxxxxxxx wrote: > >>> > >>> On Wed, Sep 18, 2019 at 06:36:13PM -0500, Kevin Vasko wrote: > >>>> We have a new Dell EMC Unity 300 acting as NAS Server that is > >>>> presenting a NFSv4 NFS Share. Our clients are mostly Ubuntu 18.04.3 > >>>> but issue is also present on CentOS 7.6 systems. We have been > >>>> struggling with this issue for over a week now and not sure how to > >>>> resolve it. > >>>> > >>>> > >>>> > >>>> We are having trouble with NFS Clients completing their writes to the > >>>> Dell EMC Unity 300 NFS Server when Kerberos is enabled on the NFS > >>>> Share. I created the NFS Share on the U300, associated it with our > >>>> FreeIPA (Kerberos/LDAP server) and everything shows successful. > >>> > >>> Troubleshooting ideas off the top of my head: > >>> > >>> It might be worth trying some other client versions if it's not hard. > >>> > >>> It'd be interesting to know what's happening on the network.... > >>> Unfortunately big krb5p writes won't be fun to try to capture and > >>> examine. > >> > >> Wireshark is supposed to have a mechanism for giving it the keys > >> so that captured GSS data can be decrypted. I've never gotten it > >> to work, but I didn't try hard. Should be appropriately documented. > >> > >> > >>> Maybe some network or rpc-level statistics would help show if > >>> there are an unusual number of retries or failures. > >> > >> > >> -- > >> Chuck Lever > >> > >> > >> > > -- > Chuck Lever > > >