Re: NFSv4 client locks up on larger writes with Kerberos enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On Sep 25, 2019, at 11:44 AM, Kevin Vasko <kvasko@xxxxxxxxx> wrote:
> 
> Appreciate the information. I have tried different clients. The Centos
> and Ubuntu clients exhibit the same issue. I also did packet tracing.
> On small files the client will go through its "WRITE..." packets...
> and then will issue COMMIT and then do the CLOSE. On larger files it
> will do all of its WRITE... packets and then will issue COMMIT and
> then it will never do a CLOSE. I'm assuming whats happening is the
> client is waiting for the NFS server to authenticate with the Kerberos
> server to make sure the write is okay, however I assume it never does.
> I would like to try and figure out how to see that "authentication"
> packet to kerberos from the server (if there is one).
> 
> Just for giggles, I just tried it with NFSv3 and there is a slight
> change and gives a little more information (albeit, what I kind of
> already assumed). Mounted the shared with krb5p, and vers=3.
> 
> I write a file with
> strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1
> 
> This is successful.  No issues. Just like NFSv4
> 
> Then I change nothing but the size of the file being written (just
> like with NFSv4)…however with NFSv3 it will halt for ~1 minute and
> then spit out a permission denied even though I just wrote out the
> SAME exact file in the SAME exact location (literally overwriting the
> file) a few seconds prior. The only difference in NFSv3 and NFSv4 is
> that NFSv3 fails with a permission issue, whereas NFSv4 never returns.

Sounds like the NFS server is dropping the connection. With
GSS enabled, that's usually a sign that the GSS window has
overflowed.


> strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1
> 
> This command with NFSv3, takes ~1min and will return file permissions
> error (even though it just wrote it and can run the above command
> again and it will finish with no errors).
> 
> This makes sense in that my initial suspicion is that the Dell Unity
> 300 isn’t authenticating properly with larger writes (for whatever
> reason) with Kerberos. This is also why I don’t see the problem when I
> disable Kerberos on the Unity 300 system.
> 
> When should the NFS server be sending a packet to the Kerberos server
> to validate the write? Or should it be? I did do packet capture on the
> Unity box but I don’t see anything really useful regarding Kerberos
> authentication. What should I be looking for in the packet traces to
> look for the authentication packets?
> 
> Debugging commands:
> 
> $mount
> 
> ...snip…
> 
> 10.75.37.2:/ds1 on /data type nfs
> (rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,mountaddr=10.75.37.2,mountvers=3,mountport=1234,mountproto=udp,local_lock=none,addr=10.75.37.2)
> 
> 
> 
> $ time dd if=/dev/zero of=/data/test.img bs=10M count=1
> 1+0 records in
> 1+0 records out
> 10485760 bytes (10 MB, 10 MiB) copied, 0.112561 s, 93.2 MB/s
> real    0m0.188s
> user    0m0.000s
> sys     0m0.087s
> 
> $ time dd if=/dev/zero of=/data/test.img bs=1000M count=1
> dd: closing output file '/data/test.img': Permission denied
> real    1m9.093s
> user    0m0.000s
> sys     0m1.124s
> 
> $ strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1
> execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img",
> "bs=10M", "count=1"], 0x7ffec9914328 /* 30 vars */) = 0
> brk(NULL)                               = 0x5595514e6000
> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0
> mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4e5d2000
> close(3)                                = 0
> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
> 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x7f5e4e5d0000
> mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
> 0) = 0x7f5e4dfd9000
> mprotect(0x7f5e4e1c0000, 2097152, PROT_NONE) = 0
> mmap(0x7f5e4e3c0000, 24576, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f5e4e3c0000
> mmap(0x7f5e4e3c6000, 15072, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e4e3c6000
> close(3)                                = 0
> arch_prctl(ARCH_SET_FS, 0x7f5e4e5d1540) = 0
> mprotect(0x7f5e4e3c0000, 16384, PROT_READ) = 0
> mprotect(0x55954fd03000, 4096, PROT_READ) = 0
> mprotect(0x7f5e4e5f1000, 4096, PROT_READ) = 0
> munmap(0x7f5e4e5d2000, 125550)          = 0
> rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
> rt_sigaction(SIGUSR1, {sa_handler=0x55954faf60e0, sa_mask=[INT USR1],
> sa_flags=SA_RESTORER, sa_restorer=0x7f5e4e017f20}, NULL, 8) = 0
> rt_sigaction(SIGINT, {sa_handler=0x55954faf60d0, sa_mask=[INT USR1],
> sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
> a_restorer=0x7f5e4e017f20}, NULL, 8) = 0
> brk(NULL)                               = 0x5595514e6000
> brk(0x559551507000)                     = 0x559551507000
> openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0
> mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4d4c8000
> close(3)                                = 0
> openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
> dup2(3, 0)                              = 0
> close(3)                                = 0
> lseek(0, 0, SEEK_CUR)                   = 0
> openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
> dup2(3, 1)                              = 1
> close(3)                                = 0
> mmap(NULL, 10498048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
> -1, 0) = 0x7f5e4cac5000
> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 10485760) = 10485760
> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 10485760) = 10485760
> close(0)                                = 0
> close(1)                                = 0
> openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
> fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
> read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
> read(0, "", 4096)                       = 0
> close(0)                                = 0
> openat(AT_FDCWD,
> "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1
> ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY)
> = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> 0
> fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0
> mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f5e4e5f0000
> close(0)                                = 0
> write(2, "1+0 records in\n1+0 records out\n", 311+0 records in
> 1+0 records out
> ) = 31
> write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 6010485760 bytes (10
> MB, 10 MiB) copied, 0.105735 s, 99.2 MB/s) = 60
> write(2, "\n", 1
> )                       = 1
> close(2)                                = 0
> exit_group(0)                           = ?
> +++ exited with 0 +++
> 
> 
> 
> $ strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1
> execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img",
> "bs=1000M", "count=1"], 0x7fff46397c58 /* 30 vars */) = 0
> brk(NULL)                               = 0x55a768f09000
> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0
> mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604f357000
> close(3)                                = 0
> access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
> 832) = 832
> fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
> 0) = 0x7f604f355000
> mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
> 0) = 0x7f604ed5e000
> mprotect(0x7f604ef45000, 2097152, PROT_NONE) = 0
> mmap(0x7f604f145000, 24576, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f604f145000
> mmap(0x7f604f14b000, 15072, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f604f14b000
> close(3)                                = 0
> arch_prctl(ARCH_SET_FS, 0x7f604f356540) = 0
> mprotect(0x7f604f145000, 16384, PROT_READ) = 0
> mprotect(0x55a767990000, 4096, PROT_READ) = 0
> mprotect(0x7f604f376000, 4096, PROT_READ) = 0
> munmap(0x7f604f357000, 125550)          = 0
> rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
> rt_sigaction(SIGUSR1, {sa_handler=0x55a7677830e0, sa_mask=[INT USR1],
> sa_flags=SA_RESTORER, sa_restorer=0x7f604ed9cf20}, NULL, 8) = 0
> rt_sigaction(SIGINT, {sa_handler=0x55a7677830d0, sa_mask=[INT USR1],
> sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
> a_restorer=0x7f604ed9cf20}, NULL, 8) = 0
> brk(NULL)                               = 0x55a768f09000
> brk(0x55a768f2a000)                     = 0x55a768f2a000
> openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
> fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0
> mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604e24d000
> close(3)                                = 0
> openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
> dup2(3, 0)                              = 0
> close(3)                                = 0
> lseek(0, 0, SEEK_CUR)                   = 0
> openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
> dup2(3, 1)                              = 1
> close(3)                                = 0
> mmap(NULL, 1048588288, PROT_READ|PROT_WRITE,
> MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f600fa4a000
> read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1048576000) = 1048576000
> write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
> 1048576000) = 1048576000
> close(0)                                = 0
> close(1)                                = -1 EACCES (Permission denied)
> openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
> fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
> read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
> read(0, "", 4096)                       = 0
> close(0)                                = 0
> openat(AT_FDCWD,
> "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1
> ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY)
> = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) =
> 0
> fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0
> mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f604f375000
> close(0)                                = 0
> write(2, "dd: ", 4dd: )                     = 4
> write(2, "closing output file '/data/test."..., 36closing output file
> '/data/test.img') = 36
> openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY)
> = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY)
> = -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1
> ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD,
> "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) =
> -1 ENOENT (No such file or directory)
> openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo",
> O_RDONLY) = -1 ENOENT (No such file or directory)
> write(2, ": Permission denied", 19: Permission denied)     = 19
> write(2, "\n", 1
> )                       = 1
> close(2)                                = 0
> exit_group(1)                           = ?
> +++ exited with 1 +++
> 
> On Wed, Sep 25, 2019 at 12:06 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> 
>> 
>>> On Sep 25, 2019, at 9:48 AM, bfields@xxxxxxxxxxxx wrote:
>>> 
>>> On Wed, Sep 18, 2019 at 06:36:13PM -0500, Kevin Vasko wrote:
>>>> We have a new Dell EMC Unity 300 acting as NAS Server that is
>>>> presenting a NFSv4 NFS Share. Our clients are mostly Ubuntu 18.04.3
>>>> but issue is also present on CentOS 7.6 systems. We have been
>>>> struggling with this issue for over a week now and not sure how to
>>>> resolve it.
>>>> 
>>>> 
>>>> 
>>>> We are having trouble with NFS Clients completing their writes to the
>>>> Dell EMC Unity 300 NFS Server when Kerberos is enabled on the NFS
>>>> Share. I created the NFS Share on the U300, associated it with our
>>>> FreeIPA (Kerberos/LDAP server) and everything shows successful.
>>> 
>>> Troubleshooting ideas off the top of my head:
>>> 
>>> It might be worth trying some other client versions if it's not hard.
>>> 
>>> It'd be interesting to know what's happening on the network....
>>> Unfortunately big krb5p writes won't be fun to try to capture and
>>> examine.
>> 
>> Wireshark is supposed to have a mechanism for giving it the keys
>> so that captured GSS data can be decrypted. I've never gotten it
>> to work, but I didn't try hard. Should be appropriately documented.
>> 
>> 
>>> Maybe some network or rpc-level statistics would help show if
>>> there are an unusual number of retries or failures.
>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 

--
Chuck Lever







[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux