Re: NFSv4 client locks up on larger writes with Kerberos enabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Appreciate the information. I have tried different clients. The Centos
and Ubuntu clients exhibit the same issue. I also did packet tracing.
On small files the client will go through its "WRITE..." packets...
and then will issue COMMIT and then do the CLOSE. On larger files it
will do all of its WRITE... packets and then will issue COMMIT and
then it will never do a CLOSE. I'm assuming whats happening is the
client is waiting for the NFS server to authenticate with the Kerberos
server to make sure the write is okay, however I assume it never does.
I would like to try and figure out how to see that "authentication"
packet to kerberos from the server (if there is one).

Just for giggles, I just tried it with NFSv3 and there is a slight
change and gives a little more information (albeit, what I kind of
already assumed). Mounted the shared with krb5p, and vers=3.

 I write a file with
strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1

This is successful.  No issues. Just like NFSv4

Then I change nothing but the size of the file being written (just
like with NFSv4)…however with NFSv3 it will halt for ~1 minute and
then spit out a permission denied even though I just wrote out the
SAME exact file in the SAME exact location (literally overwriting the
file) a few seconds prior. The only difference in NFSv3 and NFSv4 is
that NFSv3 fails with a permission issue, whereas NFSv4 never returns.

strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1

This command with NFSv3, takes ~1min and will return file permissions
error (even though it just wrote it and can run the above command
again and it will finish with no errors).

This makes sense in that my initial suspicion is that the Dell Unity
300 isn’t authenticating properly with larger writes (for whatever
reason) with Kerberos. This is also why I don’t see the problem when I
disable Kerberos on the Unity 300 system.

When should the NFS server be sending a packet to the Kerberos server
to validate the write? Or should it be? I did do packet capture on the
Unity box but I don’t see anything really useful regarding Kerberos
authentication. What should I be looking for in the packet traces to
look for the authentication packets?

 Debugging commands:

$mount

...snip…

10.75.37.2:/ds1 on /data type nfs
(rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=krb5p,mountaddr=10.75.37.2,mountvers=3,mountport=1234,mountproto=udp,local_lock=none,addr=10.75.37.2)



$ time dd if=/dev/zero of=/data/test.img bs=10M count=1
1+0 records in
1+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.112561 s, 93.2 MB/s
real    0m0.188s
user    0m0.000s
sys     0m0.087s

$ time dd if=/dev/zero of=/data/test.img bs=1000M count=1
dd: closing output file '/data/test.img': Permission denied
real    1m9.093s
user    0m0.000s
sys     0m1.124s

$ strace -f dd if=/dev/zero of=/data/test.img bs=10M count=1
execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img",
"bs=10M", "count=1"], 0x7ffec9914328 /* 30 vars */) = 0
brk(NULL)                               = 0x5595514e6000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0
mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4e5d2000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f5e4e5d0000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f5e4dfd9000
mprotect(0x7f5e4e1c0000, 2097152, PROT_NONE) = 0
mmap(0x7f5e4e3c0000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f5e4e3c0000
mmap(0x7f5e4e3c6000, 15072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f5e4e3c6000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f5e4e5d1540) = 0
mprotect(0x7f5e4e3c0000, 16384, PROT_READ) = 0
mprotect(0x55954fd03000, 4096, PROT_READ) = 0
mprotect(0x7f5e4e5f1000, 4096, PROT_READ) = 0
munmap(0x7f5e4e5d2000, 125550)          = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGUSR1, {sa_handler=0x55954faf60e0, sa_mask=[INT USR1],
sa_flags=SA_RESTORER, sa_restorer=0x7f5e4e017f20}, NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x55954faf60d0, sa_mask=[INT USR1],
sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
a_restorer=0x7f5e4e017f20}, NULL, 8) = 0
brk(NULL)                               = 0x5595514e6000
brk(0x559551507000)                     = 0x559551507000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0
mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5e4d4c8000
close(3)                                = 0
openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = 0
openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
close(3)                                = 0
mmap(NULL, 10498048, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0x7f5e4cac5000
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10485760) = 10485760
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
10485760) = 10485760
close(0)                                = 0
close(1)                                = 0
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
read(0, "", 4096)                       = 0
close(0)                                = 0
openat(AT_FDCWD,
"/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY)
= -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) =
0
fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0
mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f5e4e5f0000
close(0)                                = 0
write(2, "1+0 records in\n1+0 records out\n", 311+0 records in
1+0 records out
) = 31
write(2, "10485760 bytes (10 MB, 10 MiB) c"..., 6010485760 bytes (10
MB, 10 MiB) copied, 0.105735 s, 99.2 MB/s) = 60
write(2, "\n", 1
)                       = 1
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++



$ strace -f dd if=/dev/zero of=/data/test.img bs=1000M count=1
execve("/bin/dd", ["dd", "if=/dev/zero", "of=/data/test.img",
"bs=1000M", "count=1"], 0x7fff46397c58 /* 30 vars */) = 0
brk(NULL)                               = 0x55a768f09000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=125550, ...}) = 0
mmap(NULL, 125550, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604f357000
close(3)                                = 0
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\34\2\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f604f355000
mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f604ed5e000
mprotect(0x7f604ef45000, 2097152, PROT_NONE) = 0
mmap(0x7f604f145000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f604f145000
mmap(0x7f604f14b000, 15072, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f604f14b000
close(3)                                = 0
arch_prctl(ARCH_SET_FS, 0x7f604f356540) = 0
mprotect(0x7f604f145000, 16384, PROT_READ) = 0
mprotect(0x55a767990000, 4096, PROT_READ) = 0
mprotect(0x7f604f376000, 4096, PROT_READ) = 0
munmap(0x7f604f357000, 125550)          = 0
rt_sigaction(SIGINT, NULL, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
rt_sigaction(SIGUSR1, {sa_handler=0x55a7677830e0, sa_mask=[INT USR1],
sa_flags=SA_RESTORER, sa_restorer=0x7f604ed9cf20}, NULL, 8) = 0
rt_sigaction(SIGINT, {sa_handler=0x55a7677830d0, sa_mask=[INT USR1],
sa_flags=SA_RESTORER|SA_NODEFER|SA_RESETHAND,
a_restorer=0x7f604ed9cf20}, NULL, 8) = 0
brk(NULL)                               = 0x55a768f09000
brk(0x55a768f2a000)                     = 0x55a768f2a000
openat(AT_FDCWD, "/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11603728, ...}) = 0
mmap(NULL, 11603728, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f604e24d000
close(3)                                = 0
openat(AT_FDCWD, "/dev/zero", O_RDONLY) = 3
dup2(3, 0)                              = 0
close(3)                                = 0
lseek(0, 0, SEEK_CUR)                   = 0
openat(AT_FDCWD, "/data/test.img", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
dup2(3, 1)                              = 1
close(3)                                = 0
mmap(NULL, 1048588288, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f600fa4a000
read(0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1048576000) = 1048576000
write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
1048576000) = 1048576000
close(0)                                = 0
close(1)                                = -1 EACCES (Permission denied)
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 0
fstat(0, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(0, "# Locale name alias data base.\n#"..., 4096) = 2995
read(0, "", 4096)                       = 0
close(0)                                = 0
openat(AT_FDCWD,
"/usr/share/locale/en_US.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale/en_US.utf8/LC_MESSAGES/coreutils.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale/en.UTF-8/LC_MESSAGES/coreutils.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US/LC_MESSAGES/coreutils.mo", O_RDONLY)
= -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.utf8/LC_MESSAGES/coreutils.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en/LC_MESSAGES/coreutils.mo", O_RDONLY) =
0
fstat(0, {st_mode=S_IFREG|0644, st_size=578, ...}) = 0
mmap(NULL, 578, PROT_READ, MAP_PRIVATE, 0, 0) = 0x7f604f375000
close(0)                                = 0
write(2, "dd: ", 4dd: )                     = 4
write(2, "closing output file '/data/test."..., 36closing output file
'/data/test.img') = 36
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY)
= -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY)
= -1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1
ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD,
"/usr/share/locale-langpack/en.utf8/LC_MESSAGES/libc.mo", O_RDONLY) =
-1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo",
O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, ": Permission denied", 19: Permission denied)     = 19
write(2, "\n", 1
)                       = 1
close(2)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

On Wed, Sep 25, 2019 at 12:06 PM Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>
>
>
> > On Sep 25, 2019, at 9:48 AM, bfields@xxxxxxxxxxxx wrote:
> >
> > On Wed, Sep 18, 2019 at 06:36:13PM -0500, Kevin Vasko wrote:
> >> We have a new Dell EMC Unity 300 acting as NAS Server that is
> >> presenting a NFSv4 NFS Share. Our clients are mostly Ubuntu 18.04.3
> >> but issue is also present on CentOS 7.6 systems. We have been
> >> struggling with this issue for over a week now and not sure how to
> >> resolve it.
> >>
> >>
> >>
> >> We are having trouble with NFS Clients completing their writes to the
> >> Dell EMC Unity 300 NFS Server when Kerberos is enabled on the NFS
> >> Share. I created the NFS Share on the U300, associated it with our
> >> FreeIPA (Kerberos/LDAP server) and everything shows successful.
> >
> > Troubleshooting ideas off the top of my head:
> >
> > It might be worth trying some other client versions if it's not hard.
> >
> > It'd be interesting to know what's happening on the network....
> > Unfortunately big krb5p writes won't be fun to try to capture and
> > examine.
>
> Wireshark is supposed to have a mechanism for giving it the keys
> so that captured GSS data can be decrypted. I've never gotten it
> to work, but I didn't try hard. Should be appropriately documented.
>
>
> > Maybe some network or rpc-level statistics would help show if
> > there are an unusual number of retries or failures.
>
>
> --
> Chuck Lever
>
>
>




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux