On Fri, Jan 25, 2019 at 07:13:27PM +0000, Schumaker, Anna wrote:
On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> I could use some help figuring out the cause of some serious NFS client
> issues I'm having with the 4.20.3 kernel which I did not see under
> 4.19.15.
>
> I have a network of about 130 desktops (plus a bunch of other machines,
> VMs and the like) running Fedora 29 connecting to six NFS servers
> running CentOS 7.6 (with the heavily patched vendor kernel
> 3.10.0-957.1.3). All machines involved are x86_64. We use kerberized
> NFS4 with generally sec=krb5i. The exports are generally made with
> "(rw,async,sec=krb5i:krb5p)".
>
> Since I booted those clients into 4.20.3 I've started seeing processes
> getting stuck in the D state. The system itself will seem OK (except
> for the high load average) as long as I don't touch the hung NFS mount.
> Nothing was logged to dmesg or to the journal. So far booting back into
> the 4.19.15 kernel has cleared up the problem. I cannot yet reproduce
> this on demand; I've tried but it is probably related to some specific
> usage pattern.
>
> Has anyone else seen issues like this? Can anyone help me to get more
> useful information that might point to the problem? I still haven't
> learned how to debug NFS issues properly. And if there's a stress test
> tool I could easily run that might help to reproduce the issue, I'd be
> happy to run it.
>
> I note that 4.20.4 is out; I see one sunrpc fix which I guess could be
> related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems
> involved have plenty of free memory so I doubt that's it. I'll
> certainly try it anyway.
>
> Various package versions:
> kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
> kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
> nfs-utils-2.3.3-1.rc2.fc29.x86_64
> gssproxy-0.8.0-6.fc29.x86_64
> krb5-libs-1.16.1-25.fc29.i686
>
> Thanks in advance for any help or advice,
>
> - J<
Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
regression") was supposed to be marked for stable as a fix. Chuck &
Anna?

Looks like I missed that, sorry!

Stable folks, can you please backport deaa5c96c2f7 ("SUNRPC: Address Kerberos
performance/behavior regression") to v4.20?

Queued for 4.20, thank you.

--
Thanks,
Sasha