On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
> On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> > I could use some help figuring out the cause of some serious NFS client
> > issues I'm having with the 4.20.3 kernel which I did not see under
> > 4.19.15.
> >
> > I have a network of about 130 desktops (plus a bunch of other machines,
> > VMs and the like) running Fedora 29 connecting to six NFS servers
> > running CentOS 7.6 (with the heavily patched vendor kernel
> > 3.10.0-957.1.3). All machines involved are x86_64. We use kerberized
> > NFS4 with generally sec=krb5i. The exports are generally made with
> > "(rw,async,sec=krb5i:krb5p)".
> >
> > Since I booted those clients into 4.20.3 I've started seeing processes
> > getting stuck in the D state. The system itself will seem OK (except
> > for the high load average) as long as I don't touch the hung NFS mount.
> > Nothing was logged to dmesg or to the journal. So far booting back into
> > the 4.19.15 kernel has cleared up the problem. I cannot yet reproduce
> > this on demand; I've tried but it is probably related to some specific
> > usage pattern.
> >
> > Has anyone else seen issues like this? Can anyone help me to get more
> > useful information that might point to the problem? I still haven't
> > learned how to debug NFS issues properly. And if there's a stress test
> > tool I could easily run that might help to reproduce the issue, I'd be
> > happy to run it.
> >
> > I note that 4.20.4 is out; I see one sunrpc fix which I guess could be
> > related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems
> > involved have plenty of free memory so I doubt that's it. I'll
> > certainly try it anyway.
> >
> > Various package versions:
> > kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
> > kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
> > nfs-utils-2.3.3-1.rc2.fc29.x86_64
> > gssproxy-0.8.0-6.fc29.x86_64
> > krb5-libs-1.16.1-25.fc29.i686
> >
> > Thanks in advance for any help or advice,
> >
> > - J<
>
> Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
> regression") was supposed to be marked for stable as a fix. Chuck &
> Anna?

Looks like I missed that, sorry! Stable folks, can you please backport
deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior regression")
to v4.20?

Thanks,
Anna

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx