Re: Need help debugging NFS issues new to 4.20 kernel

On Thu, 2019-01-24 at 19:58 +0000, Trond Myklebust wrote:
> On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> > I could use some help figuring out the cause of some serious NFS client
> > issues I'm having with the 4.20.3 kernel which I did not see under
> > 4.19.15.
> > 
> > I have a network of about 130 desktops (plus a bunch of other machines,
> > VMs and the like) running Fedora 29 connecting to six NFS servers
> > running CentOS 7.6 (with the heavily patched vendor kernel
> > 3.10.0-957.1.3).  All machines involved are x86_64.  We use kerberized
> > NFS4 with generally sec=krb5i.  The exports are generally made with
> > "(rw,async,sec=krb5i:krb5p)".
> > 
> > Since I booted those clients into 4.20.3 I've started seeing processes
> > getting stuck in the D state.  The system itself will seem OK (except
> > for the high load average) as long as I don't touch the hung NFS mount.
> > Nothing was logged to dmesg or to the journal.  So far booting back into
> > the 4.19.15 kernel has cleared up the problem.  I cannot yet reproduce
> > this on demand; I've tried but it is probably related to some specific
> > usage pattern.
> > 
> > Has anyone else seen issues like this?  Can anyone help me to get more
> > useful information that might point to the problem?  I still haven't
> > learned how to debug NFS issues properly.  And if there's a stress test
> > tool I could easily run that might help to reproduce the issue, I'd be
> > happy to run it.
> > 
> > I note that 4.20.4 is out; I see one sunrpc fix which I guess could be
> > related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems
> > involved have plenty of free memory so I doubt that's it.  I'll
> > certainly try it anyway.
> > 
> > Various package versions:
> > kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
> > kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
> > nfs-utils-2.3.3-1.rc2.fc29.x86_64
> > gssproxy-0.8.0-6.fc29.x86_64
> > krb5-libs-1.16.1-25.fc29.i686
> > 
> > Thanks in advance for any help or advice,
> > 
> >  - J<
> 
> Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
> regression") was supposed to be marked for stable as a fix. Chuck &
> Anna?

Looks like I missed that, sorry!

Stable folks, can you please backport deaa5c96c2f7 ("SUNRPC: Address Kerberos
performance/behavior regression") to v4.20?

Thanks,
Anna
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
> 
> 



