Re: Need help debugging NFS issues new to 4.20 kernel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2019-01-24 at 11:32 -0600, Jason L Tibbitts III wrote:
> I could use some help figuring out the cause of some serious NFS
> client
> issues I'm having with the 4.20.3 kernel which I did not see under
> 4.19.15.
> 
> I have a network of about 130 desktops (plus a bunch of other
> machines,
> VMs and the like) running Fedora 29 connecting to six NFS servers
> running CentOS 7.6 (with the heavily patched vendor kernel
> 3.10.0-957.1.3).  All machines involved are x86_64.  We use
> kerberized
> NFS4 with generally sec=krb5i.  The exports are generally made with
> "(rw,async,sec=krb5i:krb5p)".
> 
> Since I booted those clients into 4.20.3 I've started seeing
> processes
> getting stuck in the D state.  The system itself will seem OK (except
> for the high load average) as long as I don't touch the hung NFS
> mount.
> Nothing was logged to dmesg or to the journal.  So far booting back
> into
> the 4.19.15 kernel has cleared up the problem.  I cannot yet
> reproduce
> this on demand; I've tried but it is probably related to some
> specific
> usage pattern.
> 
> Has anyone else seen issues like this?  Can anyone help me to get
> more
> useful information that might point to the problem?  I still haven't
> learned how to debug NFS issues properly.  And if there's a stress
> test
> tool I could easily run that might help to reproduce the issue, I'd
> be
> happy to run it.
> 
> I note that 4.20.4 is out; I see one sunrpc fix which I guess could
> be
> related (sunrpc: handle ENOMEM in rpcb_getport_async) but the systems
> involved have plenty of free memory so I doubt that's it.  I'll
> certainly try it anyway.
> 
> Various package versions:
> kernel-4.20.3-200.fc29.x86_64 (the problematic kernel)
> kernel-4.19.15-300.fc29.x86_64 (the functional kernel)
> nfs-utils-2.3.3-1.rc2.fc29.x86_64
> gssproxy-0.8.0-6.fc29.x86_64
> krb5-libs-1.16.1-25.fc29.i686
> 
> Thanks in advance for any help or advice,
> 
>  - J<

Commit deaa5c96c2f7 ("SUNRPC: Address Kerberos performance/behavior
regression") was supposed to be marked for stable as a fix. Chuck &
Anna?
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux