Re: refcount underflow in nfsd41_destroy_cb

> On Mar 21, 2020, at 11:43 PM, Jan Psota <jasiu@xxxxxxxxxxxx> wrote:
> 
> Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> Jan, how are you reproducing this?
> 
> It looks like it happens on the server under high NFS load, about
> a day after boot (as I noticed from the last -x results below).
> After that the system runs fine for a month (until it is rebooted
> into a new kernel [not always] or something similar).
> 
> We have some NFS-rooted machines:
> /systemd on / type nfs4 (rw,relatime,vers=4.2,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,
> 	timeo=10,retrans=2,sec=sys,clientaddr=192.168.1.18,local_lock=none,addr=192.168.1.1)
> 
> The server has a 10Gb Aquantia AQC107 card connected to a Mikrotik
> CSS326 switch. The clients, which run distcc (aside from acting as
> workstations), are connected over 1Gb Ethernet. The server runs Gentoo
> Linux with OpenRC (the stations use systemd) and recent gcc-9.3,
> binutils-2.34 and glibc-2.30; it has 32 GB RAM and an AMD Phenom II
> X6 1090T CPU.
> 
> /var/tmp/portage, where compilation takes place, is normally on the
> client's tmpfs, but when there is not enough space to compile a huge
> program, I switch it to an NFS export from the server
> (/etc/exports opts: -rw,async,no_root_squash,no_subtree_check)
> 
> # "grep nfs.*destroy /var/log/messages" mixed with "last -x"

I thought I read in the initial report that you were seeing this
problem only on v5.6-rc6. What is the earliest kernel release
where you saw refcount UaF warnings from nfsd41_destroy_cb?


> reboot   system boot  5.5.1-gentoo     Mon Feb  3 00:20 - 15:22 (25+15:01)
> Feb  4 17:44:39 agro kernel:  nfsd41_destroy_cb+0x2c/0x40 [nfsd]
> 	rust compilation, kernel 5.5.1-gentoo
> 
> reboot   system boot  5.5.6-gentoo     Fri Feb 28 15:23 - 16:25 (14+01:02)
> Feb 29 13:51:49 agro kernel:  nfsd41_destroy_cb+0x2c/0x40 [nfsd]
> 	rust compilation, kernel 5.5.6-gentoo
> 
> reboot   system boot  5.5.9-gentoo     Fri Mar 13 16:27 - 00:04 (4+07:36)
> Mar 14 18:03:49 agro kernel:  nfsd41_destroy_cb+0x2c/0x40 [nfsd]
> 	libpciaccess compilation, kernel 
> 
> reboot   system boot  5.6.0-rc6        Wed Mar 18 00:06 - 20:39 (2+20:32)
> Mar 19 11:08:07 agro kernel:  nfsd41_destroy_cb+0x36/0x50 [nfsd]
> 	linux-firmware merge
> *
> reboot   system boot  5.6.0-rc6        Fri Mar 20 20:40 - 02:40  (05:59)
> Mar 20 21:43:34 agro kernel:  nfsd41_destroy_cb+0x36/0x50 [nfsd]
> 	zstd compilation
> *
> reboot   system boot  5.6.0-rc6        Sat Mar 21 02:42   still running
> Mar 21 17:34:43 agro kernel:  nfsd41_destroy_cb+0x36/0x50 [nfsd]
> 	nodejs compilation
> 
> * - I noticed the kernel fault while looking for the reason why
> WireGuard refused to connect to _some_ remote peers, so I rebooted
> the server, and that helped.
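
As a rough check of the "about a day after boot" observation, the delay
between each boot and the first nfsd41_destroy_cb line can be computed
from the listing above. This is only a minimal sketch in Python: the
timestamps are copied from the quoted listing, the year 2020 is an
assumption taken from the date of the quoted message, and the names
YEAR, EVENTS, parse and DELAYS are made up for illustration.

```python
from datetime import datetime

YEAR = 2020  # assumption: inferred from the date of the quoted message

# (kernel, boot time from "last -x", first warning time from the log),
# copied from the listing quoted above.
EVENTS = [
    ("5.5.1-gentoo", "Feb  3 00:20", "Feb  4 17:44:39"),
    ("5.5.6-gentoo", "Feb 28 15:23", "Feb 29 13:51:49"),
    ("5.5.9-gentoo", "Mar 13 16:27", "Mar 14 18:03:49"),
    ("5.6.0-rc6", "Mar 18 00:06", "Mar 19 11:08:07"),
    ("5.6.0-rc6", "Mar 20 20:40", "Mar 20 21:43:34"),
    ("5.6.0-rc6", "Mar 21 02:42", "Mar 21 17:34:43"),
]


def parse(stamp, fmt):
    """Parse a year-less timestamp, collapsing the padded day field."""
    normalized = " ".join(stamp.split())
    return datetime.strptime(f"{YEAR} {normalized}", f"%Y {fmt}")


# Time elapsed between boot and the first logged warning, per boot.
DELAYS = [
    parse(warned, "%b %d %H:%M:%S") - parse(booted, "%b %d %H:%M")
    for _kernel, booted, warned in EVENTS
]

for (kernel, _booted, _warned), delay in zip(EVENTS, DELAYS):
    print(f"{kernel:14} first warning {delay} after boot")
```

With these inputs the delays range from roughly one hour to a little
under two days, so "about a day" fits most but not all of the boots
listed.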

--
Chuck Lever






