Hi Jeff,

On Sun, Jan 26, 2025 at 07:06:09AM -0500, Jeff Layton wrote:
> On Sun, 2025-01-26 at 08:57 +0100, Salvatore Bonaccorso wrote:
> > Hi Jeff,
> >
> > On Sat, Jan 25, 2025 at 05:55:50PM -0500, Jeff Layton wrote:
> > > On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> > > > Hi Chuck, Jeff, NFSD maintainers,
> > > >
> > > > In Debian we got a report from a user which triggered an issue during
> > > > package updates where nfs-kernel-server restart was involved, then
> > > > hanging and included a kernel trace of a NULL pointer dereference.
> > > >
> > > > The full report is at:
> > > > https://bugs.debian.org/1093734
> > > >
> > > > While I was not able to trigger the issue, the provided log is as
> > > > follows:
> > > >
> > > > 2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> > > > 2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
> > > > 2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@xxxxxxxxxxxxxxx
> > > > 2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
> > > > 2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
> > > > 2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
> > > > 2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
> > > > 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
> > >
> > > Thanks for the bug report. It's getting late here, so I can only take a
> > > quick look.
> > > svc_wake_up is pretty small:
> > >
> > > void svc_wake_up(struct svc_serv *serv)
> > > {
> > >         struct svc_pool *pool = &serv->sv_pools[0];
> > >
> > >         set_bit(SP_TASK_PENDING, &pool->sp_flags);
> > >         svc_pool_wake_idle_thread(pool);
> > > }
> > >
> > > pahole on my machine says that struct svc_serv has this at offset 0x90:
> > >
> > >         struct svc_pool *          sv_pools;             /*  0x90   0x8 */
> > >
> > > So it looks like the nn->nfsd_serv was a NULL pointer. That only
> > > happens when we shut down the server, so this looks like a race between
> > > filecache garbage collection and shutdown.
> > >
> > > The filecache gets shut down in nfsd_shutdown_net, which gets called
> > > _after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
> > > at whether we can reorder the NULL pointer setting to later, or work
> > > around this some other way.
> > >
> > > Could I trouble you to open a bug for this at bugzilla.kernel.org?
> >
> > Thanks a lot for your quick response on it and the analysis.
> >
> > Sure I can file a bug in bugzilla.kernel.org, I see you submitted a
> > patch already, do you still want me to do it?
> >
> > If so I try to reference as well all followups so that the information
> > is not spread around threads.
> >
> > Thanks a lot for your work!
> >
>
> I think you can skip the BZ for now.

Ok then I leave the bugzilla bug filing step off. Thanks again for your
hard work on the NFS front!

Regards,
Salvatore
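
As a side note on the offset reasoning quoted above: the faulting address
0x90 matches the offset of sv_pools because reading serv->sv_pools through a
NULL serv loads from address 0 + offsetof(sv_pools). The userspace sketch
below only illustrates that arithmetic. The mock_* structs, the padding and
the helper sv_pools_load_address() are hypothetical stand-ins, not the kernel
definitions; the only value taken from the thread is the 0x90 offset reported
by pahole.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct svc_pool; only sp_flags is modelled. */
struct mock_svc_pool {
        unsigned long sp_flags;
};

/*
 * Hypothetical stand-in for struct svc_serv. This is NOT the kernel layout:
 * the padding exists only so that sv_pools lands at offset 0x90, the value
 * pahole reported in the quoted analysis.
 */
struct mock_svc_serv {
        unsigned char         pad[0x90];  /* placeholder for earlier members */
        struct mock_svc_pool *sv_pools;   /* offset 0x90 */
};

/* Fail the build if the mock layout drifts from the quoted offset. */
static_assert(offsetof(struct mock_svc_serv, sv_pools) == 0x90,
              "mock layout must place sv_pools at 0x90");

/*
 * Address that a load of serv->sv_pools would touch for a given base
 * address of serv. Plain integer arithmetic, no pointer is dereferenced.
 */
static uintptr_t sv_pools_load_address(uintptr_t serv_base)
{
        return serv_base + offsetof(struct mock_svc_serv, sv_pools);
}
```

Calling sv_pools_load_address(0) models the NULL nn->nfsd_serv case and
yields 0x90, the same value as the address in the "BUG: kernel NULL pointer
dereference" line of the log.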