On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> Hi Chuck, Jeff, NFSD maintainers,
>
> In Debian we got a report from a user which triggered an issue during
> package updates where nfs-kernel-server restart was involved, then
> hanging and included a kernel trace of a NULL pointer dereference.
>
> The full report is at:
> https://bugs.debian.org/1093734
>
> While I was not able to trigger the issue, the provided log is as
> follows:
>
> 2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> 2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
> 2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@xxxxxxxxxxxxxxx
> 2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
> 2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
> 2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
> 2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
> 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090

Thanks for the bug report. It's getting late here, so I can only take a
quick look.

svc_wake_up is pretty small:

    void svc_wake_up(struct svc_serv *serv)
    {
            struct svc_pool *pool = &serv->sv_pools[0];

            set_bit(SP_TASK_PENDING, &pool->sp_flags);
            svc_pool_wake_idle_thread(pool);
    }

pahole on my machine says that struct svc_serv has this at offset 0x90:

    struct svc_pool *          sv_pools;             /*  0x90   0x8 */

So it looks like nn->nfsd_serv was a NULL pointer. That only happens
when we shut down the server, so this looks like a race between
filecache garbage collection and shutdown.
The filecache gets shut down in nfsd_shutdown_net, which gets called
_after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look at
whether we can reorder the NULL pointer setting to later, or work around
this some other way.

Could I trouble you to open a bug for this at bugzilla.kernel.org?

> 2025-01-21T12:07:18.015563+01:00 $HOST kernel: #PF: supervisor read access in kernel mode
> 2025-01-21T12:07:18.015566+01:00 $HOST kernel: #PF: error_code(0x0000) - not-present page
> 2025-01-21T12:07:18.015567+01:00 $HOST kernel: PGD 14b3d9067 P4D 14b3d9067 PUD 14b3da067 PMD 0
> 2025-01-21T12:07:18.015568+01:00 $HOST kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> 2025-01-21T12:07:18.015569+01:00 $HOST kernel: CPU: 8 UID: 0 PID: 231280 Comm: kworker/u67:2 Tainted: G W 6.12.9-amd64 #1 Debian 6.12.9-1
> 2025-01-21T12:07:18.015570+01:00 $HOST kernel: Tainted: [W]=WARN
> 2025-01-21T12:07:18.015572+01:00 $HOST kernel: Hardware name: Supermicro AS -2014S-TR/H12SSL-i, BIOS 2.9 05/28/2024
> 2025-01-21T12:07:18.015573+01:00 $HOST kernel: Workqueue: events_unbound nfsd_file_gc_worker [nfsd]
> 2025-01-21T12:07:18.015573+01:00 $HOST kernel: RIP: 0010:svc_wake_up+0x9/0x20 [sunrpc]
> 2025-01-21T12:07:18.015574+01:00 $HOST kernel: Code: e1 bd ea 0f 0b e9 73 ff ff ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b bf 90 00 00 00 f0 80 8f b8 00 00 00 01 e9 63 aa fe ff 0f 1f
> 2025-01-21T12:07:18.015575+01:00 $HOST kernel: RSP: 0018:ffffa9b9690abde8 EFLAGS: 00010286
> 2025-01-21T12:07:18.015576+01:00 $HOST kernel: RAX: 0000000000000001 RBX: ffff9d03f84f6c58 RCX: ffffa9b9690abe30
> 2025-01-21T12:07:18.015576+01:00 $HOST kernel: RDX: ffff9d034a5aa2a8 RSI: ffff9d034a5aa2a8 RDI: 0000000000000000
> 2025-01-21T12:07:18.015577+01:00 $HOST kernel: RBP: ffff9d034a5aa2a0 R08: ffff9d034a5aa2a8 R09: ffffa9b9690abe28
> 2025-01-21T12:07:18.015578+01:00 $HOST kernel: R10: ffff9d0451cff780 R11: 000000000000000f R12: ffffa9b9690abe30
> 2025-01-21T12:07:18.015578+01:00 $HOST kernel: R13: ffff9d034a5aa2a8 R14: ffff9d035451a000 R15: ffff9d034a5aa2a8
> 2025-01-21T12:07:18.015579+01:00 $HOST kernel: FS: 0000000000000000(0000) GS:ffff9d228ec00000(0000) knlGS:0000000000000000
> 2025-01-21T12:07:18.015580+01:00 $HOST kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2025-01-21T12:07:18.015580+01:00 $HOST kernel: CR2: 0000000000000090 CR3: 0000000106e24003 CR4: 0000000000f70ef0
> 2025-01-21T12:07:18.015581+01:00 $HOST kernel: PKRU: 55555554
> 2025-01-21T12:07:18.015582+01:00 $HOST kernel: Call Trace:
> 2025-01-21T12:07:18.015582+01:00 $HOST kernel: <TASK>
> 2025-01-21T12:07:18.015583+01:00 $HOST kernel: ? __die_body.cold+0x19/0x27
> 2025-01-21T12:07:18.015584+01:00 $HOST kernel: ? page_fault_oops+0x15a/0x2d0
> 2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? exc_page_fault+0x7e/0x180
> 2025-01-21T12:07:18.015585+01:00 $HOST kernel: ? asm_exc_page_fault+0x26/0x30
> 2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? svc_wake_up+0x9/0x20 [sunrpc]
> 2025-01-21T12:07:18.015586+01:00 $HOST kernel: ? srso_alias_return_thunk+0x5/0xfbef5
> 2025-01-21T12:07:18.015587+01:00 $HOST kernel: nfsd_file_dispose_list_delayed+0xa7/0xd0 [nfsd]
> 2025-01-21T12:07:18.015588+01:00 $HOST kernel: nfsd_file_gc_worker+0x190/0x2c0 [nfsd]
> 2025-01-21T12:07:18.015588+01:00 $HOST kernel: process_one_work+0x177/0x330
> 2025-01-21T12:07:18.015589+01:00 $HOST kernel: worker_thread+0x252/0x390
> 2025-01-21T12:07:18.015590+01:00 $HOST kernel: ? __pfx_worker_thread+0x10/0x10
> 2025-01-21T12:07:18.015611+01:00 $HOST kernel: kthread+0xd2/0x100
> 2025-01-21T12:07:18.015612+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
> 2025-01-21T12:07:18.015613+01:00 $HOST kernel: ret_from_fork+0x34/0x50
> 2025-01-21T12:07:18.015615+01:00 $HOST kernel: ? __pfx_kthread+0x10/0x10
> 2025-01-21T12:07:18.015616+01:00 $HOST kernel: ret_from_fork_asm+0x1a/0x30
> 2025-01-21T12:07:18.015618+01:00 $HOST kernel: </TASK>
> 2025-01-21T12:07:18.015619+01:00 $HOST kernel: Modules linked in: dm_mod tls cpufreq_conservative msr binfmt_misc quota_v2 quota_tree nls_ascii nls_cp437 vfat fat ipmi_ssif rpcrdma rdma_ucm ib_iser nf_conntrack_ftp nf_log_syslog ib_umad nft_log amd_atl intel_rapl_msr intel_rapl_common rdma_cm ib_ipoib amd64_edac iw_cm libiscsi edac_mce_amd nft_limit scsi_transport_iscsi ib_cm kvm_amd nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject kvm crct10dif_pclmul ghash_clmulni_intel nft_ct ast sha512_ssse3 sha256_ssse3 jc42 drm_shmem_helper sha1_ssse3 aesni_intel gf128mul crypto_simd drm_kms_helper cryptd wmi_bmof ee1004 rapl acpi_cpufreq pcspkr i2c_algo_bit ccp acpi_ipmi sp5100_tco k10temp watchdog button nft_masq ipmi_si ipmi_devintf ipmi_msghandler evdev joydev sg nfsd nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 auth_rpcgss nfs_acl lockd grace nf_tables sunrpc drm configfs efi_pstore nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 efivarfs raid10 raid0 hid_generic usbhid hid raid456 async_raid6_recov async_memcpy
> 2025-01-21T12:07:18.015622+01:00 $HOST kernel: async_pq async_xor async_tx xor rndis_host cdc_ether usbnet mii raid6_pq libcrc32c crc32c_generic mlx5_ib ib_uverbs ib_core raid1 md_mod ses enclosure scsi_transport_sas sd_mod mlx5_core ahci libahci xhci_pci libata xhci_hcd megaraid_sas tg3 crc32_pclmul scsi_mod crc32c_intel mlxfw usbcore libphy pci_hyperv_intf scsi_common i2c_piix4 i2c_smbus usb_common wmi
> 2025-01-21T12:07:18.015624+01:00 $HOST kernel: CR2: 0000000000000090
> 2025-01-21T12:07:18.015625+01:00 $HOST kernel: ---[ end trace 0000000000000000 ]---
>
> The used kernel version from the user is 6.12.9 based.
>
> Does this ring a bell? Might 8e6e2ffa6569 ("nfsd: add list_head nf_gc
> to struct nfsd_file") be related?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>