Re: kernel NULL pointer dereference: Workqueue: events_unbound nfsd_file_gc_worker, RIP: 0010:svc_wake_up+0x9/0x20

On Sat, 2025-01-25 at 21:44 +0100, Salvatore Bonaccorso wrote:
> Hi Chuck, Jeff, NFSD maintainers,
> 
> In Debian we got a report from a user who hit an issue during
> package updates where a restart of nfs-kernel-server was involved and
> then hung. The report included a kernel trace of a NULL pointer dereference.
> 
> The full report is at:
> https://bugs.debian.org/1093734
> 
> While I was not able to trigger the issue, the provided log is as
> follows:
> 
> 2025-01-21T12:07:01.516291+01:00 $HOST kernel: device-mapper: core: CONFIG_IMA_DISABLE_HTABLE is disabled. Duplicate IMA measurements will not be recorded in the IMA log.
> 2025-01-21T12:07:01.516310+01:00 $HOST kernel: device-mapper: uevent: version 1.0.3
> 2025-01-21T12:07:01.516312+01:00 $HOST kernel: device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@xxxxxxxxxxxxxxx
> 2025-01-21T12:07:13.528044+01:00 $HOST kernel: NFSD: Using nfsdcld client tracking operations.
> 2025-01-21T12:07:13.528061+01:00 $HOST kernel: NFSD: no clients to reclaim, skipping NFSv4 grace period (net f0000000)
> 2025-01-21T12:07:17.558915+01:00 $HOST blkmapd[1148]: exit on signal(15)
> 2025-01-21T12:07:17.574410+01:00 $HOST blkmapd[239859]: open pipe file /run/rpc_pipefs/nfs/blocklayout failed: No such file or directory
> 2025-01-21T12:07:18.015541+01:00 $HOST kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090

Thanks for the bug report. It's getting late here, so I can only take a
quick look. svc_wake_up is pretty small:

void svc_wake_up(struct svc_serv *serv)
{
        struct svc_pool *pool = &serv->sv_pools[0];

        set_bit(SP_TASK_PENDING, &pool->sp_flags);
        svc_pool_wake_idle_thread(pool);
}

pahole on my machine says that struct svc_serv has this at offset 0x90:

	struct svc_pool *          sv_pools;             /*  0x90   0x8 */

So it looks like nn->nfsd_serv was a NULL pointer. That only
happens when we shut down the server, so this looks like a race between
filecache garbage collection and shutdown.
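
The offset reasoning above can be checked with a small userspace sketch.
The struct below is a mock, not the real struct svc_serv: its padding is
an assumption chosen only to put sv_pools at 0x90, the offset pahole
reported. With a NULL base, the load for serv->sv_pools targets address
0x90, which matches RDI = 0 and CR2 = 0x90 in the quoted oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct svc_pool;  /* opaque here; only the pointer member matters */

/* Mock layout, NOT the real struct svc_serv: the padding exists only to
 * place sv_pools at 0x90, the offset reported by pahole. */
struct mock_svc_serv {
        unsigned char    other_fields[0x90];
        struct svc_pool *sv_pools;
};

static_assert(offsetof(struct mock_svc_serv, sv_pools) == 0x90,
              "sv_pools must sit at offset 0x90 in the mock");

/* Address a load of serv->sv_pools would touch for a given base,
 * computed arithmetically so nothing is actually dereferenced. */
static uintptr_t sv_pools_load_address(uintptr_t serv_base)
{
        return serv_base + offsetof(struct mock_svc_serv, sv_pools);
}
```

Calling sv_pools_load_address(0) gives the address a NULL serv would
fault on in this mock layout.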

The filecache gets shut down in nfsd_shutdown_net, which gets called
_after_ setting the nn->nfsd_serv pointer to NULL. We'll have to look
at whether we can reorder the NULL pointer setting to later, or work
around this some other way.
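
As a rough illustration of the "work around this some other way" option,
here is a userspace sketch of a caller that checks the server pointer
before waking it. All names (mock_serv, mock_net, mock_wake_if_running)
are hypothetical stand-ins and this is not a proposed patch. A bare NULL
check only narrows the window; real kernel code would still need
locking or ordering to guarantee the pointer stays valid after the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel definitions. */
struct mock_serv {
        int wakeups;
};

struct mock_net {
        struct mock_serv *serv;  /* plays the role of nn->nfsd_serv */
};

/* Wake the server only if it still exists.
 * Returns true if a wakeup was delivered. */
static bool mock_wake_if_running(struct mock_net *nn)
{
        struct mock_serv *serv = nn->serv;  /* read the pointer once */

        if (serv == NULL)
                return false;  /* server already shut down: skip */
        serv->wakeups++;
        return true;
}

/* Small self-check covering the running and the shut-down case. */
static void mock_demo(void)
{
        struct mock_serv serv = { 0 };
        struct mock_net nn = { &serv };

        assert(mock_wake_if_running(&nn));
        nn.serv = NULL;  /* simulate shutdown clearing the pointer */
        assert(!mock_wake_if_running(&nn));
        assert(serv.wakeups == 1);
}
```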

Could I trouble you to open a bug for this at bugzilla.kernel.org?

> 2025-01-21T12:07:18.015563+01:00 $HOST kernel: #PF: supervisor read access in kernel mode
> 2025-01-21T12:07:18.015566+01:00 $HOST kernel: #PF: error_code(0x0000) - not-present page
> 2025-01-21T12:07:18.015567+01:00 $HOST kernel: PGD 14b3d9067 P4D 14b3d9067 PUD 14b3da067 PMD 0 
> 2025-01-21T12:07:18.015568+01:00 $HOST kernel: Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> 2025-01-21T12:07:18.015569+01:00 $HOST kernel: CPU: 8 UID: 0 PID: 231280 Comm: kworker/u67:2 Tainted: G        W          6.12.9-amd64 #1  Debian 6.12.9-1
> 2025-01-21T12:07:18.015570+01:00 $HOST kernel: Tainted: [W]=WARN
> 2025-01-21T12:07:18.015572+01:00 $HOST kernel: Hardware name: Supermicro AS -2014S-TR/H12SSL-i, BIOS 2.9 05/28/2024
> 2025-01-21T12:07:18.015573+01:00 $HOST kernel: Workqueue: events_unbound nfsd_file_gc_worker [nfsd]
> 2025-01-21T12:07:18.015573+01:00 $HOST kernel: RIP: 0010:svc_wake_up+0x9/0x20 [sunrpc]
> 2025-01-21T12:07:18.015574+01:00 $HOST kernel: Code: e1 bd ea 0f 0b e9 73 ff ff ff 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 <48> 8b bf 90 00 00 00 f0 80 8f b8 00 00 00 01 e9 63 aa fe ff 0f 1f
> 2025-01-21T12:07:18.015575+01:00 $HOST kernel: RSP: 0018:ffffa9b9690abde8 EFLAGS: 00010286
> 2025-01-21T12:07:18.015576+01:00 $HOST kernel: RAX: 0000000000000001 RBX: ffff9d03f84f6c58 RCX: ffffa9b9690abe30
> 2025-01-21T12:07:18.015576+01:00 $HOST kernel: RDX: ffff9d034a5aa2a8 RSI: ffff9d034a5aa2a8 RDI: 0000000000000000
> 2025-01-21T12:07:18.015577+01:00 $HOST kernel: RBP: ffff9d034a5aa2a0 R08: ffff9d034a5aa2a8 R09: ffffa9b9690abe28
> 2025-01-21T12:07:18.015578+01:00 $HOST kernel: R10: ffff9d0451cff780 R11: 000000000000000f R12: ffffa9b9690abe30
> 2025-01-21T12:07:18.015578+01:00 $HOST kernel: R13: ffff9d034a5aa2a8 R14: ffff9d035451a000 R15: ffff9d034a5aa2a8
> 2025-01-21T12:07:18.015579+01:00 $HOST kernel: FS:  0000000000000000(0000) GS:ffff9d228ec00000(0000) knlGS:0000000000000000
> 2025-01-21T12:07:18.015580+01:00 $HOST kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2025-01-21T12:07:18.015580+01:00 $HOST kernel: CR2: 0000000000000090 CR3: 0000000106e24003 CR4: 0000000000f70ef0
> 2025-01-21T12:07:18.015581+01:00 $HOST kernel: PKRU: 55555554
> 2025-01-21T12:07:18.015582+01:00 $HOST kernel: Call Trace:
> 2025-01-21T12:07:18.015582+01:00 $HOST kernel:  <TASK>
> 2025-01-21T12:07:18.015583+01:00 $HOST kernel:  ? __die_body.cold+0x19/0x27
> 2025-01-21T12:07:18.015584+01:00 $HOST kernel:  ? page_fault_oops+0x15a/0x2d0
> 2025-01-21T12:07:18.015585+01:00 $HOST kernel:  ? exc_page_fault+0x7e/0x180
> 2025-01-21T12:07:18.015585+01:00 $HOST kernel:  ? asm_exc_page_fault+0x26/0x30
> 2025-01-21T12:07:18.015586+01:00 $HOST kernel:  ? svc_wake_up+0x9/0x20 [sunrpc]
> 2025-01-21T12:07:18.015586+01:00 $HOST kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
> 2025-01-21T12:07:18.015587+01:00 $HOST kernel:  nfsd_file_dispose_list_delayed+0xa7/0xd0 [nfsd]
> 2025-01-21T12:07:18.015588+01:00 $HOST kernel:  nfsd_file_gc_worker+0x190/0x2c0 [nfsd]
> 2025-01-21T12:07:18.015588+01:00 $HOST kernel:  process_one_work+0x177/0x330
> 2025-01-21T12:07:18.015589+01:00 $HOST kernel:  worker_thread+0x252/0x390
> 2025-01-21T12:07:18.015590+01:00 $HOST kernel:  ? __pfx_worker_thread+0x10/0x10
> 2025-01-21T12:07:18.015611+01:00 $HOST kernel:  kthread+0xd2/0x100
> 2025-01-21T12:07:18.015612+01:00 $HOST kernel:  ? __pfx_kthread+0x10/0x10
> 2025-01-21T12:07:18.015613+01:00 $HOST kernel:  ret_from_fork+0x34/0x50
> 2025-01-21T12:07:18.015615+01:00 $HOST kernel:  ? __pfx_kthread+0x10/0x10
> 2025-01-21T12:07:18.015616+01:00 $HOST kernel:  ret_from_fork_asm+0x1a/0x30
> 2025-01-21T12:07:18.015618+01:00 $HOST kernel:  </TASK>
> 2025-01-21T12:07:18.015619+01:00 $HOST kernel: Modules linked in: dm_mod tls cpufreq_conservative msr binfmt_misc quota_v2 quota_tree nls_ascii nls_cp437 vfat fat ipmi_ssif rpcrdma rdma_ucm ib_iser nf_conntrack_ftp nf_log_syslog ib_umad nft_log amd_atl intel_rapl_msr intel_rapl_common rdma_cm ib_ipoib amd64_edac iw_cm libiscsi edac_mce_amd nft_limit scsi_transport_iscsi ib_cm kvm_amd nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject kvm crct10dif_pclmul ghash_clmulni_intel nft_ct ast sha512_ssse3 sha256_ssse3 jc42 drm_shmem_helper sha1_ssse3 aesni_intel gf128mul crypto_simd drm_kms_helper cryptd wmi_bmof ee1004 rapl acpi_cpufreq pcspkr i2c_algo_bit ccp acpi_ipmi sp5100_tco k10temp watchdog button nft_masq ipmi_si ipmi_devintf ipmi_msghandler evdev joydev sg nfsd nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 auth_rpcgss nfs_acl lockd grace nf_tables sunrpc drm configfs efi_pstore nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 efivarfs raid10 raid0 hid_generic usbhid hid raid456 async_raid6_recov async_memcpy
> 2025-01-21T12:07:18.015622+01:00 $HOST kernel:  async_pq async_xor async_tx xor rndis_host cdc_ether usbnet mii raid6_pq libcrc32c crc32c_generic mlx5_ib ib_uverbs ib_core raid1 md_mod ses enclosure scsi_transport_sas sd_mod mlx5_core ahci libahci xhci_pci libata xhci_hcd megaraid_sas tg3 crc32_pclmul scsi_mod crc32c_intel mlxfw usbcore libphy pci_hyperv_intf scsi_common i2c_piix4 i2c_smbus usb_common wmi
> 2025-01-21T12:07:18.015624+01:00 $HOST kernel: CR2: 0000000000000090
> 2025-01-21T12:07:18.015625+01:00 $HOST kernel: ---[ end trace 0000000000000000 ]---
> 
> The user's kernel is based on 6.12.9.
> 
> Does this ring a bell? Might 8e6e2ffa6569 ("nfsd: add list_head nf_gc
> to struct nfsd_file") be related?
> 



-- 
Jeff Layton <jlayton@xxxxxxxxxx>




