Kernel panic at nfs4_select_rw_stateid

Suresh Jayaraman <sjayaraman@xxxxxxxxxx> · Thu, 19 Nov 2020 11:02:53 +0000

Hi folks,

We have a report of an NFSv4.1 client crash in kernel version 4.18 (CentOS 8.2) with the IntelliFlash NFSv41 server. 

[95480.028498] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 
[95480.028550] PGD 0 P4D 0 
[95480.028570] Oops: 0000 [#1] SMP PTI 
[95480.028591] CPU: 37 PID: 504425 Comm: kworker/u98:1 Kdump: loaded Tainted: G OE --------- - - 4.18.0-193.14.2.el8_2.x86_64 #1 
[95480.028648] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/21/2019 
[95480.028717] Workqueue: rpciod rpc_async_schedule [sunrpc] 
[95480.028763] RIP: 0010:nfs4_select_rw_stateid+0x64/0x180 [nfsv4] 
[95480.028792] Code: 41 89 f6 48 89 fb 48 85 d2 0f 84 f6 00 00 00 48 8b 47 40 a8 01 0f 84 ea 00 00 00 48 8b 42 18 4c 8b 7a 20 4c 8d 67 48 4c 89 e7 <48> 8b 50 40 48 89 54 24 08 e8 4e 8e 77 f6 48 8b 54 24 08 4c 89 fe 
[95480.028877] RSP: 0018:ffffb4cd225a7dc0 EFLAGS: 00010202 
[95480.028903] RAX: 0000000000000010 RBX: ffff8dd23bee4180 RCX: ffff8dcfb54276c8 
[95480.028936] RDX: ffff8d5400b69980 RSI: 0000000000000001 RDI: ffff8dd23bee41c8 
[95480.028972] RBP: ffffb4cd225a7e00 R08: 0000000000000000 R09: 0000646f69637072 
[95480.029006] R10: 8080808080808080 R11: ffff8dd27f968bc0 R12: ffff8dd23bee41c8 
[95480.029040] R13: ffff8dcfb54276c8 R14: 0000000000000001 R15: ffff8dd24affa940 
[95480.029073] FS: 0000000000000000(0000) GS:ffff8dd27f940000(0000) knlGS:0000000000000000 
[95480.029111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[95480.029140] CR2: 0000000000000050 CR3: 0000005a5a60a005 CR4: 00000000001626e0 
[95480.029175] Call Trace: 
[95480.029206] ? nfs4_setup_sequence+0x36/0x190 [nfsv4] 
[95480.029246] ? rpc_exit+0x30/0x30 [sunrpc] 
[95480.029280] ? __rpc_atrun+0x20/0x20 [sunrpc] 
[95480.029314] nfs4_proc_pgio_rpc_prepare+0x5c/0x80 [nfsv4] 
[95480.029357] nfs_pgio_prepare+0x2d/0x40 [nfs] 
[95480.029390] __rpc_execute+0x85/0x340 [sunrpc] 
[95480.029418] ? finish_task_switch+0xd7/0x2b0 
[95480.029454] rpc_async_schedule+0x29/0x40 [sunrpc] 
[95480.029482] process_one_work+0x1a7/0x3b0 
[95480.029504] worker_thread+0x30/0x390 
[95480.029524] ? create_worker+0x1a0/0x1a0 
[95480.029547] kthread+0x112/0x130 
[95480.029567] ? kthread_flush_work_fn+0x10/0x10 
[95480.029593] ret_from_fork+0x35/0x40 
[95480.029615] Modules linked in: vport_stt(OE) vhost_net vhost tap act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb tun rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache binfmt_misc xt_CT nft_counter nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib ip6_tables ip_tables nft_compat nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set openvswitch(OE) nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_nat_ipv6 nf_defrag_ipv6 udp_tunnel nft_chain_route_ipv6 nft_chain_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack nft_chain_route_ipv4 ip_set nf_tables nfnetlink sunrpc intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp vfat fat kvm_intel kvm irqbypass ipmi_ssif iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate intel_uncore ipmi_si intel_rapl_perf ipmi_devintf i2c_i801 ioatdma pcspkr ipmi_msghandler lpc_ich hpwdt hpilo dca acpi_tad wmi acpi_power_meter ext4 
[95480.029655] mbcache jbd2 sd_mod sg mgag200 drm_vram_helper ttm bnx2x i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mdio hpsa(O) libcrc32c scsi_transport_sas crc32c_intel dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ip_tables] 
[95480.032776] CR2: 0000000000000050

[root@centos8-vm3 ~]# gdb /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.x86_64/kernel/fs/nfs/nfsv4.ko.debug
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-11.el8
...
Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-193.el8.x86_64/kernel/fs/nfs/nfsv4.ko.debug...done.
(gdb) list *(nfs4_select_rw_stateid+0x64)
0x200f4 is in nfs4_select_rw_stateid (fs/nfs/nfs4state.c:998).
993
994		if (test_bit(LK_STATE_IN_USE, &state->flags) == 0)
995			goto out;
996
997		fl_owner = l_ctx->lockowner;
998		fl_flock_owner = l_ctx->open_context->flock_owner;   <================ HERE
999
1000		spin_lock(&state->state_lock);
1001		lsp = __nfs4_find_lock_state(state, fl_owner, fl_flock_owner);
1002		if (lsp && test_bit(NFS_LOCK_LOST, &lsp->ls_flags)) 

It appears that l_ctx->open_context becomes invalid prematurely resulting in a NULL pointer dereference. nfs4_copy_lock_stateid() already ensured that l_ctx (lock context) is not null.

Looking at the NFS client changelog the following fix looked relevant, but 4.18 already includes this fix. Looking further, it doesn't look like the issue is fixed in mainline.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/nfs/inode.c?id=154945112dac10b7109d816275f3e4896b0b064e
author	Trond Myklebust <trondmy@xxxxxxxxx>
NFS: Ensure that all nfs lock contexts have a valid open context
Force the lock context to keep a reference to the parent open context so that we can guarantee the validity of the latter.

This issue was seen multiple times in a virtualized setup that is running multiple VMs on KVM and hard to reproduce at will.
Is this a known issue fixed in recent kernels? Any other debugging information that could help to track down this issue?

Thanks,
Suresh