Re: Kernel panic at nfs4_select_rw_stateid

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2020-11-19 at 11:02 +0000, Suresh Jayaraman wrote:
> Hi folks,
> 
> We have a report of an NFSv4.1 client crash in kernel version 4.18
> (CentOS 8.2) with the IntelliFlash NFSv41 server. 
> 
> [95480.028498] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000050 
> [95480.028550] PGD 0 P4D 0 
> [95480.028570] Oops: 0000 [#1] SMP PTI 
> [95480.028591] CPU: 37 PID: 504425 Comm: kworker/u98:1 Kdump: loaded
> Tainted: G OE --------- - - 4.18.0-193.14.2.el8_2.x86_64 #1 
> [95480.028648] Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380
> Gen9, BIOS P89 10/21/2019 
> [95480.028717] Workqueue: rpciod rpc_async_schedule [sunrpc] 
> [95480.028763] RIP: 0010:nfs4_select_rw_stateid+0x64/0x180 [nfsv4] 
> [95480.028792] Code: 41 89 f6 48 89 fb 48 85 d2 0f 84 f6 00 00 00 48
> 8b 47 40 a8 01 0f 84 ea 00 00 00 48 8b 42 18 4c 8b 7a 20 4c 8d 67 48
> 4c 89 e7 <48> 8b 50 40 48 89 54 24 08 e8 4e 8e 77 f6 48 8b 54 24 08
> 4c 89 fe 
> [95480.028877] RSP: 0018:ffffb4cd225a7dc0 EFLAGS: 00010202 
> [95480.028903] RAX: 0000000000000010 RBX: ffff8dd23bee4180 RCX:
> ffff8dcfb54276c8 
> [95480.028936] RDX: ffff8d5400b69980 RSI: 0000000000000001 RDI:
> ffff8dd23bee41c8 
> [95480.028972] RBP: ffffb4cd225a7e00 R08: 0000000000000000 R09:
> 0000646f69637072 
> [95480.029006] R10: 8080808080808080 R11: ffff8dd27f968bc0 R12:
> ffff8dd23bee41c8 
> [95480.029040] R13: ffff8dcfb54276c8 R14: 0000000000000001 R15:
> ffff8dd24affa940 
> [95480.029073] FS: 0000000000000000(0000) GS:ffff8dd27f940000(0000)
> knlGS:0000000000000000 
> [95480.029111] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
> [95480.029140] CR2: 0000000000000050 CR3: 0000005a5a60a005 CR4:
> 00000000001626e0 
> [95480.029175] Call Trace: 
> [95480.029206] ? nfs4_setup_sequence+0x36/0x190 [nfsv4] 
> [95480.029246] ? rpc_exit+0x30/0x30 [sunrpc] 
> [95480.029280] ? __rpc_atrun+0x20/0x20 [sunrpc] 
> [95480.029314] nfs4_proc_pgio_rpc_prepare+0x5c/0x80 [nfsv4] 
> [95480.029357] nfs_pgio_prepare+0x2d/0x40 [nfs] 
> [95480.029390] __rpc_execute+0x85/0x340 [sunrpc] 
> [95480.029418] ? finish_task_switch+0xd7/0x2b0 
> [95480.029454] rpc_async_schedule+0x29/0x40 [sunrpc] 
> [95480.029482] process_one_work+0x1a7/0x3b0 
> [95480.029504] worker_thread+0x30/0x390 
> [95480.029524] ? create_worker+0x1a0/0x1a0 
> [95480.029547] kthread+0x112/0x130 
> [95480.029567] ? kthread_flush_work_fn+0x10/0x10 
> [95480.029593] ret_from_fork+0x35/0x40 
> [95480.029615] Modules linked in: vport_stt(OE) vhost_net vhost tap
> act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb tun
> rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace
> fscache binfmt_misc xt_CT nft_counter nft_fib_inet nft_fib_ipv4
> nft_fib_ipv6 nft_fib ip6_tables ip_tables nft_compat nft_reject_inet
> nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nf_tables_set
> openvswitch(OE) nft_chain_nat_ipv6 nf_conntrack_ipv6 nf_nat_ipv6
> nf_defrag_ipv6 udp_tunnel nft_chain_route_ipv6 nft_chain_nat_ipv4
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
> nft_chain_route_ipv4 ip_set nf_tables nfnetlink sunrpc intel_rapl_msr
> intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp
> coretemp vfat fat kvm_intel kvm irqbypass ipmi_ssif iTCO_wdt
> iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> intel_cstate intel_uncore ipmi_si intel_rapl_perf ipmi_devintf
> i2c_i801 ioatdma pcspkr ipmi_msghandler lpc_ich hpwdt hpilo dca
> acpi_tad wmi acpi_power_meter ext4 
> [95480.029655] mbcache jbd2 sd_mod sg mgag200 drm_vram_helper ttm
> bnx2x i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
> fb_sys_fops drm mdio hpsa(O) libcrc32c scsi_transport_sas
> crc32c_intel dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
> ip_tables] 
> [95480.032776] CR2: 0000000000000050
> 
> [root@centos8-vm3 ~]# gdb /usr/lib/debug/usr/lib/modules/4.18.0-
> 193.el8.x86_64/kernel/fs/nfs/nfsv4.ko.debug
> GNU gdb (GDB) Red Hat Enterprise Linux 8.2-11.el8
> ...
> Reading symbols from /usr/lib/debug/usr/lib/modules/4.18.0-
> 193.el8.x86_64/kernel/fs/nfs/nfsv4.ko.debug...done.
> (gdb) list *(nfs4_select_rw_stateid+0x64)
> 0x200f4 is in nfs4_select_rw_stateid (fs/nfs/nfs4state.c:998).
> 993
> 994             if (test_bit(LK_STATE_IN_USE, &state->flags) == 0)
> 995                     goto out;
> 996
> 997             fl_owner = l_ctx->lockowner;
> 998             fl_flock_owner = l_ctx->open_context->flock_owner;  
> <================ HERE
> 999
> 1000            spin_lock(&state->state_lock);
> 1001            lsp = __nfs4_find_lock_state(state, fl_owner,
> fl_flock_owner);
> 1002            if (lsp && test_bit(NFS_LOCK_LOST, &lsp->ls_flags)) 
> 
> It appears that l_ctx->open_context becomes invalid prematurely
> resulting in a NULL pointer dereference. nfs4_copy_lock_stateid()
> already ensured that l_ctx (lock context) is not null.
> 
> Looking at the NFS client changelog the following fix looked
> relevant, but 4.18 already includes this fix. Looking further, it
> doesn't look like the issue is fixed in mainline.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/fs/nfs/inode.c?id=154945112dac10b7109d816275f3e4896b0b064e
> author  Trond Myklebust <trondmy@xxxxxxxxx>
> NFS: Ensure that all nfs lock contexts have a valid open context
> Force the lock context to keep a reference to the parent open context
> so that we can guarantee the validity of the latter.
> 
> This issue was seen multiple times in a virtualized setup that is
> running multiple VMs on KVM and hard to reproduce at will.
> Is this a known issue fixed in recent kernels? Any other debugging
> information that could help to track down this issue?
> 
> Thanks,
> Suresh


Have you applied commit 09a54f0ebfe2 ("NFSv4: Fix an Oops in
nfs4_do_setattr")?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux