On Fri, Jan 14, 2022 at 03:18:01PM +0000, Chuck Lever III wrote: > Hi Jonathan- > > > On Jan 14, 2022, at 5:39 AM, Jonathan Woithe <jwoithe@xxxxxxxxxx> wrote: > > > > Hi all > > > > Recently we migrated an NFS server from a 32-bit environment running > > kernel 4.14.128 to a 64-bit 5.15.x kernel. The NFS configuration remained > > unchanged between the two systems. > > > > On two separate occasions since the upgrade (5 Jan under 5.15.10, 14 Jan > > under 5.15.12) the kernel has oopsed at around the time that an NFS client > > machine is turned on for the day. On both occasions the call trace was > > essentially identical. The full oops sequence is at the end of this email. > > The oops was not observed when running the 4.14.128 kernel. > > > > Is there anything more I can provide to help track down the cause of the > > oops? > > A possible culprit is 7f024fcd5c97 ("Keep read and write fds with each nlm_file"), > which was introduced in or around v5.15. Almost definitely it, yeah. We should really have nlm reboot tests. I test nlm and v4 reboot but not nlm reboot.... > You could try a simple test and back > the server down to v5.14.y to see if the problem persists. > > Otherwise, Bruce, can you have a look at this? Yep, just catching up.... Given my lack of nlm reboot testing (sorry) I wouldn't be suprised if it's reproduceable with something really simple, like: take a lock, then restart the client (so that it notifies the server). Could still be rare in production if rebooting while holding a lock is rare. --b. > > > > Regards > > jonathan > > > > Oops under 5.15.12: > > > > Jan 14 08:48:30 nfssvr kernel: BUG: kernel NULL pointer dereference, address: 0000000000000110 > > Jan 14 08:48:30 nfssvr kernel: #PF: supervisor read access in kernel mode > > Jan 14 08:48:30 nfssvr kernel: #PF: error_code(0x0000) - not-present page > > Jan 14 08:48:30 nfssvr kernel: Oops: 0000 [#1] PREEMPT SMP PTI > > Jan 14 08:48:30 nfssvr kernel: CPU: 0 PID: 2935 Comm: lockd Not tainted 5.15.12 #1 > > Jan 14 08:48:30 nfssvr kernel: Hardware name: /DG31PR, BIOS PRG3110H.86A.0038.2007.1221.1757 12/21/2007 > > Jan 14 08:48:30 nfssvr kernel: RIP: 0010:vfs_lock_file+0x5/0x30 > > Jan 14 08:48:30 nfssvr kernel: Code: ff ff 41 89 c4 85 c0 0f 84 42 ff ff ff e9 f8 fe ff ff 0f 0b e8 2c bc d2 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 f3 dc > > Jan 14 08:48:30 nfssvr kernel: RSP: 0018:ffffa478401a3c38 EFLAGS: 00010246 > > Jan 14 08:48:30 nfssvr kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000 > > Jan 14 08:48:30 nfssvr kernel: RDX: ffffa478401a3c40 RSI: 0000000000000006 RDI: 00000000000000e8 > > Jan 14 08:48:30 nfssvr kernel: RBP: ffff946ead1ecc00 R08: ffff946f88ab1000 R09: ffff946f88b33a00 > > Jan 14 08:48:30 nfssvr kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa657ff30 > > Jan 14 08:48:30 nfssvr kernel: R13: ffff946e99df7c40 R14: ffff946e82fb0510 R15: ffff946ead1ecc00 > > Jan 14 08:48:30 nfssvr kernel: FS: 0000000000000000(0000) GS:ffff946fabc00000(0000) knlGS:0000000000000000 > > Jan 14 08:48:30 nfssvr kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110 CR3: 000000010083a000 CR4: 00000000000006f0 > > Jan 14 08:48:30 nfssvr kernel: Call Trace: > > Jan 14 08:48:30 nfssvr kernel: <TASK> > > Jan 14 08:48:30 nfssvr kernel: nlm_unlock_files+0x6e/0xb0 > > Jan 14 08:48:30 nfssvr kernel: ? __skb_recv_udp+0x198/0x330 > > Jan 14 08:48:30 nfssvr kernel: ? _raw_spin_lock+0x13/0x2e > > Jan 14 08:48:30 nfssvr kernel: ? nlmsvc_traverse_blocks+0x36/0x120 > > Jan 14 08:48:30 nfssvr kernel: ? preempt_count_add+0x68/0xa0 > > Jan 14 08:48:30 nfssvr kernel: nlm_traverse_files+0x152/0x280 > > Jan 14 08:48:30 nfssvr kernel: nlmsvc_free_host_resources+0x27/0x40 > > Jan 14 08:48:30 nfssvr kernel: nlm_host_rebooted+0x23/0x90 > > Jan 14 08:48:30 nfssvr kernel: nlmsvc_proc_sm_notify+0xae/0x110 > > Jan 14 08:48:30 nfssvr kernel: ? nlmsvc_decode_reboot+0x8b/0xc0 > > Jan 14 08:48:30 nfssvr kernel: nlmsvc_dispatch+0x89/0x180 > > Jan 14 08:48:30 nfssvr kernel: svc_process_common+0x3ce/0x6f0 > > Jan 14 08:48:30 nfssvr kernel: ? lockd_inet6addr_event+0xf0/0xf0 > > Jan 14 08:48:30 nfssvr kernel: svc_process+0xb7/0xf0 > > Jan 14 08:48:30 nfssvr kernel: lockd+0xca/0x1b0 > > Jan 14 08:48:30 nfssvr kernel: ? preempt_count_add+0x68/0xa0 > > Jan 14 08:48:30 nfssvr kernel: ? _raw_spin_lock_irqsave+0x19/0x40 > > Jan 14 08:48:30 nfssvr kernel: ? set_grace_period+0x90/0x90 > > Jan 14 08:48:30 nfssvr kernel: kthread+0x141/0x170 > > Jan 14 08:48:30 nfssvr kernel: ? set_kthread_struct+0x40/0x40 > > Jan 14 08:48:30 nfssvr kernel: ret_from_fork+0x22/0x30 > > Jan 14 08:48:30 nfssvr kernel: </TASK> > > Jan 14 08:48:30 nfssvr kernel: Modules linked in: tun nf_nat_ftp nf_conntrack_ftp xt_REDIRECT xt_nat xt_conntrack xt_tcpudp xt_NFLOG nfnetlink_log nfnetlink iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter ip_tables x_tables ipv6 hid_generic usbhid hi > > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110 > > Jan 14 08:48:30 nfssvr kernel: ---[ end trace f8f28acee6f24340 ]--- > > Jan 14 08:48:30 nfssvr kernel: RIP: 0010:vfs_lock_file+0x5/0x30 > > Jan 14 08:48:30 nfssvr kernel: Code: ff ff 41 89 c4 85 c0 0f 84 42 ff ff ff e9 f8 fe ff ff 0f 0b e8 2c bc d2 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 0f 1f 44 00 00 <48> 8b 47 28 49 89 d0 48 8b 80 98 00 00 00 48 85 c0 74 05 e9 f3 dc > > Jan 14 08:48:30 nfssvr kernel: RSP: 0018:ffffa478401a3c38 EFLAGS: 00010246 > > Jan 14 08:48:30 nfssvr kernel: RAX: 7fffffffffffffff RBX: 00000000000000e8 RCX: 0000000000000000 > > Jan 14 08:48:30 nfssvr kernel: RDX: ffffa478401a3c40 RSI: 0000000000000006 RDI: 00000000000000e8 > > Jan 14 08:48:30 nfssvr kernel: RBP: ffff946ead1ecc00 R08: ffff946f88ab1000 R09: ffff946f88b33a00 > > Jan 14 08:48:30 nfssvr kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa657ff30 > > Jan 14 08:48:30 nfssvr kernel: R13: ffff946e99df7c40 R14: ffff946e82fb0510 R15: ffff946ead1ecc00 > > Jan 14 08:48:30 nfssvr kernel: FS: 0000000000000000(0000) GS:ffff946fabc00000(0000) knlGS:0000000000000000 > > Jan 14 08:48:30 nfssvr kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > Jan 14 08:48:30 nfssvr kernel: CR2: 0000000000000110 CR3: 000000010083a000 CR4: 00000000000006f0 > > -- > Chuck Lever > >