> On Jan 18, 2023, at 12:18 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>
> On Wed, 2023-01-18 at 16:48 +0000, Shachar Kagan wrote:
>> On Wed, 2023-01-18 at 18:45 +0000, Chuck Lever III wrote:
>>
>>> On Tue, 2023-01-17 at 15:22 +0000, Chuck Lever III wrote:
>>>>
>>>>> On Jan 17, 2023, at 10:16 AM, Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>>>>
>>>>> On Tue, Nov 01, 2022 at 10:46:45AM -0400, Jeff Layton wrote:
>>>>>> The filecache refcounting is a bit non-standard for something
>>>>>> searchable by RCU, in that we maintain a sentinel reference while
>>>>>> it's hashed. This in turn requires that we have to do things differently in the "put"
>>>>>> depending on whether it's hashed, which we believe to have led to races.
>>>>>>
>>>>>> There are other problems in here too. nfsd_file_close_inode_sync
>>>>>> can end up freeing an nfsd_file while there are still outstanding
>>>>>> references to it, and there are a number of subtle ToC/ToU races.
>>>>>>
>>>>>> Rework the code so that the refcount is what drives the lifecycle.
>>>>>> When the refcount goes to zero, then unhash and rcu free the object.
>>>>>>
>>>>>> With this change, the LRU carries a reference. Take special care
>>>>>> to deal with it when removing an entry from the list.
>>>>>>
>>>>>> Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>>>>>
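
For anyone trying to picture the model described in the patch above: the
put path under this scheme boils down to roughly the sketch below. This is
only an illustration of the "refcount drives the lifecycle" idea, not the
actual fs/nfsd/filecache.c code; the struct, lock, and helper names here
are invented.

#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/rcupdate.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Protects the hash chain and the LRU list in this sketch. */
static DEFINE_SPINLOCK(cache_lock);

struct cache_item {
        struct hlist_node hash;         /* RCU-searchable hash chain */
        struct list_head lru;
        refcount_t ref;
        struct rcu_head rcu;
};

/* RCU lookups must take a reference before using an item. */
static bool cache_item_get(struct cache_item *item)
{
        /* Fails once the final reference has already been put. */
        return refcount_inc_not_zero(&item->ref);
}

static void cache_item_put(struct cache_item *item)
{
        if (!refcount_dec_and_test(&item->ref))
                return;

        /*
         * Final reference: only now is the item unhashed, so a
         * concurrent RCU lookup either still finds it and fails
         * cache_item_get(), or misses it entirely. The memory is
         * freed only after an RCU grace period has elapsed.
         */
        spin_lock(&cache_lock);
        hlist_del_init_rcu(&item->hash);
        spin_unlock(&cache_lock);
        kfree_rcu(item, rcu);
}

/* The LRU holds its own reference; removing an entry drops it. */
static void cache_item_lru_remove(struct cache_item *item)
{
        spin_lock(&cache_lock);
        list_del_init(&item->lru);
        spin_unlock(&cache_lock);
        cache_item_put(item);
}

The point of the scheme, as described, is that dropping the final
reference is the only place an entry is unhashed and freed, so there is
no separate put path for hashed vs. unhashed entries left to race with.
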
>>>>> Our test team is getting crashes that bisection pointed at this
>>>>> patch. It seems like there are multiple parallel crash reports so
>>>>> the whole thing is a mess to read:
>>>>>
>>>>> [ 875.548965] BUG: kernel NULL pointer dereference, address:
>>>>> 00000000000000d0 [ 875.548968] ------------[ cut here ]------------
>>>>> [ 875.548972] refcount_t: underflow; use-after-free.
>>>>> [ 875.548992] WARNING: CPU: 4 PID: 12145 at lib/refcount.c:28
>>>>> refcount_warn_saturate+0xd8/0xe0 [ 875.549851] #PF: supervisor read
>>>>> access in kernel mode [ 875.550158] Modules linked in:
>>>>> [ 875.550752] #PF: error_code(0x0000) - not-present page [
>>>>> 875.551269] nfsd [ 875.551878] PGD 0 [ 875.552069] iptable_raw [
>>>>> 875.552677] P4D 0 [ 875.552824] bonding mlx5_vfio_pci [
>>>>> 875.553095] [ 875.553255] rdma_ucm ipip [ 875.553525] Oops: 0000
>>>>> [#1] SMP [ 875.553733] tunnel4 [ 875.553941] CPU: 0 PID: 12147
>>>>> Comm: nfsd Not tainted 6.1.0-rc7_ac3a2585f018 #1 [ 875.554109]
>>>>> ip_gre ib_umad [ 875.554517] Hardware name: QEMU Standard PC (Q35 +
>>>>> ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org
>>>>> 04/01/2014 [ 875.554656] nf_tables vfio_pci [ 875.555508] RIP:
>>>>> 0010:vfs_setlease+0x27/0x70 [ 875.555695] vfio_pci_core
>>>>> vfio_virqfd [ 875.557015] Code: ff ff 90 0f 1f 44 00 00 41 54 49 89
>>>>> d4 55 48 89 fd 48 83 ec 10 48 85 d2 74 06 48 83 fe 02 75 1f 48 8b 45
>>>>> 28 4c 89 e2 48 89 ef <48> 8b 80 d0 00 00 00 48 85 c0 74 2c 48 83 c4
>>>>> 10 5d 41 5c ff e0 48 [ 875.557209] vfio_iommu_type1 [ 875.557406]
>>>>> RSP: 0018:ffff88810378bdb0 EFLAGS: 00010246 [ 875.557634] mlx5_ib
>>>>> [ 875.558446] [ 875.558628] vfio [ 875.558862] RAX:
>>>>> 0000000000000000 RBX: ffff88824866c000 RCX: ffff88810378bdd8 [
>>>>> 875.559006] ib_uverbs [ 875.559092] RDX: 0000000000000000 RSI:
>>>>> 0000000000000002 RDI: ffff88812560a200 [ 875.559218] ib_ipoib [
>>>>> 875.559557] RBP: ffff88812560a200 R08: ffff8881da5ecf00 R09:
>>>>> ffffffff824064e0 [ 875.559704] mlx5_core [ 875.560021] R10:
>>>>> 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [
>>>>> 875.560165] ip6_gre [ 875.560488] R13: ffff8881da5ecf00 R14:
>>>>> ffff888110e62028 R15: ffff888110e621a0 [ 875.560634] gre [
>>>>> 875.560959] FS: 0000000000000000(0000) GS:ffff88852c800000(0000)
>>>>> knlGS:0000000000000000 [ 875.561108] ip6_tunnel [ 875.561432] CS:
>>>>> 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 875.561554] tunnel6
>>>>> [ 875.561928] CR2: 00000000000000d0 CR3: 00000001ca27d001 CR4:
>>>>> 0000000000372eb0 [ 875.562084] geneve [ 875.562349] DR0:
>>>>> 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [
>>>>> 875.562493] nfnetlink_cttimeout [ 875.562822] DR3:
>>>>> 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [
>>>>> 875.562962] openvswitch [ 875.563292] Call Trace:
>>>>> [ 875.563298] <TASK>
>>>>> [ 875.563503] nsh
>>>>> [ 875.563839] destroy_unhashed_deleg+0x58/0xc0 [nfsd]
>>>>
>>>> We are aware of this failure mode. Actually, this started well before
>>>> that particular commit.
>>>>
>>>> Our problem has been that no one has been able to provide a reliable
>>>> reproducer, so we can't figure out why it's happening. If you have a
>>>> way to reproduce this failure reliably, can you capture a vmcore or
>>>> enable KASAN and get a little more information?
>>>>
>>>
>>> It's possible that this crash may be related to the problem that was fixed here:
>>>
>>> commit 0b3a551fa58b4da941efeb209b3770868e2eddd7
>>> Author: Jeff Layton <jlayton@xxxxxxxxxx>
>>> Date: Thu Jan 5 14:55:56 2023 -0500
>>>
>>> nfsd: fix handling of cached open files in nfsd4_open codepath
>>>
>>> Unfortunately, that hasn't trickled into v6.1 kernels so far.
>>
>> This commit is in my working tree, but it doesn't fix the issue; I still
>> hit the crash. We are working with v6.2-rc3.
>
> Thanks for testing it. That patch fixes a real bug, just not the one
> you're hitting, apparently.
>
> If you're comfortable working with bleeding edge kernels, you may just
> want to pull in Chuck's for-rc and for-next branches.

Stephen and I renamed these yesterday to nfsd-fixes and nfsd-next,
respectively. nfsd-fixes was pulled into v6.2-rc yesterday, FYI.

> Those have a few
> other patches that I wouldn't expect to change this, but might still be
> worth testing to see.
>
> If it's happening regularly, you could also try disabling leases on the
> machine, at the expense of some performance. We suspect this is related
> to delegation handling, but we just haven't been able to nail it down
> yet. If you do that, and it seems to fix it for you, let us know as that
> would be an interesting datapoint.
>
> Thanks!
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
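
To make the lease suggestion above concrete: on the NFS server that is
"sysctl -w fs.leases-enable=0" (equivalently, write 0 to
/proc/sys/fs/leases-enable); setting it back to 1 re-enables them. NFSv4
delegations are built on VFS leases, so with leases disabled nfsd stops
handing out new delegations. As far as I know the setting affects only
leases requested after the change, so it is best applied before the test
run starts.

--
Chuck Lever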