On Tue, Jan 07, 2025 at 02:12:53PM +0200, Leon Romanovsky wrote:
> From: Patrisious Haddad <phaddad@xxxxxxxxxx>
>
> Prevent double queueing of implicit ODP mr destroy work by adding a bit
> to the MR indicating if this MR is already queued for destruction.
>
> Without this bit, we could try to invalidate this mr twice, which in
> turn could result in queuing a MR work destroy twice, and eventually the
> second work could execute after the MR was freed due to the first work,
> causing a use-after-free and the trace below.
>
> refcount_t: underflow; use-after-free.
> WARNING: CPU: 2 PID: 12178 at lib/refcount.c:28 refcount_warn_saturate+0x12b/0x130
> Modules linked in: bonding ib_ipoib vfio_pci ip_gre geneve nf_tables ip6_gre gre ip6_tunnel tunnel6 ipip tunnel4 ib_umad rdma_ucm mlx5_vfio_pci vfio_pci_core vfio_iommu_type1 mlx5_ib vfio ib_uverbs mlx5_core iptable_raw openvswitch nsh rpcrdma ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_cm ib_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay zram zsmalloc fuse [last unloaded: ib_uverbs]
> CPU: 2 PID: 12178 Comm: kworker/u20:5 Not tainted 6.5.0-rc1_net_next_mlx5_58c644e #1
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> Workqueue: events_unbound free_implicit_child_mr_work [mlx5_ib]
> RIP: 0010:refcount_warn_saturate+0x12b/0x130
> Code: 48 c7 c7 38 95 2a 82 c6 05 bc c6 fe 00 01 e8 0c 66 aa ff 0f 0b 5b c3 48 c7 c7 e0 94 2a 82 c6 05 a7 c6 fe 00 01 e8 f5 65 aa ff <0f> 0b 5b c3 90 8b 07 3d 00 00 00 c0 74 12 83 f8 01 74 13 8d 50 ff
> RSP: 0018:ffff8881008e3e40 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
> RDX: ffff88852c91b5c8 RSI: 0000000000000001 RDI: ffff88852c91b5c0
> RBP: ffff8881dacd4e00 R08: 00000000ffffffff R09: 0000000000000019
> R10: 000000000000072e R11: 0000000063666572 R12: ffff88812bfd9e00
> R13: ffff8881c792d200 R14: ffff88810011c005 R15: ffff8881002099c0
> FS: 0000000000000000(0000) GS:ffff88852c900000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f5694b5e000 CR3: 00000001153f6003 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __warn+0x79/0x120
> ? refcount_warn_saturate+0x12b/0x130
> ? report_bug+0x17c/0x190
> ? handle_bug+0x3c/0x60
> ? exc_invalid_op+0x14/0x70
> ? asm_exc_invalid_op+0x16/0x20
> ? refcount_warn_saturate+0x12b/0x130
> free_implicit_child_mr_work+0x180/0x1b0 [mlx5_ib]
> ? try_to_wake_up+0x5d/0x450
> ? destroy_sched_domains_rcu+0x30/0x30
> process_one_work+0x1cc/0x3c0
> worker_thread+0x218/0x3c0
> ? process_one_work+0x3c0/0x3c0
> kthread+0xc6/0xf0
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
> </TASK>
> ---[ end trace 0000000000000000 ]---
>
> Fixes: 5256edcb98a1 ("RDMA/mlx5: Rework implicit ODP destroy")
> Signed-off-by: Patrisious Haddad <phaddad@xxxxxxxxxx>
> Reviewed-by: Michael Guralnik <michaelgur@xxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> ---
> drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 ++
> drivers/infiniband/hw/mlx5/odp.c | 4 ++++
> 2 files changed, 6 insertions(+)

I'm dropping this patch, need to rewrite it.

Thanks
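
For anyone following the thread, the idea in the quoted commit message (a per-MR flag that prevents the destroy work from being queued twice) can be sketched in user-space C roughly as below. This is only an illustration under stated assumptions, not the mlx5_ib code and not the dropped patch: struct fake_mr, queue_destroy_once() and the queued_count counter are hypothetical names, and a C11 atomic_flag stands in for whatever bit the patch added to the MR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MR; NOT the real struct mlx5_ib_mr. */
struct fake_mr {
	atomic_flag destroy_queued; /* set once destroy work has been queued */
	int queued_count;           /* counts how often work was really queued */
};

/*
 * Queue the destroy work at most once.
 * atomic_flag_test_and_set() returns the previous value, so only the
 * first caller sees "false" and goes on to queue the work; every later
 * caller sees "true" and backs off.
 */
static bool queue_destroy_once(struct fake_mr *mr)
{
	assert(mr != NULL);

	if (atomic_flag_test_and_set(&mr->destroy_queued))
		return false; /* already queued, do nothing */

	/* Placeholder for the real "queue the work item" step. */
	mr->queued_count++;
	return true;
}
```

With an MR initialised as `{ .destroy_queued = ATOMIC_FLAG_INIT, .queued_count = 0 }`, a first call to queue_destroy_once() returns true and a second returns false, so the placeholder queueing step runs once. The counter itself is not atomic; it is only there to make the single-threaded demonstration observable.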