On 8/21/23 3:57 AM, Leon Romanovsky wrote: > From: Leon Romanovsky <leonro@xxxxxxxxxx> > > Commit: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") is > causing problems on OPA when DEVICE_REMOVAL is happening. > > ------------[ cut here ]------------ > WARNING: CPU: 52 PID: 2117247 at drivers/infiniband/core/cq.c:359 > ib_cq_pool_cleanup+0xac/0xb0 [ib_core] > Modules linked in: nfsd nfs_acl target_core_user uio tcm_fc libfc > scsi_transport_fc tcm_loop target_core_pscsi target_core_iblock target_core_file > rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs > rfkill rpcrdma rdma_ucm ib_srpt sunrpc ib_isert iscsi_target_mod target_core_mod > opa_vnic ib_iser libiscsi ib_umad scsi_transport_iscsi rdma_cm ib_ipoib iw_cm > ib_cm hfi1(-) rdmavt ib_uverbs intel_rapl_msr intel_rapl_common sb_edac ib_core > x86_pkg_temp_thermal intel_powerclamp coretemp i2c_i801 mxm_wmi rapl iTCO_wdt > ipmi_si iTCO_vendor_support mei_me ipmi_devintf mei intel_cstate ioatdma > intel_uncore i2c_smbus joydev pcspkr lpc_ich ipmi_msghandler acpi_power_meter > acpi_pad xfs libcrc32c sr_mod sd_mod cdrom t10_pi sg crct10dif_pclmul > crc32_pclmul crc32c_intel drm_kms_helper drm_shmem_helper ahci libahci > ghash_clmulni_intel igb drm libata dca i2c_algo_bit wmi fuse > CPU: 52 PID: 2117247 Comm: modprobe Not tainted 6.5.0-rc1+ #1 > Hardware name: Intel Corporation S2600CWR/S2600CW, BIOS > SE5C610.86B.01.01.0014.121820151719 12/18/2015 > RIP: 0010:ib_cq_pool_cleanup+0xac/0xb0 [ib_core] > Code: ff 48 8b 43 40 48 8d 7b 40 48 83 e8 40 4c 39 e7 75 b3 49 83 > c4 10 4d 39 fc 75 94 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc <0f> 0b eb a1 > 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f > RSP: 0018:ffffc10bea13fc80 EFLAGS: 00010206 > RAX: 000000000000010c RBX: ffff9bf5c7e66c00 RCX: 000000008020001d > RDX: 000000008020001e RSI: fffff175221f9900 RDI: ffff9bf5c7e67640 > RBP: ffff9bf5c7e67600 R08: ffff9bf5c7e64400 R09: 000000008020001d > R10: 0000000040000000 R11: 0000000000000000 R12: ffff9bee4b1e8a18 > R13: dead000000000122 R14: dead000000000100 R15: ffff9bee4b1e8a38 > FS: 00007ff1e6d38740(0000) GS:ffff9bfd9fb00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00005652044ecc68 CR3: 0000000889b5c005 CR4: 00000000001706e0 > Call Trace: > <TASK> > ? __warn+0x80/0x130 > ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] > ? report_bug+0x195/0x1a0 > ? handle_bug+0x3c/0x70 > ? exc_invalid_op+0x14/0x70 > ? asm_exc_invalid_op+0x16/0x20 > ? ib_cq_pool_cleanup+0xac/0xb0 [ib_core] > disable_device+0x9d/0x160 [ib_core] > __ib_unregister_device+0x42/0xb0 [ib_core] > ib_unregister_device+0x22/0x30 [ib_core] > rvt_unregister_device+0x20/0x90 [rdmavt] > hfi1_unregister_ib_device+0x16/0xf0 [hfi1] > remove_one+0x55/0x1a0 [hfi1] > pci_device_remove+0x36/0xa0 > device_release_driver_internal+0x193/0x200 > driver_detach+0x44/0x90 > bus_remove_driver+0x69/0xf0 > pci_unregister_driver+0x2a/0xb0 > hfi1_mod_cleanup+0xc/0x3c [hfi1] > __do_sys_delete_module.constprop.0+0x17a/0x2f0 > ? exit_to_user_mode_prepare+0xc4/0xd0 > ? syscall_trace_enter.constprop.0+0x126/0x1a0 > do_syscall_64+0x5c/0x90 > ? syscall_exit_to_user_mode+0x12/0x30 > ? do_syscall_64+0x69/0x90 > ? syscall_exit_work+0x103/0x130 > ? syscall_exit_to_user_mode+0x12/0x30 > ? do_syscall_64+0x69/0x90 > ? exc_page_fault+0x65/0x150 > entry_SYSCALL_64_after_hwframe+0x6e/0xd8 > RIP: 0033:0x7ff1e643f5ab > Code: 73 01 c3 48 8b 0d 75 a8 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 > 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0 00 00 00 0f 05 <48> 3d 01 f0 > ff ff 73 01 c3 48 8b 0d 45 a8 1b 00 f7 d8 64 89 01 48 > RSP: 002b:00007ffec9103cc8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0 > RAX: ffffffffffffffda RBX: 00005615267fdc50 RCX: 00007ff1e643f5ab > RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005615267fdcb8 > RBP: 00005615267fdc50 R08: 0000000000000000 R09: 0000000000000000 > R10: 00007ff1e659eac0 R11: 0000000000000206 R12: 00005615267fdcb8 > R13: 0000000000000000 R14: 00005615267fdcb8 R15: 00007ffec9105ff8 > </TASK> > ---[ end trace 0000000000000000 ]--- > > And... > > restrack: ------------[ cut here ]------------ > infiniband hfi1_0: BUG: RESTRACK detected leak of resources > restrack: Kernel PD object allocated by ib_isert is not freed > restrack: Kernel CQ object allocated by ib_core is not freed > restrack: Kernel QP object allocated by rdma_cm is not freed > restrack: ------------[ cut here ]------------ > > Fixes: 699826f4e30a ("IB/isert: Fix incorrect release of isert connection") > Reported-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx> > Closes: https://lore.kernel.org/all/921cd1d9-2879-f455-1f50-0053fe6a6655@xxxxxxxxxxxxxxxxxxxx > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > --- > drivers/infiniband/ulp/isert/ib_isert.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c > index 92e1e7587af8..00a7303c8cc6 100644 > --- a/drivers/infiniband/ulp/isert/ib_isert.c > +++ b/drivers/infiniband/ulp/isert/ib_isert.c > @@ -2570,6 +2570,8 @@ static void isert_wait_conn(struct iscsit_conn *conn) > isert_put_unsol_pending_cmds(conn); > isert_wait4cmds(conn); > isert_wait4logout(isert_conn); > + > + queue_work(isert_release_wq, &isert_conn->release_work); > } > > static void isert_free_conn(struct iscsit_conn *conn) Tested-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxxxxxxxxxxxxx>