On 2/2/2017 7:09 AM, Raju Rangoju wrote:
Hi Max,
I have tried the patch, but no luck. Issue is still seen.
-Raju
Thanks Raju.
I sent it anyway because this is the right behaviour.
We'll continue investigating this issue.
-----Original Message-----
From: Max Gurtovoy [mailto:maxg@xxxxxxxxxxxx]
Sent: 01 February 2017 20:48
To: Raju Rangoju <rajur@xxxxxxxxxxx>; Sagi Grimberg <sagi@xxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx
Cc: SWise OGC <swise@xxxxxxxxxxxxxxxxxxxxx>; Potnuri Bharat Teja <bharat@xxxxxxxxxxx>
Subject: Re: iSER fails to release rdma resources (WRs) if iw_cxgb4 is unloaded while IO is in progress
Hi Raju,
Please apply the attached patch, which I want to push soon (I still haven't found the chance to test it).
I'm not sure it will solve your problem but let's try it.
thanks,
Max.
On 2/1/2017 11:08 AM, Raju Rangoju wrote:
Hello Sagi,
I intermittently see an issue with iser when unloading the iw_cxgb4 module while traffic is running. Apparently the RDMA resources are not released when iser receives the RDMA_CM_EVENT_DEVICE_REMOVAL event while IO is in progress. On receiving the DEVICE_REMOVAL event, iser_cma_handler() destroys the device by calling iser_cleanup_handler(). iser_free_ib_conn_res() destroys the QP and calls iser_free_fastreg_pool() to free the memory regions (MRs) in the fastreg_pool list, and then ib_dealloc_pd() is called.
Issue:
iSCSI uses its .xmit_task and .cleanup_task callbacks to get and put MRs from the iser fr_pool (fastreg_pool) during normal IO. If the DEVICE_REMOVAL event is received at this point, iser_cma_handler() -> iser_cleanup_handler() simply releases the MRs that are available in the fr_pool list (some MRs may have been moved to the running task list) and eventually calls ib_dealloc_pd(), which ends up hitting a kernel panic because some registered MRs have not been freed.
iser_free_fastreg_pool() complains about the registered regions: "pool still has %d regions registered"
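The accounting problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every name in it (model_mr, model_pool, model_pool_get, and so on) is hypothetical and none of it is ib_iser code. It assumes, as the report describes, that teardown walks only the free list, so a descriptor held by an in-flight task is never freed and the pool still counts it as registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace model; these are NOT the ib_iser structures. */
struct model_mr {
    struct model_mr *next;      /* link on the pool's free list */
};

struct model_pool {
    struct model_mr *free_list; /* descriptors available to new tasks */
    size_t size;                /* descriptors allocated and not yet freed */
};

/*
 * Allocate `count` descriptors and put them all on the free list.
 * On failure the caller may still call model_pool_free() to release
 * whatever was allocated.
 */
static bool model_pool_init(struct model_pool *pool, size_t count)
{
    pool->free_list = NULL;
    pool->size = 0;
    for (size_t i = 0; i < count; i++) {
        struct model_mr *mr = malloc(sizeof(*mr));
        if (!mr)
            return false;
        mr->next = pool->free_list;
        pool->free_list = mr;
        pool->size++;
    }
    return true;
}

/* Models the transmit path: a task takes a descriptor off the free list. */
static struct model_mr *model_pool_get(struct model_pool *pool)
{
    struct model_mr *mr = pool->free_list;
    if (mr) {
        pool->free_list = mr->next;
        mr->next = NULL;
    }
    return mr;
}

/* Models the task cleanup path: the descriptor goes back on the free list. */
static void model_pool_put(struct model_pool *pool, struct model_mr *mr)
{
    mr->next = pool->free_list;
    pool->free_list = mr;
}

/*
 * Models teardown that frees only what is on the free list.
 * Returns the number of descriptors still outstanding, i.e. the
 * "pool still has %d regions registered" count.
 */
static size_t model_pool_free(struct model_pool *pool)
{
    while (pool->free_list) {
        struct model_mr *mr = pool->free_list;
        pool->free_list = mr->next;
        free(mr);
        assert(pool->size > 0);
        pool->size--;
    }
    return pool->size;
}

/* Models the final check: the PD is safe to release only when nothing is left. */
static bool model_pd_can_dealloc(const struct model_pool *pool)
{
    return pool->size == 0;
}
```

In this model, initialising a pool of four descriptors, taking one with model_pool_get() and then calling model_pool_free() leaves one descriptor outstanding, so model_pd_can_dealloc() reports false. Returning the descriptor with model_pool_put() and freeing again brings the count to zero. This suggests, without claiming it is the right fix for iser, that teardown has to either wait for in-flight tasks to return their descriptors or track all descriptors independently of the free list.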
Trace:
iser: iser_free_fastreg_pool: pool still has 1 regions registered
iser: iser_device_try_release: device ffff880508660080 refcount 0
iw_cxgb4:c4iw_destroy_cq ib_cq ffff8803f3addc00
iw_cxgb4:c4iw_wait_for_reply add wr_waitp ffffc9000dd83a28
------------[ cut here ]------------
WARNING: CPU: 7 PID: 14790 at drivers/infiniband/core/verbs.c:305 ib_dealloc_pd+0x87/0xd0 [ib_core]
Modules linked in: rdma_ucm ib_uverbs iw_cxgb4(OE-) autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod bnx2fc fcoe libfcoe 8021q libfc garp stp llc scsi_transport_fc cpufreq_ondemand be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i libcxgbi iw_cxgb3 cxgb3 mdio libcxgb ib_iser rdma_cm ib_cm iw_cm ib_core configfs ipv6 crc_ccitt iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi uinput ppdev iTCO_wdt iTCO_vendor_support serio_raw pcspkr parport_pc parport tpm_infineon sg i2c_i801 i2c_core lpc_ich mfd_core e1000e acpi_cpufreq i7core_edac edac_core ioatdma dca ext4(E) mbcache(E) jbd2(E) sd_mod(E) pata_acpi(E) ata_generic(E) ata_piix(E) floppy(E) cxgb4(OE) ptp(E) pps_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
CPU: 7 PID: 14790 Comm: rmmod Tainted: G OE 4.10.0-rc4+ #22
Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10
Call Trace:
dump_stack+0x51/0x78
__warn+0xfd/0x120
warn_slowpath_null+0x1d/0x20
ib_dealloc_pd+0x87/0xd0 [ib_core]
? ib_unregister_event_handler+0x6d/0x80 [ib_core]
? mutex_lock+0x16/0x40
iser_device_try_release+0x81/0x120 [ib_iser]
? iser_free_rx_descriptors+0xd3/0xf0 [ib_iser]
iser_free_ib_conn_res+0x75/0xb0 [ib_iser]
iser_cleanup_handler+0x41/0x70 [ib_iser]
iser_cma_handler+0x1c9/0x220 [ib_iser]
cma_remove_id_dev+0x8f/0xa0 [rdma_cm]
cma_process_remove+0x127/0x170 [rdma_cm]
? kobject_cleanup+0x82/0x1b0
? kobject_release+0xd/0x10
cma_remove_one+0x6f/0x90 [rdma_cm]
ib_unregister_device+0xe7/0x190 [ib_core]
c4iw_unregister_device+0x79/0x90 [iw_cxgb4]
c4iw_remove+0x45/0x6c [iw_cxgb4]
c4iw_exit_module+0x31/0x75 [iw_cxgb4]
SyS_delete_module+0x183/0x1d0
? syscall_trace_enter+0x154/0x1f0
? SyS_munmap+0x6e/0x90
do_syscall_64+0x6c/0x160
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x37d22e8ee7
RSP: 002b:00007ffedd1877b8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 00007ffedd1877c0 RCX: 00000037d22e8ee7
RDX: 00007ffedd1877af RSI: 0000000000000880 RDI: 00007ffedd1877c0
RBP: 00007ffedd187810 R08: 00007f0120b48700 R09: 0000000000000100
R10: 0000000000000011 R11: 0000000000000206 R12: 0000000000000880
R13: 00007ffedd188735 R14: 0000000000000000 R15: 0000000000000001
---[ end trace 9bdbdddd5759d7e6 ]---
Steps to reproduce:
1. Bring up the iser target setup
2. Bring up the iser initiator setup
3. From the DUT (initiator), log in to all the targets and start IOzone traffic on all the mounted LUNs.
4. Now unload the iw_cxgb4 module on the iser initiator setup.
This is a generic issue, seen with other vendors also.
Could you give me a few pointers on how to debug it further to address this issue?
I am happy to provide any further details.
Thank you for any help you can provide,
-Raju
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html