Hello, my iscsi stress tests triggered a bug during target shutdown in latest stable kernels (3.7.1 and earlier). Turning off an iscsi target with active connections during heavy random write i/o sometimes leads to kernel oopses on different places, hung kernel tasks and other symptoms of memory corruption. This calltrace best shows what happens: BUG: unable to handle kernel NULL pointer dereference at (null) Entering kdb (current=0xffff88022693d9d0, pid 15) on processor 1 Oops: (null) due to oops @ 0xffffffff812d6bb6 Pid: 15, comm: kworker/1:0 Not tainted 3.7.1-00004-ga57fbc7-dirty #23 Dell Inc. PowerEdge R510/0DPRKF RIP: 0010:[<ffffffff812d6bb6>] [<ffffffff812d6bb6>] __list_add+0x26/0xd0 Call Trace: [<ffffffffa00952ba>] iscsit_add_cmd_to_response_queue+0x9a/0x100 [iscsi_target_mod] [<ffffffffa00a20f6>] lio_queue_status+0x26/0x30 [iscsi_target_mod] [<ffffffffa003d818>] target_complete_ok_work+0x198/0x200 [target_core_mod] [<ffffffff81056913>] process_one_work+0x133/0x410 [<ffffffffa003d680>] ? target_qf_do_work+0x2a0/0x2a0 [target_core_mod] [<ffffffff81057800>] worker_thread+0x160/0x3c0 [<ffffffff810576a0>] ? manage_workers+0x280/0x280 [<ffffffff8105cfc6>] kthread+0xc6/0xd0 [<ffffffff8105cf00>] ? flush_kthread_worker+0xb0/0xb0 [<ffffffff8164c02c>] ret_from_fork+0x7c/0xb0 [<ffffffff8105cf00>] ? flush_kthread_worker+0xb0/0xb0 I put few printks to the code and found that iscsit_add_cmd_to_response_queue was called _after_ iscsit_close_connection. So the iscsi termination logic seems to be broken in the way that iscsi connection is closed and kfree'd while there are still pending iscsi_cmds somewhere in the target referencing the deallocated connection. Below is a dmesg snippet where iscsi_conn with address ffff880224924800 is kfree'd (time 432.338664) but after successful termination of the target, two outstanding cmds use this connection address in iscsit_add_cmd_to_response_queue (time 432.479821). Dec 22 21:35:16 goose2 kernel: [ 432.338662] Moving to TARG_CONN_STATE_FREE. Dec 22 21:35:16 goose2 kernel: [ 432.338664] MARTIN: freeing connection ffff880224924800. Dec 22 21:35:16 goose2 kernel: [ 432.338665] Decremented iSCSI connection count to 0 from node: iqn.1998-01.com.vmware:hv-esx8 Dec 22 21:35:16 goose2 kernel: [ 432.338666] Moving to TARG_SESS_STATE_FAILED. Dec 22 21:35:16 goose2 kernel: [ 432.338671] TARGET_CORE[iSCSI]: Deregistered fabric_sess Dec 22 21:35:16 goose2 kernel: [ 432.338673] Moving to TARG_SESS_STATE_FREE. Dec 22 21:35:16 goose2 kernel: [ 432.338674] Released iSCSI session from node: iqn.1998-01.com.vmware:hv-esx8 Dec 22 21:35:16 goose2 kernel: [ 432.338675] Decremented number of active iSCSI Sessions on iSCSI TPG: 1 to 0 Dec 22 21:35:16 goose2 kernel: [ 432.338677] Released 1 iSCSI Session(s) from Target Portal Group: 1 Dec 22 21:35:16 goose2 kernel: [ 432.338678] iSCSI_TPG[1] - Disabled iSCSI Target Portal Group Dec 22 21:35:16 goose2 kernel: [ 432.338717] LIO_Target_ConfigFS: DEREGISTER -> iqn.2011-07.com.zoner:inetstore.vmw0.san2 TPGT: 1 PORTAL: 192.168.224.10:3260 Dec 22 21:35:16 goose2 kernel: [ 432.338720] CORE[iqn.2011-07.com.zoner:inetstore.vmw0.san2] - Removed Network Portal: 192.168.224.10:3260,1 on TCP Dec 22 21:35:16 goose2 kernel: [ 432.338788] CORE[0] - Removed Network Portal: 192.168.224.10:3260 on TCP Dec 22 21:35:16 goose2 kernel: [ 432.338789] LIO_Target_ConfigFS: delnpfromtpg done! Dec 22 21:35:16 goose2 kernel: [ 432.338838] iSCSI_TPG[1]_LUN[2] - Removed ACL for InitiatorNode: iqn.1998-01.com.vmware:hv-esx9 Mapped LUN: 2 Dec 22 21:35:16 goose2 kernel: [ 432.338857] iSCSI_TPG[1] - Freeing ACL for iSCSI InitiatorNode: iqn.1998-01.com.vmware:hv-esx9 Mapped LUN: 2 Dec 22 21:35:16 goose2 kernel: [ 432.338877] iSCSI_TPG[1]_LUN[1] - Removed ACL for InitiatorNode: iqn.1998-01.com.vmware:hv-esx9 Mapped LUN: 1 Dec 22 21:35:16 goose2 kernel: [ 432.338894] iSCSI_TPG[1] - Freeing ACL for iSCSI InitiatorNode: iqn.1998-01.com.vmware:hv-esx9 Mapped LUN: 1 Dec 22 21:35:16 goose2 kernel: [ 432.338933] iSCSI_TPG[1] - Deleted ACL with TCQ Depth: 64 for iSCSI Initiator Node: iqn.1998-01.com.vmware:hv-esx9 Dec 22 21:35:16 goose2 kernel: [ 432.338965] iSCSI_TPG[1]_LUN[2] - Removed ACL for InitiatorNode: iqn.1998-01.com.vmware:hv-esx8 Mapped LUN: 2 Dec 22 21:35:16 goose2 kernel: [ 432.338983] iSCSI_TPG[1] - Freeing ACL for iSCSI InitiatorNode: iqn.1998-01.com.vmware:hv-esx8 Mapped LUN: 2 Dec 22 21:35:16 goose2 kernel: [ 432.339003] iSCSI_TPG[1]_LUN[1] - Removed ACL for InitiatorNode: iqn.1998-01.com.vmware:hv-esx8 Mapped LUN: 1 Dec 22 21:35:16 goose2 kernel: [ 432.339024] iSCSI_TPG[1] - Freeing ACL for iSCSI InitiatorNode: iqn.1998-01.com.vmware:hv-esx8 Mapped LUN: 1 Dec 22 21:35:16 goose2 kernel: [ 432.339063] iSCSI_TPG[1] - Deleted ACL with TCQ Depth: 64 for iSCSI Initiator Node: iqn.1998-01.com.vmware:hv-esx8 Dec 22 21:35:16 goose2 kernel: [ 432.339184] iSCSI_TPG[1]_LUN[2] - Deactivated iSCSI Logical Unit from device object Dec 22 21:35:16 goose2 kernel: [ 432.339313] iSCSI_TPG[1]_LUN[1] - Deactivated iSCSI Logical Unit from device object Dec 22 21:35:16 goose2 kernel: [ 432.339353] LIO_Target_ConfigFS: DEREGISTER -> Releasing TPG Dec 22 21:35:16 goose2 kernel: [ 432.339355] Released 0 iSCSI Session(s) from Target Portal Group: 1 Dec 22 21:35:16 goose2 kernel: [ 432.339366] TARGET_CORE[Normal]: Deallocating iSCSI struct se_portal_group for endpoint: iqn.2011-07.com.zoner:inetstore.vmw0.san2 Portal Tag 1 Dec 22 21:35:16 goose2 kernel: [ 432.339398] CORE[iqn.2011-07.com.zoner:inetstore.vmw0.san2]_TPG[1] - Deleted iSCSI Target Portal Group ... Dec 22 21:35:16 goose2 lrmd: [3117]: info: operation stop[153] on tpgvmw0 for client 3120: pid 8900 exited with return code 0 ... Dec 22 21:35:17 goose2 kernel: [ 432.479815] Incremented dev->dev_cur_ordered_id: 42204 for SIMPLE: 42204 Dec 22 21:35:17 goose2 kernel: [ 432.479821] MARTIN: Adding ffff880210641a40 cmd to conn ffff880224924800 as qreq ffff88020fca4330 Dec 22 21:35:17 goose2 kernel: [ 432.483560] Incremented dev->dev_cur_ordered_id: 42205 for SIMPLE: 42241 Dec 22 21:35:17 goose2 kernel: [ 432.483564] MARTIN: Adding ffff88020e39eac0 cmd to conn ffff880224924800 as qreq ffff88020fca4230 ... Dec 22 21:35:17 goose2 kernel: [ 432.510296] LIO_Target_ConfigFS: DEREGISTER -> iqn.2011-07.com.zoner:inetstore.vmw0.san2 Dec 22 21:35:17 goose2 kernel: [ 432.510302] CORE[0] - Deleted iSCSI Target IQN: iqn.2011-07.com.zoner:inetstore.vmw0.san2 Dec 22 21:35:17 goose2 kernel: [ 432.510338] LIO_Target_ConfigFS: DEREGISTER -> iqn.2011-07.com.zoner:inetstore.vmw0.san1 Dec 22 21:35:17 goose2 kernel: [ 432.510340] CORE[0] - Deleted iSCSI Target IQN: iqn.2011-07.com.zoner:inetstore.vmw0.san1 Dec 22 21:35:17 goose2 kernel: [ 432.510485] Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr: ffff880211081c00 Dec 22 21:35:17 goose2 kernel: [ 432.510527] Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff8802102d1000 Dec 22 21:35:17 goose2 kernel: [ 432.510604] Target_Core_ConfigFS: Calling se_free_virtual_device() for se_dev_ptr: ffff88020fc71000 Dec 22 21:35:17 goose2 kernel: [ 432.510666] Target_Core_ConfigFS: Deallocating se_subsystem_dev_t: ffff8802103c3000 Dec 22 21:35:17 goose2 kernel: [ 432.510690] CORE_HBA[12] - Detached HBA from Generic Target Core Dec 22 21:35:17 goose2 kernel: [ 432.510708] CORE_HBA[11] - Detached HBA from Generic Target Core Dec 22 21:35:17 goose2 kernel: [ 432.510729] CORE_HBA[10] - Detached HBA from Generic Target Core ... Dec 22 21:35:44 goose2 kernel: [ 460.265788] BUG: soft lockup - CPU#4 stuck for 22s! [prmount:8994] Tested with 3.7.1 stable kernel, 3.6.x looks to have the same problem. I skipped v3.4 and v3.5 so I don't know if the bug was introduced in v3.6 or earlier. Martin -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html