Hi Tommy,

On Sat, 2014-01-25 at 15:19 +0100, Tommy Apel wrote:
> Hello, after disconnecting my srp initiator and trying to shut down
> the target I end up with a hung/stale system, I have experienced this
> on both 3.13.0 and 3.10.25
>
> Here is the dmesg
>
> [170319.904119] Received DREQ and sent DREP for session 0x00000000000000000002c9030005566e.
> [170321.960898] Received IB TimeWait exit for cm_id ffff88046c72da00.
> [170321.960993] Session 0x00000000000000000002c9030005566e: kernel thread ib_srpt_compl (PID 9208) stopped
> [170564.275488] INFO: task tcm_fabric:14473 blocked for more than 120 seconds.
> [170564.275491] Not tainted 3.13.0-gentoo-r1 #1
> [170564.275492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [170564.275493] tcm_fabric D ffff88047fdd2d00 0 14473 14444 0x00000004
> [170564.275496] ffff880372806790 0000000000000002 ffff88046f2e9810 ffff880406b5449f
> [170564.275499] 0000000000012d00 ffff880395b35fd8 0000000000012d00 ffff880372806790
> [170564.275501] ffff880406b54110 ffff880312b94420 ffff880312b94428 7fffffffffffffff
> [170564.275503] Call Trace:
> [170564.275510] [<ffffffff817335ba>] ? schedule_timeout+0x17a/0x1e0
> [170564.275514] [<ffffffff810923a3>] ? enqueue_task_fair+0x1b3/0xa90
> [170564.275516] [<ffffffff8173507d>] ? wait_for_completion+0x9d/0x110
> [170564.275519] [<ffffffff8108c790>] ? try_to_wake_up+0x280/0x280
> [170564.275527] [<ffffffffa01129f6>] ? transport_clear_lun_ref+0x46/0x70 [target_core_mod]
> [170564.275532] [<ffffffffa010d687>] ? core_tpg_post_dellun+0x27/0x60 [target_core_mod]
> [170564.275537] [<ffffffffa00ffc65>] ? core_dev_del_lun+0x35/0xb0 [target_core_mod]
> [170564.275542] [<ffffffffa01018e3>] ? target_fabric_port_unlink+0x43/0x60 [target_core_mod]
> [170564.275545] [<ffffffff811c870e>] ? configfs_unlink+0xee/0x1c0
> [170564.275549] [<ffffffff8116331a>] ? vfs_unlink+0xda/0x160
> [170564.275551] [<ffffffff811635ce>] ? do_unlinkat+0x22e/0x260
> [170564.275554] [<ffffffff810105d5>] ? syscall_trace_enter+0x115/0x1c0
> [170564.275557] [<ffffffff81737ae1>] ? tracesys+0xd4/0xd9

Thanks for reporting.

So starting with v3.13 code, this particular logic has been changed to
use percpu refcounting. I'm able to reproduce a similar issue with a
different fabric driver, and am currently in the process of tracking
this bug down.

AFAICT this was a v3.13 specific regression, but given your comment
above it sounds like there is an issue on v3.10.y code as well (at
least for SRP anyways).

Btw, it would be helpful to see a dmesg log on v3.10.y code for this
bug as well.

Thanks,

--nab