Re: Endless wait in transport_clear_lun_from_sessions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2011-09-18 at 11:50 +0200, Martin Svec wrote:
> Hi Nick,
> 
> Below is the requested dynamic_debug output. I highlighted
> the lines you wanted to convert to printk by "[TCM]" prefix.
> 
> Few notes:
> 
> I'm not sure if it is a 3.1 regression because I've made a number
> of changes to my SAN setup since 2.6.39 tests. Now, I have two
> targets (san1 and san2) that provide multipathed access to the
> same LUN under two different IQNs and two different NPs.
> I put a mark in the output to show where the successful san1
> TPG cleanup ends and failed san2 TPG removal starts.
> 
> Also, I simulate loss of cabling by setting a firewall rule "DROP all
> INPUT on port 3260" immediately before the TPG removal sequence.
> Don't know if it can have something to do with the bug but I mention
> it for completeness.
> 
> Now, I want to repeat the test with only one TPG and also get kdb
> backtrace of rx/tx threads. Side question -- is there a way to play
> with kdb remotely? I don't have physical access to the servers
> today.
> 

This is still not making sense to me for one important reason:

See below..

> Martin
> 
> 

<SNIP>

> ---- end of san1 cleanup ---
> 
> Sep 17 22:34:31 goose2 kernel: [11252.240833] Moving to TARG_CONN_STATE_CLEANUP_WAIT.
> Sep 17 22:34:31 goose2 kernel: [11252.240836] Performing cleanup for failed iSCSI Connection ID: 0 from iqn.1998-01.com.vmware:hv-esx8
> Sep 17 22:34:31 goose2 kernel: [11252.240840] Closing iSCSI connection CID 0 on SID: 1
> Sep 17 22:34:31 goose2 kernel: [11252.240847] rx_loop: -512 total_rx: 0
> Sep 17 22:34:31 goose2 kernel: [11252.240895] Moving to TARG_CONN_STATE_FREE.
> Sep 17 22:34:31 goose2 kernel: [11252.240898] Decremented iSCSI connection count to 0 from node: iqn.1998-01.com.vmware:hv-esx8
> Sep 17 22:34:31 goose2 kernel: [11252.240900] Moving to TARG_SESS_STATE_FAILED.
> Sep 17 22:34:31 goose2 kernel: [11252.240934] Moving to TARG_SESS_STATE_FREE.
> Sep 17 22:34:31 goose2 kernel: [11252.240937] Released iSCSI session from node: iqn.1998-01.com.vmware:hv-esx8
> Sep 17 22:34:31 goose2 kernel: [11252.240939] Decremented number of active iSCSI Sessions on iSCSI TPG: 1 to 1
> Sep 17 22:34:31 goose2 kernel: [11252.265021] Moving to TARG_CONN_STATE_CLEANUP_WAIT.
> Sep 17 22:34:31 goose2 kernel: [11252.265028] Performing cleanup for failed iSCSI Connection ID: 0 from iqn.1998-01.com.vmware:hv-esx9
> Sep 17 22:34:31 goose2 kernel: [11252.265032] Closing iSCSI connection CID 0 on SID: 2
> Sep 17 22:34:31 goose2 kernel: [11252.265044] rx_loop: -512 total_rx: 0
> Sep 17 22:34:31 goose2 kernel: [11252.265079] Updated MaxCmdSN to 0x00e75fc3
> Sep 17 22:34:31 goose2 kernel: [11252.265083] [TCM] TGWFT: wait_for_tasks: Stopping ffff8800b8af8d00 ITT: 0x1f5de700 i_state: 1, t_state/def_t_state: 5/0, t_transport_stop = TRUE
> Sep 17 22:34:32 goose2 kernel: [11252.501136] [TCM] TGWFT: wait_for_tasks: Stopped wait_for_compltion(&cmd->t_transport_stop_comp) for ITT: 0x1f5de700
> Sep 17 22:34:32 goose2 kernel: [11252.501149] Updated MaxCmdSN to 0x00e75fc4
> Sep 17 22:34:32 goose2 kernel: [11252.501152] [TCM] TGWFT: wait_for_tasks: Stopping ffff8800b7dfed00 ITT: 0x4c5fe700 i_state: 6, t_state/def_t_state: 5/0, t_transport_stop = TRUE
> Sep 17 22:34:32 goose2 kernel: [11252.506286] [TCM] TGWFT: wait_for_tasks: Stopped wait_for_compltion(&cmd->t_transport_stop_comp) for ITT: 0x4c5fe700
> Sep 17 22:34:32 goose2 kernel: [11252.506766] Moving to TARG_CONN_STATE_FREE.
> Sep 17 22:34:32 goose2 kernel: [11252.506770] Decremented iSCSI connection count to 0 from node: iqn.1998-01.com.vmware:hv-esx9
> Sep 17 22:34:32 goose2 kernel: [11252.506773] Moving to TARG_SESS_STATE_FAILED.
> Sep 17 22:34:32 goose2 kernel: [11252.506784] Moving to TARG_SESS_STATE_FREE.
> Sep 17 22:34:32 goose2 kernel: [11252.506788] Released iSCSI session from node: iqn.1998-01.com.vmware:hv-esx9
> Sep 17 22:34:32 goose2 kernel: [11252.506791] Decremented number of active iSCSI Sessions on iSCSI TPG: 1 to 0
> Sep 17 22:34:32 goose2 kernel: [11252.506796] Released 2 iSCSI Session(s) from Target Portal Group: 1
> Sep 17 22:34:32 goose2 kernel: [11252.506799] iSCSI_TPG[1] - Disabled iSCSI Target Portal Group

At this point, the two active sessions from vmware:hv-esx8 and
vmware:hv-esx9 have been successfully released w/ all active I/O
properly shutdown and accounted for..  Eg: everything is functioning as
expected, and the number of active sessions has been decremented to '0'.

> Sep 17 22:34:32 goose2 kernel: [11252.506878] LIO_Target_ConfigFS: DEREGISTER -> iqn.2011-07.com.zoner:inetstore.vmw0.san2 TPGT: 1 PORTAL: 192.168.224.10:3260
> Sep 17 22:34:32 goose2 kernel: [11252.506883] CORE[iqn.2011-07.com.zoner:inetstore.vmw0.san2] - Removed Network Portal: 192.168.224.10:3260,1 on TCP
> Sep 17 22:34:32 goose2 kernel: [11252.506921] CORE[0] - Removed Network Portal: 192.168.224.10:3260 on TCP
> Sep 17 22:34:32 goose2 kernel: [11252.506923] LIO_Target_ConfigFS: delnpfromtpg done!
> Sep 17 22:34:32 goose2 kernel: [11252.507388] [TCM] TCLFS: SE_LUN[9] - Setting cmd->transport_lun_stop for  ITT: 0xc95fe700
> Sep 17 22:34:32 goose2 kernel: [11252.507392] [TCM] TCLFS: SE_LUN[9] - ITT: 0xc95fe700 before transport_lun_wait_for_tasks()
> Sep 17 22:34:32 goose2 kernel: [11252.507395] [TCM] TSTFC: ITT[0xc95fe700] - Stopping tasks
> Sep 17 22:34:32 goose2 kernel: [11252.507398] [TCM] TLWFT: ConfigFS: cmd: ffff8800b6eef7c0 t_tasks: 0 stop tasks ret: 0
> Sep 17 22:34:32 goose2 kernel: [11252.507401] [TCM] TLWFT: ConfigFS: ITT[0xc95fe700] - stopping cmd....

This is the really strange part..  So here we are removing the TPG LUN
after all of the sessions have been shutdown, but there is still somehow
a se_cmd descriptor in se_lun->lun_cmd_list (the offending se_cmd
descriptor is ITT: 0xc95fe700, cmd ffff8800b6eef7c0)

So looking at target_core_mod, I still don't see how releasing a se_cmd
via the normal paths could ever avoid also removing itself from
se_lun->lun_cmd_list.  So at least for now pretty sure it's not in
target_core_mod..

Looking at the incoming iscsi_target.c:iscsit_handle_scsi_cmd() path, I
don't see any failures that could cause this scenario either..  Can you
try reproducing again with the following to add a bit more context for
the offending se_cmd descriptor in transport_lun_wait_for_tasks()..?

Thanks,

--nab

diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c
index 315790e..4f31e3d 100644
--- a/drivers/target/target_core_transport.c
+++ b/drivers/target/target_core_transport.c
@@ -4477,8 +4477,11 @@ static int transport_lun_wait_for_tasks(struct se_cmd *cmd, struct se_lun *lun)
        pr_debug("ConfigFS: cmd: %p t_tasks: %d stop tasks ret:"
                        " %d\n", cmd, cmd->t_task_list_num, ret);
        if (!ret) {
-               pr_debug("ConfigFS: ITT[0x%08x] - stopping cmd....\n",
-                               cmd->se_tfo->get_task_tag(cmd));
+               pr_debug("ConfigFS: ITT[0x%08x] - stopping cmd.... fabric state: %d,"
+                               " t_state: %d, cmd->deferred_t_state: %d\n",
+                               cmd->se_tfo->get_task_tag(cmd),
+                               cmd->se_tfo->get_cmd_state(cmd), cmd->t_state,
+                               cmd->deferred_t_state);
                wait_for_completion(&cmd->transport_lun_stop_comp);
                pr_debug("ConfigFS: ITT[0x%08x] - stopped cmd....\n",
                                cmd->se_tfo->get_task_tag(cmd));



> Sep 17 22:37:00 goose2 kernel: [11400.770031] INFO: task liomtpg:17065 blocked for more than 120 seconds.
> Sep 17 22:37:00 goose2 kernel: [11400.770034] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 17 22:37:00 goose2 kernel: [11400.770037] liomtpg         D ffff88012ac2e500     0 17065   2651 0x00000004
> Sep 17 22:37:00 goose2 kernel: [11400.770042]  ffff8800b723bc48 0000000000000086 ffff8800b723bb0a 0000001000000000
> Sep 17 22:37:00 goose2 kernel: [11400.770047]  ffff88012ac2e290 00000000000116c0 ffff8800b723bfd8 ffff8800b723a010
> Sep 17 22:37:00 goose2 kernel: [11400.770052]  ffff8800b723bfd8 00000000000116c0 ffff88012b385290 ffff88012ac2e290
> Sep 17 22:37:00 goose2 kernel: [11400.770057] Call Trace:
> Sep 17 22:37:00 goose2 kernel: [11400.770066]  [<ffffffff8103968a>] schedule+0x3a/0x60
> Sep 17 22:37:00 goose2 kernel: [11400.770071]  [<ffffffff815587e5>] schedule_timeout+0x1b5/0x290
> Sep 17 22:37:00 goose2 kernel: [11400.770075]  [<ffffffff8103bf1c>] ? set_cpus_allowed_ptr+0x7c/0x110
> Sep 17 22:37:00 goose2 kernel: [11400.770081]  [<ffffffff81557b22>] wait_for_common+0xc2/0x160
> Sep 17 22:37:00 goose2 kernel: [11400.770084]  [<ffffffff8103bd00>] ? try_to_wake_up+0x290/0x290
> Sep 17 22:37:00 goose2 kernel: [11400.770087]  [<ffffffff81557c68>] wait_for_completion+0x18/0x20
> Sep 17 22:37:00 goose2 kernel: [11400.770107]  [<ffffffffa003d1d1>] transport_clear_lun_from_sessions+0x51/0x80 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770118]  [<ffffffffa003a93a>] core_tpg_post_dellun+0x2a/0x60 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770128]  [<ffffffffa002ca35>] core_dev_del_lun+0x35/0xc0 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770132]  [<ffffffff811133fe>] ? simple_unlink+0x4e/0x60
> Sep 17 22:37:00 goose2 kernel: [11400.770142]  [<ffffffffa002fa3e>] target_fabric_port_unlink+0x4e/0x60 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770147]  [<ffffffff8115a24a>] configfs_unlink+0xfa/0x1c0
> Sep 17 22:37:00 goose2 kernel: [11400.770151]  [<ffffffff810fcbee>] vfs_unlink+0x8e/0x100
> Sep 17 22:37:00 goose2 kernel: [11400.770154]  [<ffffffff81100133>] do_unlinkat+0x1a3/0x1e0
> Sep 17 22:37:00 goose2 kernel: [11400.770159]  [<ffffffff810f1d3d>] ? vfs_write+0x12d/0x180
> Sep 17 22:37:00 goose2 kernel: [11400.770162]  [<ffffffff810f1e7c>] ? sys_write+0x4c/0x90
> Sep 17 22:37:00 goose2 kernel: [11400.770165]  [<ffffffff81100181>] sys_unlink+0x11/0x20
> Sep 17 22:37:00 goose2 kernel: [11400.770169]  [<ffffffff8155a83b>] system_call_fastpath+0x16/0x1b
> Sep 17 22:37:00 goose2 kernel: [11400.770172] INFO: task tcm_cl_9:17086 blocked for more than 120 seconds.
> Sep 17 22:37:00 goose2 kernel: [11400.770175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 17 22:37:00 goose2 kernel: [11400.770177] tcm_cl_9        D ffff88012ac92380     0 17086      2 0x00000000
> Sep 17 22:37:00 goose2 kernel: [11400.770181]  ffff880106bffd60 0000000000000046 0000000000000002 06bffdc000000000
> Sep 17 22:37:00 goose2 kernel: [11400.770186]  ffff88012ac92110 00000000000116c0 ffff880106bfffd8 ffff880106bfe010
> Sep 17 22:37:00 goose2 kernel: [11400.770190]  ffff880106bfffd8 00000000000116c0 ffff88012b33d210 ffff88012ac92110
> Sep 17 22:37:00 goose2 kernel: [11400.770195] Call Trace:
> Sep 17 22:37:00 goose2 kernel: [11400.770198]  [<ffffffff8103968a>] schedule+0x3a/0x60
> Sep 17 22:37:00 goose2 kernel: [11400.770201]  [<ffffffff815587e5>] schedule_timeout+0x1b5/0x290
> Sep 17 22:37:00 goose2 kernel: [11400.770205]  [<ffffffff8103f470>] ? console_trylock+0x10/0x50
> Sep 17 22:37:00 goose2 kernel: [11400.770208]  [<ffffffff81557b22>] wait_for_common+0xc2/0x160
> Sep 17 22:37:00 goose2 kernel: [11400.770212]  [<ffffffff8103bd00>] ? try_to_wake_up+0x290/0x290
> Sep 17 22:37:00 goose2 kernel: [11400.770215]  [<ffffffff81557c68>] wait_for_completion+0x18/0x20
> Sep 17 22:37:00 goose2 kernel: [11400.770226]  [<ffffffffa003f3a6>] transport_clear_lun_thread+0x386/0x410 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770237]  [<ffffffffa003f020>] ? transport_stop_tasks_for_cmd+0x1f0/0x1f0 [target_core_mod]
> Sep 17 22:37:00 goose2 kernel: [11400.770242]  [<ffffffff8105e636>] kthread+0x96/0xa0
> Sep 17 22:37:00 goose2 kernel: [11400.770246]  [<ffffffff8155c634>] kernel_thread_helper+0x4/0x10
> Sep 17 22:37:00 goose2 kernel: [11400.770250]  [<ffffffff8105e5a0>] ? kthread_worker_fn+0x130/0x130
> Sep 17 22:37:00 goose2 kernel: [11400.770253]  [<ffffffff8155c630>] ? gs_change+0xb/0xb
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux