Re: Endless wait in transport_clear_lun_from_sessions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 2011-09-16 at 21:39 +0200, Martin Svec wrote:
> Hello,
> 
> I tried my 2.6.39 LIO stress tests with the latest mainline 3.1-rc6+ code and
> with one of the tests, I sometimes hit an endless wait in transport_clear_lun_from_sessions()
> coupled with a wait in transport_clear_lun_thread().
> 
> The affected test repeatedly adds+enables and disables+removes a TPG while an initiator
> is connected to it and performs heavy sequential writes using fio. The problem occurs during
> TPG removal in unlink("/sys/kernel/config/target/iscsi/<target-iqn>/<tpgt#>/lun/<lun#>/<dev-symlink>").
> When entering the unlink(), TPG is already disabled by "echo 0 > .../enable" and all initiator
> ACLs are removed. It does not happen every time but I can almost always reproduce it within
> 10-20 minutes.
> 
> Tested on bleeding edge mainline 3.1-rc6+ with today's target patches for -rc7.
> 

Hi Martin,

Thanks for the detailed bug-report.  It sounds like you are pretty
certain this is regression with the v3.1 target-core+iscsi-target
mainline code, yes..?  Have you been able to verify this same test with
the pre v3.1 (eg: v4.0.0-rc7) code in lio-core-2.6.git..?

Nothing immediately rings a bell here, but i'll have a deeper look and
try reproducing over the weekend..  Also, it would be very helpful if
you could enable the dynamic_printk right before the shutdown sequence
with:

   echo 'module iscsi_target_mod +p' > /debug/dynamic_debug/control

and disable it before re-enabling the TPG to avoid all of the WRITE I/O
output noise with:

   echo 'module iscsi_target_mod -p' > /debug/dynamic_debug/control

This will require CONFIG_DYNAMIC_DEBUG=y in your kernel config, but
would be very helpful to diagnose the issue. 

--nab

> Call trace:
> 
> Sep 16 17:45:18 goose1 kernel: [ 8877.167113] INFO: task liomtpg:27686 blocked for more than 120 seconds.
> Sep 16 17:45:18 goose1 kernel: [ 8877.167117] "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 16 17:45:18 goose1 kernel: [ 8877.167120] liomtpg         D ffff880321d68300     0 27686   2841 0x00000004
> Sep 16 17:45:18 goose1 kernel: [ 8877.167126]  ffff8803056fdc58 0000000000000082 ffff8803056fdb0a 0000001000000000
> Sep 16 17:45:18 goose1 kernel: [ 8877.167133]  ffff880321d68090 00000000000116c0 ffff8803056fdfd8 ffff8803056fc010
> Sep 16 17:45:18 goose1 kernel: [ 8877.167137]  ffff8803056fdfd8 00000000000116c0 ffff88032275f2d0 ffff880321d68090
> Sep 16 17:45:18 goose1 kernel: [ 8877.167143] Call Trace:
> Sep 16 17:45:18 goose1 kernel: [ 8877.167157]  [<ffffffff810394ca>] schedule+0x3a/0x60
> Sep 16 17:45:18 goose1 kernel: [ 8877.167165]  [<ffffffff81543475>] schedule_timeout+0x1b5/0x290
> Sep 16 17:45:18 goose1 kernel: [ 8877.167170]  [<ffffffff8103bd5c>] ? set_cpus_allowed_ptr+0x7c/0x110
> Sep 16 17:45:18 goose1 kernel: [ 8877.167176]  [<ffffffff815427b2>] wait_for_common+0xc2/0x160
> Sep 16 17:45:18 goose1 kernel: [ 8877.167180]  [<ffffffff8103bb40>] ? try_to_wake_up+0x290/0x290
> Sep 16 17:45:18 goose1 kernel: [ 8877.167183]  [<ffffffff815428f8>] wait_for_completion+0x18/0x20
> Sep 16 17:45:18 goose1 kernel: [ 8877.167206]  [<ffffffffa00148d1>] transport_clear_lun_from_sessions+0x51/0x80 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167219]  [<ffffffffa00123ea>] core_tpg_post_dellun+0x2a/0x60 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167229]  [<ffffffffa000602c>] core_dev_del_lun+0x2c/0x70 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167235]  [<ffffffffa0008a7e>] target_fabric_port_unlink+0x4e/0x60 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167241]  [<ffffffff8115925a>] configfs_unlink+0xfa/0x1c0
> Sep 16 17:45:18 goose1 kernel: [ 8877.167245]  [<ffffffff810fc75e>] vfs_unlink+0x8e/0x100
> Sep 16 17:45:18 goose1 kernel: [ 8877.167248]  [<ffffffff810ffca3>] do_unlinkat+0x1a3/0x1e0
> Sep 16 17:45:18 goose1 kernel: [ 8877.167251]  [<ffffffff810f18ad>] ? vfs_write+0x12d/0x180
> Sep 16 17:45:18 goose1 kernel: [ 8877.167253]  [<ffffffff810f19ec>] ? sys_write+0x4c/0x90
> Sep 16 17:45:18 goose1 kernel: [ 8877.167255]  [<ffffffff810ffcf1>] sys_unlink+0x11/0x20
> Sep 16 17:45:18 goose1 kernel: [ 8877.167260]  [<ffffffff815454fb>] system_call_fastpath+0x16/0x1b
> Sep 16 17:45:18 goose1 kernel: [ 8877.167262] INFO: task tcm_cl_9:27714 blocked for more than 120 seconds.
> Sep 16 17:45:18 goose1 kernel: [ 8877.167263] "echo 0>  /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Sep 16 17:45:18 goose1 kernel: [ 8877.167265] tcm_cl_9        D ffff8803224a62c0     0 27714      2 0x00000000
> Sep 16 17:45:18 goose1 kernel: [ 8877.167267]  ffff880314c7dd60 0000000000000046 0000000000000000 0000000000000000
> Sep 16 17:45:18 goose1 kernel: [ 8877.167270]  ffff8803224a6050 00000000000116c0 ffff880314c7dfd8 ffff880314c7c010
> Sep 16 17:45:18 goose1 kernel: [ 8877.167273]  ffff880314c7dfd8 00000000000116c0 ffff8803227a9350 ffff8803224a6050
> Sep 16 17:45:18 goose1 kernel: [ 8877.167276] Call Trace:
> Sep 16 17:45:18 goose1 kernel: [ 8877.167278]  [<ffffffff810394ca>] schedule+0x3a/0x60
> Sep 16 17:45:18 goose1 kernel: [ 8877.167280]  [<ffffffff81543475>] schedule_timeout+0x1b5/0x290
> Sep 16 17:45:18 goose1 kernel: [ 8877.167282]  [<ffffffff8103ba90>] ? try_to_wake_up+0x1e0/0x290
> Sep 16 17:45:18 goose1 kernel: [ 8877.167284]  [<ffffffff815427b2>] wait_for_common+0xc2/0x160
> Sep 16 17:45:18 goose1 kernel: [ 8877.167286]  [<ffffffff8103bb40>] ? try_to_wake_up+0x290/0x290
> Sep 16 17:45:18 goose1 kernel: [ 8877.167289]  [<ffffffff815428f8>] wait_for_completion+0x18/0x20
> Sep 16 17:45:18 goose1 kernel: [ 8877.167295]  [<ffffffffa0016665>] transport_clear_lun_thread+0x295/0x340 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167302]  [<ffffffffa00163d0>] ? transport_stop_tasks_for_cmd+0x170/0x170 [target_core_mod]
> Sep 16 17:45:18 goose1 kernel: [ 8877.167308]  [<ffffffff8105e476>] kthread+0x96/0xa0
> Sep 16 17:45:18 goose1 kernel: [ 8877.167313]  [<ffffffff815472f4>] kernel_thread_helper+0x4/0x10
> Sep 16 17:45:18 goose1 kernel: [ 8877.167315]  [<ffffffff8105e3e0>] ? kthread_worker_fn+0x130/0x130
> Sep 16 17:45:18 goose1 kernel: [ 8877.167317]  [<ffffffff815472f0>] ? gs_change+0xb/0xb
> 
> 
> The process that calls unlink() blocks on lun->lun_shutdown_comp:
> 
> (gdb) list *(transport_clear_lun_from_sessions+0x51)
> 0x148d1 is in transport_clear_lun_from_sessions (drivers/target/target_core_transport.c:4608).
> 4603                            "tcm_cl_%u", lun->unpacked_lun);
> 4604            if (IS_ERR(kt)) {
> 4605                    pr_err("Unable to start clear_lun thread\n");
> 4606                    return PTR_ERR(kt);
> 4607            }
> 4608            wait_for_completion(&lun->lun_shutdown_comp);
> 4609
> 4610            return 0;
> 4611    }
> 4612
> 
> 
> tcm_cl thread blocks on cmd->transport_lun_stop_comp:
> 
> (gdb) list *(transport_clear_lun_thread+0x295)
> 0x16665 is in transport_clear_lun_thread (drivers/target/target_core_transport.c:4479).
> 4474                            " %d\n", cmd, cmd->t_task_list_num, ret);
> 4475            if (!ret) {
> 4476                    pr_debug("ConfigFS: ITT[0x%08x] - stopping cmd....\n",
> 4477                                    cmd->se_tfo->get_task_tag(cmd));
> 4478                    wait_for_completion(&cmd->transport_lun_stop_comp);
> 4479                    pr_debug("ConfigFS: ITT[0x%08x] - stopped cmd....\n",
> 4480                                    cmd->se_tfo->get_task_tag(cmd));
> 4481            }
> 4482            transport_remove_cmd_from_queue(cmd,&cmd->se_dev->dev_queue_obj);
> 4483
> 
> 
> Though I use my own library to manage configfs objects, the sequence of changes in configfs should
> be the same as in lio-utils. Below is my log of configfs operations that forego the bug (last line
> corresponds to the frozen unlink):
> 
> Sep 16 18:54:48 goose1 LIO: 8969 Disabling TPG iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting TPG network portal /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/np/192.168.224.10:3260
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator/lun_9/lio_lun
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator/lun_9
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9/lun_9/lio_lun
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9/lun_9
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8/lun_9/lio_lun
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8/lun_9
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8
> Sep 16 18:54:48 goose1 LIO: 8969 Deleting TPG LUN link /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/lun/lun_9/cpg0_lun9-lun9
> 
> 
> Martin
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux SCSI]     [Kernel Newbies]     [Linux SCSI Target Infrastructure]     [Share Photos]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Device Mapper]

  Powered by Linux