Hello, I tried my 2.6.39 LIO stress tests with the latest mainline 3.1-rc6+ code and with one of the tests, I sometimes hit an endless wait in transport_clear_lun_from_sessions() coupled with a wait in transport_clear_lun_thread(). The affected test repeatedly adds+enables and disables+removes a TPG while an initiator is connected to it and performs heavy sequential writes using fio. The problem occurs during TPG removal in unlink("/sys/kernel/config/target/iscsi/<target-iqn>/<tpgt#>/lun/<lun#>/<dev-symlink>"). When entering the unlink(), TPG is already disabled by "echo 0 > .../enable" and all initiator ACLs are removed. It does not happen every time but I can almost always reproduce it within 10-20 minutes. Tested on bleeding edge mainline 3.1-rc6+ with today's target patches for -rc7. Call trace: Sep 16 17:45:18 goose1 kernel: [ 8877.167113] INFO: task liomtpg:27686 blocked for more than 120 seconds. Sep 16 17:45:18 goose1 kernel: [ 8877.167117] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 16 17:45:18 goose1 kernel: [ 8877.167120] liomtpg D ffff880321d68300 0 27686 2841 0x00000004 Sep 16 17:45:18 goose1 kernel: [ 8877.167126] ffff8803056fdc58 0000000000000082 ffff8803056fdb0a 0000001000000000 Sep 16 17:45:18 goose1 kernel: [ 8877.167133] ffff880321d68090 00000000000116c0 ffff8803056fdfd8 ffff8803056fc010 Sep 16 17:45:18 goose1 kernel: [ 8877.167137] ffff8803056fdfd8 00000000000116c0 ffff88032275f2d0 ffff880321d68090 Sep 16 17:45:18 goose1 kernel: [ 8877.167143] Call Trace: Sep 16 17:45:18 goose1 kernel: [ 8877.167157] [<ffffffff810394ca>] schedule+0x3a/0x60 Sep 16 17:45:18 goose1 kernel: [ 8877.167165] [<ffffffff81543475>] schedule_timeout+0x1b5/0x290 Sep 16 17:45:18 goose1 kernel: [ 8877.167170] [<ffffffff8103bd5c>] ? set_cpus_allowed_ptr+0x7c/0x110 Sep 16 17:45:18 goose1 kernel: [ 8877.167176] [<ffffffff815427b2>] wait_for_common+0xc2/0x160 Sep 16 17:45:18 goose1 kernel: [ 8877.167180] [<ffffffff8103bb40>] ? try_to_wake_up+0x290/0x290 Sep 16 17:45:18 goose1 kernel: [ 8877.167183] [<ffffffff815428f8>] wait_for_completion+0x18/0x20 Sep 16 17:45:18 goose1 kernel: [ 8877.167206] [<ffffffffa00148d1>] transport_clear_lun_from_sessions+0x51/0x80 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167219] [<ffffffffa00123ea>] core_tpg_post_dellun+0x2a/0x60 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167229] [<ffffffffa000602c>] core_dev_del_lun+0x2c/0x70 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167235] [<ffffffffa0008a7e>] target_fabric_port_unlink+0x4e/0x60 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167241] [<ffffffff8115925a>] configfs_unlink+0xfa/0x1c0 Sep 16 17:45:18 goose1 kernel: [ 8877.167245] [<ffffffff810fc75e>] vfs_unlink+0x8e/0x100 Sep 16 17:45:18 goose1 kernel: [ 8877.167248] [<ffffffff810ffca3>] do_unlinkat+0x1a3/0x1e0 Sep 16 17:45:18 goose1 kernel: [ 8877.167251] [<ffffffff810f18ad>] ? vfs_write+0x12d/0x180 Sep 16 17:45:18 goose1 kernel: [ 8877.167253] [<ffffffff810f19ec>] ? sys_write+0x4c/0x90 Sep 16 17:45:18 goose1 kernel: [ 8877.167255] [<ffffffff810ffcf1>] sys_unlink+0x11/0x20 Sep 16 17:45:18 goose1 kernel: [ 8877.167260] [<ffffffff815454fb>] system_call_fastpath+0x16/0x1b Sep 16 17:45:18 goose1 kernel: [ 8877.167262] INFO: task tcm_cl_9:27714 blocked for more than 120 seconds. Sep 16 17:45:18 goose1 kernel: [ 8877.167263] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this message. Sep 16 17:45:18 goose1 kernel: [ 8877.167265] tcm_cl_9 D ffff8803224a62c0 0 27714 2 0x00000000 Sep 16 17:45:18 goose1 kernel: [ 8877.167267] ffff880314c7dd60 0000000000000046 0000000000000000 0000000000000000 Sep 16 17:45:18 goose1 kernel: [ 8877.167270] ffff8803224a6050 00000000000116c0 ffff880314c7dfd8 ffff880314c7c010 Sep 16 17:45:18 goose1 kernel: [ 8877.167273] ffff880314c7dfd8 00000000000116c0 ffff8803227a9350 ffff8803224a6050 Sep 16 17:45:18 goose1 kernel: [ 8877.167276] Call Trace: Sep 16 17:45:18 goose1 kernel: [ 8877.167278] [<ffffffff810394ca>] schedule+0x3a/0x60 Sep 16 17:45:18 goose1 kernel: [ 8877.167280] [<ffffffff81543475>] schedule_timeout+0x1b5/0x290 Sep 16 17:45:18 goose1 kernel: [ 8877.167282] [<ffffffff8103ba90>] ? try_to_wake_up+0x1e0/0x290 Sep 16 17:45:18 goose1 kernel: [ 8877.167284] [<ffffffff815427b2>] wait_for_common+0xc2/0x160 Sep 16 17:45:18 goose1 kernel: [ 8877.167286] [<ffffffff8103bb40>] ? try_to_wake_up+0x290/0x290 Sep 16 17:45:18 goose1 kernel: [ 8877.167289] [<ffffffff815428f8>] wait_for_completion+0x18/0x20 Sep 16 17:45:18 goose1 kernel: [ 8877.167295] [<ffffffffa0016665>] transport_clear_lun_thread+0x295/0x340 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167302] [<ffffffffa00163d0>] ? transport_stop_tasks_for_cmd+0x170/0x170 [target_core_mod] Sep 16 17:45:18 goose1 kernel: [ 8877.167308] [<ffffffff8105e476>] kthread+0x96/0xa0 Sep 16 17:45:18 goose1 kernel: [ 8877.167313] [<ffffffff815472f4>] kernel_thread_helper+0x4/0x10 Sep 16 17:45:18 goose1 kernel: [ 8877.167315] [<ffffffff8105e3e0>] ? kthread_worker_fn+0x130/0x130 Sep 16 17:45:18 goose1 kernel: [ 8877.167317] [<ffffffff815472f0>] ? gs_change+0xb/0xb The process that calls unlink() blocks on lun->lun_shutdown_comp: (gdb) list *(transport_clear_lun_from_sessions+0x51) 0x148d1 is in transport_clear_lun_from_sessions (drivers/target/target_core_transport.c:4608). 4603 "tcm_cl_%u", lun->unpacked_lun); 4604 if (IS_ERR(kt)) { 4605 pr_err("Unable to start clear_lun thread\n"); 4606 return PTR_ERR(kt); 4607 } 4608 wait_for_completion(&lun->lun_shutdown_comp); 4609 4610 return 0; 4611 } 4612 tcm_cl thread blocks on cmd->transport_lun_stop_comp: (gdb) list *(transport_clear_lun_thread+0x295) 0x16665 is in transport_clear_lun_thread (drivers/target/target_core_transport.c:4479). 4474 " %d\n", cmd, cmd->t_task_list_num, ret); 4475 if (!ret) { 4476 pr_debug("ConfigFS: ITT[0x%08x] - stopping cmd....\n", 4477 cmd->se_tfo->get_task_tag(cmd)); 4478 wait_for_completion(&cmd->transport_lun_stop_comp); 4479 pr_debug("ConfigFS: ITT[0x%08x] - stopped cmd....\n", 4480 cmd->se_tfo->get_task_tag(cmd)); 4481 } 4482 transport_remove_cmd_from_queue(cmd,&cmd->se_dev->dev_queue_obj); 4483 Though I use my own library to manage configfs objects, the sequence of changes in configfs should be the same as in lio-utils. Below is my log of configfs operations that forego the bug (last line corresponds to the frozen unlink): Sep 16 18:54:48 goose1 LIO: 8969 Disabling TPG iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1 Sep 16 18:54:48 goose1 LIO: 8969 Deleting TPG network portal /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/np/192.168.224.10:3260 Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator/lun_9/lio_lun Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator/lun_9 Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.2005-09.com.zoner.goosem:openiscsi-initiator Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9/lun_9/lio_lun Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9/lun_9 Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx9 Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL mapping /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8/lun_9/lio_lun Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator LUN ACL /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8/lun_9 Sep 16 18:54:48 goose1 LIO: 8969 Deleting Initiator ACL dir /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/acls/iqn.1998-01.com.vmware:hv-esx8 Sep 16 18:54:48 goose1 LIO: 8969 Deleting TPG LUN link /sys/kernel/config/target/iscsi/iqn.2011-07.com.zoner:inetstore.vmw0.san2/tpgt_1/lun/lun_9/cpg0_lun9-lun9 Martin -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html