[+Cc Robert LeBlanc <robert@xxxxxxxxxxxxx> and full quote for context]

On Mon, Oct 17, 2016 at 10:57:55PM -0700, Nicholas A. Bellinger wrote:
> Hello Johannes,
>
> Apologies for the extended delayed follow-up on this bug report.
>
> On Fri, 2016-09-02 at 16:14 +0200, Johannes Thumshirn wrote:
> > Hi Nick et al,
> >
> > I'm having an "interesting" problem with the kernel's iSCSI target and
> > could use a debug hint.
> >
> > My target uses an iblock backstore on a dm-linear target. When I now
> > get I/O from the initiator (I used a simple dd if=/dev/sda
> > of=/dev/null) and call 'dmsetup suspend $backstore' it'll take about
> > 15 seconds for the iscsi_ttx kernel thread to disappear; the iscsi_trx
> > and iscsi_np threads are hanging in 'D'.
> >
> > From iscsi_trx's stack I see it's waiting in
> > __transport_wait_for_tasks(). The last thing I see in dmesg is the
> > 'ABORT_TASK: Found referenced %s task_tag: %llu' printk but the
> > 'ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: %llu' printk
> > is missing from core_tmr_abort_task(). As there's a
> > transport_wait_for_tasks() call in between I _think_ it is stuck in
> > aborting this one task and none of the
> > complete(se_cmd->t_transport_stop_comp) callers is called. What
> > puzzles me a bit is that right after transport_wait_for_tasks() in
> > core_tmr_abort_task() there's a call to transport_cmd_finish_abort()
> > which in turn calls transport_cmd_check_stop_to_fabric() ->
> > transport_cmd_check_stop() ->
> > complete_all(&cmd->t_transport_stop_comp).
> >
> > Doing
> >
> > --- a/drivers/target/target_core_transport.c
> > +++ b/drivers/target/target_core_transport.c
> > @@ -2739,7 +2739,7 @@ __transport_wait_for_tasks(struct se_cmd
> >
> >         spin_unlock_irqrestore(&cmd->t_state_lock, *flags);
> >
> > -       wait_for_completion(&cmd->t_transport_stop_comp);
> > +       wait_for_completion_timeout(&cmd->t_transport_stop_comp, 5 * HZ);
> >
> >         spin_lock_irqsave(&cmd->t_state_lock, *flags);
> >         cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);
> >
> > "resolves" the bug, but I don't think this is correct.
> >
> > This is all easily reproducible with v4.8-rc4 in qemu (for instance).
> >
> > Any advice is appreciated.
> >
>
> This is likely the missing SCF_ACK_KREF assignment in >= v4.1.y:
>
> http://www.spinics.net/lists/target-devel/msg13530.html
>
> At your earliest convenience, please verify using this patch for TMR
> ABORT_TASK due to target-core backend I/O still outstanding, with
> simultaneous failed iscsi session reinstatement -> repeated iscsi login
> timeout scenario.
>
> Also once target-core backend I/O has (finally) been completed back to
> fabric driver code, the iscsi_np configfs group shutdown is allowed to
> proceed.
>

Hi Nic,

Thanks for the heads up, I'll give it a try. Robert has sent a similar bug
report at http://www.spinics.net/lists/linux-rdma/msg41296.html so I CCed
him as well.

	Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850