Re: Need some pointers to debug a target hang

[+Cc Robert LeBlanc <robert@xxxxxxxxxxxxx> and full quote for context ]

On Mon, Oct 17, 2016 at 10:57:55PM -0700, Nicholas A. Bellinger wrote:
> Hello Johannes,
> 
> Apologies for the long-delayed follow-up on this bug report.
> 
> On Fri, 2016-09-02 at 16:14 +0200, Johannes Thumshirn wrote:
> > Hi Nick et al,
> > 
> > I'm having an "interesting" problem with the kernel's iSCSI target and
> > could use a debug hint.
> > 
> > My target uses an iblock backstore on a dm-linear target. When I
> > get I/O from the initiator (I used a simple dd if=/dev/sda
> > of=/dev/null) and call 'dmsetup suspend $backstore', it takes about
> > 15 seconds for the iscsi_ttx kernel thread to disappear, while the
> > iscsi_trx and iscsi_np threads are hanging in 'D' state.
> > 
> > From iscsi_trx's stack I see it's waiting in
> > __transport_wait_for_tasks(). The last thing I see in dmesg is the
> > 'ABORT_TASK: Found referenced %s task_tag: %llu' printk, but the
> > 'ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: %llu' printk
> > from core_tmr_abort_task() is missing. As there's a
> > transport_wait_for_tasks() call in between, I _think_ it is stuck
> > aborting this one task and none of the
> > complete(&se_cmd->t_transport_stop_comp) callers is called. What
> > puzzles me a bit is that right after transport_wait_for_tasks() in
> > core_tmr_abort_task() there's a call to transport_cmd_finish_abort(),
> > which in turn calls transport_cmd_check_stop_to_fabric() ->
> > transport_cmd_check_stop() ->
> > complete_all(&cmd->t_transport_stop_comp).
> > 
> > Doing 
> > 
> > --- a/drivers/target/target_core_transport.c
> > +++ b/drivers/target/target_core_transport.c
> > @@ -2739,7 +2739,7 @@ __transport_wait_for_tasks(struct se_cmd
> >  
> >         spin_unlock_irqrestore(&cmd->t_state_lock, *flags);
> >  
> > -       wait_for_completion(&cmd->t_transport_stop_comp);
> > +       wait_for_completion_interruptible_timeout(&cmd->t_transport_stop_comp, 5 * HZ);
> >  
> >         spin_lock_irqsave(&cmd->t_state_lock, *flags);
> >         cmd->transport_state &= ~(CMD_T_ACTIVE | CMD_T_STOP);
> > 
> > "resolves" the bug, but I don't think this is correct.
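
To illustrate why a timed wait makes the hang go away without fixing it,
here is a rough userspace sketch using pthreads. It is not kernel code:
struct fake_completion, fake_init_completion(), fake_complete_all() and
fake_wait_timeout_ms() are hypothetical stand-ins that only mimic the
wait/complete pattern described above, on the assumption that the waiter
blocks until some other path signals the completion.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a kernel completion. */
struct fake_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void fake_init_completion(struct fake_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Mark the completion done and wake every waiter. */
static void fake_complete_all(struct fake_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/*
 * Wait until the completion is signalled or 'ms' milliseconds pass.
 * Returns true if it was signalled, false if the wait timed out.
 */
static bool fake_wait_timeout_ms(struct fake_completion *c, long ms)
{
	struct timespec deadline;
	bool done;

	/* Default condition variables measure against CLOCK_REALTIME. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += ms / 1000;
	deadline.tv_nsec += (ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done) {
		if (pthread_cond_timedwait(&c->cond, &c->lock,
					   &deadline) == ETIMEDOUT)
			break;
	}
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}
```

If nothing ever calls fake_complete_all(), fake_wait_timeout_ms() still
returns (with false) once the timeout expires, so the waiter moves on even
though the completion never happened. That would match the timeout only
hiding the missing complete() rather than explaining it.
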
> > 
> > This is all easily reproducible with v4.8-rc4 in qemu (for instance).
> > 
> > Any advice is appreciated.
> > 
> 
> This is likely the missing SCF_ACK_KREF assignment in >= v4.1.y:
> 
> http://www.spinics.net/lists/target-devel/msg13530.html
> 
> At your earliest convenience, please verify with this patch the TMR
> ABORT_TASK case where target-core backend I/O is still outstanding
> while iscsi session reinstatement fails at the same time, leading to
> the repeated iscsi login timeout scenario.
> 
> Also once target-core backend I/O has (finally) been completed back to
> fabric driver code, the iscsi_np configfs group shutdown is allowed to
> proceed.
> 

Hi Nic, 

Thanks for the heads up, I'll give it a try.

Robert has sent a similar bug report at
http://www.spinics.net/lists/linux-rdma/msg41296.html so I CCed him as well.

Johannes
-- 
Johannes Thumshirn                                          Storage
jthumshirn@xxxxxxx                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


