Re: [PATCH 13/25] tcm_qla2xxx: Do not allow aborted cmd to advance.

Laurence Oberman <loberman@xxxxxxxxxx> · Sat, 31 Mar 2018 11:20:23 -0400



On Sun, 2017-05-21 at 21:45 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2017-05-19 at 14:53 -0700, Himanshu Madhani wrote:
> > From: Quinn Tran <quinn.tran@xxxxxxxxxx>
> > 
> > In case of hardware queue full, commands can loop between
> > TCM stack and tcm_qla2xx shim layers for retry. While command
> > is waiting for retry, task mgmt can get ahead and abort the
> > cmmand that encountered queue full condition. Fix this by
> > dropping the command, if task mgmt has already started the
> > command free process.
> > 
> > Cc: <stable@xxxxxxxxxxxxxxx>
> > Signed-off-by: Quinn Tran <quinn.tran@xxxxxxxxxx>
> > Signed-off-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
> > ---
> >  drivers/scsi/qla2xxx/tcm_qla2xxx.c | 14 ++++++++++++++
> >  1 file changed, 14 insertions(+)
> > 
> > diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> > b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> > index 7443e4efa3ae..07f8ad001bcb 100644
> > --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> > +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> > @@ -686,6 +686,20 @@ static int tcm_qla2xxx_queue_status(struct
> > se_cmd *se_cmd)
> >  				struct qla_tgt_cmd, se_cmd);
> >  	int xmit_type = QLA_TGT_XMIT_STATUS;
> >  
> > +	if (cmd->aborted) {
> > +		/*
> > +		 * Cmd can loop during Q-full.
> > tcm_qla2xxx_aborted_task
> > +		 * can get ahead of this cmd.
> > tcm_qla2xxx_aborted_task
> > +		 * already kick start the free.
> > +		 */
> > +		pr_debug(
> > +		    "queue_data_in aborted cmd[%p] refcount %d
> > transport_state %x, t_state %x, se_cmd_flags %x\n",
> > +		    cmd, kref_read(&cmd->se_cmd.cmd_kref),
> > +		    cmd->se_cmd.transport_state, cmd-
> > >se_cmd.t_state,
> > +		    cmd->se_cmd.se_cmd_flags);
> > +		return 0;
> > +	}
> > +
> >  	cmd->bufflen = se_cmd->data_length;
> >  	cmd->sg = NULL;
> >  	cmd->sg_cnt = 0;
> 
> As your comment above mentions, there is a known issue in target-core
> when queue-full occurs repeatably and a se_cmd descriptor is
> explicitly
> stopped via session shutdown, TMR ABORT_TASK / LUN_RESET or
> otherwise.
> 
> We hit this scenario a while back in iser-target with iw_cxgb hw,
> which
> due to a separate bug was constantly triggering queue-full under load
> to
> uncover this same case.
> 
> So I'm still considering the different approaches to address this in
> target-core proper, but don't have a problem with merging this as-is
> as
> it won't logically conflict with any of those changes.
> 
> That said:
> 
> Acked-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> 
Indeed the queue-full issue has been there for quite a while in the
core.
Affects ISER, SRP and F/C (tcm_qla2xxx)