Re: kernel BUG at drivers/scsi/qla2xxx/qla_target.c:3319!

"Madhani, Himanshu" <Himanshu.Madhani@xxxxxxxxxx> · Wed, 5 Jul 2017 17:52:01 +0000

Nic, 

> On Jun 30, 2017, at 12:22 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
> 
> Hi Pascal & Co,
> 
> On Mon, 2017-06-26 at 15:20 +0200, Pascal de Bruijn wrote:
> 
> <SNIP>
> 
>> In the meanwhile I think we now know what's causing the circumstances 
>> that lets this issue surface.
>> 
>> We can reliably reproduce the issue when we run 'iozone -a -I' in a VM 
>> that's placed on the LIO provided storage and kick off a md check:
>> 
>> echo check > /sys/block/md0/md/sync_action
>> 
>> This is also probably why we were experiencing these issues consistently 
>> on the night between Saturday and Sunday, as an md check would be kicked 
>> off from cron.
>> 
>> Since we were experimenting with md tuning udev rules, particularly our 
>> sync_speed_min change may have exacerbated things:
>> 
>> SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change", 
>> TEST=="md/sync_speed_min", ATTR{md/sync_speed_min}="100000"
>> SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change", 
>> TEST=="md/sync_speed_max", ATTR{md/sync_speed_max}="1000000"
>> 
> 
> So it looks like the backend storage device provided to LIO is reaching
> sufficiently high enough latency to cause ESX ABORT_TASK timeouts to
> fire..
> 
>>> After staring at this, there are two scenarios during ABORT_TASK where
>>> this scenario could occur.
>>> 
>>> Namely, there are two cases where tcm_qla2xxx.c calls
>>> tcm_qla2xxx_free_cmd() internally during ABORT_TASK:
>>> 
>>>   - During tcm_qla2xxx_handle_data_work() when cmd->aborted is detected.
>>>   - During tcm_qla2xxx_aborted_task() when QLA_TGT_STATE_NEW or
>>>     QLA_TGT_STATE_DATA_IN are set.
>>> 
>>> My guess is the latter case is happening, that is resulting in
>>> tcm_qla2xxx_free_cmd() queuing up qla_tgt_cmd->work from
>>> transport_cmd_finish_abort() -> tcm_qla2xxx_aborted_task(), immediately
>>> before target_put_sess_cmd() is called from core_tmr_abort_task() to
>>> perform the final kref_put(&cmd->kref, ...).
>>> 
> 
> <SNIP>
> 
>>> The second is the following patch to catch this scenario in progress
>>> during qlt_free_cmd(), and clear ->cmd_in-wq + dump the state before
>>> doing a cancel_work_sync().
>>> 
>>> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
>>> index d644202..65752af 100644
>>> --- a/drivers/scsi/qla2xxx/qla_target.h
>>> +++ b/drivers/scsi/qla2xxx/qla_target.h
>>> @@ -876,7 +876,7 @@ struct qla_tgt_cmd {
>>>  	unsigned int q_full:1;
>>>  	unsigned int term_exchg:1;
>>>  	unsigned int cmd_sent_to_fw:1;
>>> -	unsigned int cmd_in_wq:1;
>>> +	unsigned int cmd_in_wq;
>>>  	unsigned int aborted:1;
>>>  	unsigned int data_work:1;
>>>  	unsigned int data_work_free:1;
>>> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> index 75aeb9f..3ceb371 100644
>>> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> @@ -336,6 +336,17 @@ static void tcm_qla2xxx_release_cmd(struct se_cmd *se_cmd)
>>>  	}
>>> 
>>>  	cmd = container_of(se_cmd, struct qla_tgt_cmd, se_cmd);
>>> +
>>> +	if (cmpxchg(&cmd->cmd_in_wq, 1, 0) == 1) {
>>> +		printk("Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1"
>>> +			" state: %d data_work: %d data_work_free: %d trc_flags: 0x%08x"
>>> +			" se_cmd->se_cmd_flags: 0x%08x se_cmd->transport_state: 0x%08x\n"
>>> +			" se_cmd->t_task_cdb: 0x%02x\n", cmd->state, cmd->data_work,
>>> +			cmd->data_work_free, cmd->trc_flags, cmd->se_cmd.se_cmd_flags,
>>> +			cmd->se_cmd.transport_state, cmd->se_cmd.t_task_cdb[0]);
>>> +		cancel_work_sync(&cmd->work);
>>> +	}
>>> +
>>>  	qlt_free_cmd(cmd);
>>>  }
>> 
>> This patch does apply cleanly to v4.9.34, however it did not compile, as 
>> data_work, data_work_free and trc_flags are not yet present. After 
>> removing those references it did compile fine.
>> 
>> Here's the relevant part of dmesg:
>> 
>> [  103.740391] md: data-check of RAID array md0
>> [  103.740395] md: minimum _guaranteed_  speed: 100000 KB/sec/disk.
>> [  103.740397] md: using maximum available idle IO bandwidth (but not more than 1000000 KB/sec) for data-check.
>> [  103.740416] md: using 128k window, over a total of 405798912k.
>> [  435.673247] ABORT_TASK: Found referenced qla2xxx task_tag: 1222180
>> [  435.927330] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [  435.927351] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1222180
>> [  565.785280] qla2xxx/21:00:00:24:ff:08:c4:8a: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>> [  702.818402] ABORT_TASK: Found referenced qla2xxx task_tag: 1230964
>> [  703.090441] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [  703.090457] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1230964
>> [  888.912452] ABORT_TASK: Found referenced qla2xxx task_tag: 1215316
>> [  889.133709] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [  889.133713] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1215316
>> [ 1741.356248] ABORT_TASK: Found referenced qla2xxx task_tag: 1222900
>> [ 1741.609044] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 1741.609047] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1222900
>> [ 2365.885579] qla2xxx/21:00:00:24:ff:01:fa:9d: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>> [ 2425.729887] ABORT_TASK: Found referenced qla2xxx task_tag: 1242772
>> [ 2425.901292] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091se_cmd->t_task_cdb: 0x2a
>> [ 2425.901296] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 
>> 1242772
>> [ 2778.797321] md: md0: data-check done.
>> [ 4165.989290] qla2xxx/21:00:00:24:ff:01:fa:9d: Unsupported SCSI Opcode 
>> 0x85, sending CHECK_CONDITION.
>> 
>> Presumably every "Cancelling..." would have been a lockup?
> 
> Fantastic, this is exactly the info I was looking for.  :)
> 
> So decoding the 'Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1' bits:
> 
> *) qla_tgt_cmd->state: 2
> 
> qla_tgt_cmd->state = 2 is QLA_TGT_STATE_DATA_IN, which means FCP
> data-out was successfully received, and the WRITE CDB has been submitted
> into LIO backend storage.
> 
> *) se_cmd->se_cmd_flags: 0x01c00109
> 
> This bitmask means the following flags are set:
> 
>   SCF_SUPPORTED_SAM_OPCODE
>   SCF_SCSI_DATA_CDB
>   SCF_SE_LUN_CMD
>   SCF_ACK_KREF
>   SCF_USE_CPUID
>   SCF_TASK_ATTR_SET
> 
> Nothing out of the ordinarily here..
> 
> *) se_cmd->transport_state: 0x00000091
> 
>  CMD_T_ABORTED
>  CMD_T_SENT;
>  CMD_T_DEV_ACTIVE;
> 
> This means the WRITE command sent to the backend device, but did not
> complete before it was explicitly aborted by the ESX host.
> 
> Based on what's in the logs above, this also looks as expected.
> 
> *) se_cmd->t_task_cdb: 0x2a
> 
> So this command is a WRITE_10 = 0x2a.
> 
> This would indicate WRITE latency for the LIO backend storage is getting
> sufficiently high to trigger ESX host side ABORT_TASK timeouts, which
> are much more aggressive than normal Linux host I/O timeouts.
> 
> So separate from the qla2xxx BUG_ON here discussed below, the high
> backend device latency is worth looking into, as usually once ESX host
> starts triggering ABORT_TASK timeouts it ends up taking the datastore
> offline.
> 
> That said, I'm certain you're hitting the bogus tcm_qla2xxx_free_cmd()
> in tcm_qla2xxx_aborted_task(), which based on the initial analysis is
> definitely the wrong thing to do.
> 
> So here's a patch against v4.12-rc that addresses the scenario you where
> able to trigger, and fixes the second scenario as well.
> 
> As you noted above, it will take a little massaging to apply to v4.9.x
> due to qla_tgt_cmd->cmd->cmd_flags becoming qla_tgt_cmd->data_work and
> ->data_work_free.  But that shouldn't be too hard.  :)
> 
> So please test without the initial debug patch on your v4.9.y based
> setup.
> 
> Himanshu + Quinn, please review as after further groking of what Pascal
> has observed, these are two obvious bugs.
> 
> From 4db6a8145940d0bbd10265020d681961ce2d3238 Mon Sep 17 00:00:00 2001
> From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> Date: Fri, 30 Jun 2017 00:08:13 -0700
> Subject: [PATCH] qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR
> ABORT
> 
> This patch drops two incorrect usages of tcm_qla2xxx_free_cmd()
> during TMR ABORT within tcm_qla2xxx_handle_data_work() and
> tcm_qla2xxx_aborted_task(), which where attempting to dispatch
> into workqueue context to do tcm_qla2xxx_complete_free() and
> subsequently invoke transport_generic_free_cmd().
> 
> This is incorrect because during TMR ABORT target-core will
> drop the outstanding se_cmd->cmd_kref references once it has
> quiesced the se_cmd via transport_wait_for_tasks(), and in
> the case of qla2xxx it should not attempt to do it's own
> transport_generic_free_cmd().
> 
> As reported by Pascal, this was originally manifesting as a
> BUG_ON(cmd->cmd_in_wq) in qlt_free_cmd() during TMR ABORT
> with a LIO backend with sufficently high WRITE latency to
> trigger a host side TMR ABORT_TASK to occur.
> 
> In addition, for the case in tcm_qla2xxx_write_pending_status()
> and tcm_qla2xxx_handle_data_work() that waits for outstanding
> FCP WRITE data transfer to complete before preceedin with a
> TMR ABORT, avoid using se_cmd->t_transport_stop_comp that is
> already used by transport_wait_for_tasks() and use a qla2xxx
> internal struct completion instead.
> 
> Reported-by: Pascal de Bruijn <p.debruijn@xxxxxxxxxxx>
> Cc: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
> Cc: Quinn Tran <quinn.tran@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 3.10+
> Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> ---
> drivers/scsi/qla2xxx/qla_target.c  |  4 ++++
> drivers/scsi/qla2xxx/qla_target.h  |  1 +
> drivers/scsi/qla2xxx/tcm_qla2xxx.c | 36 +++---------------------------------
> 3 files changed, 8 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index 401e245..8f8ece9 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -4079,6 +4079,8 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
> 	cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];
> 	memset(cmd, 0, sizeof(struct qla_tgt_cmd));
> 
> +	init_completion(&cmd->write_pending_abort_comp);
> +
> 	memcpy(&cmd->atio, atio, sizeof(*atio));
> 	cmd->state = QLA_TGT_STATE_NEW;
> 	cmd->tgt = vha->vha_tgt.qla_tgt;
> @@ -5083,6 +5085,8 @@ static int __qlt_send_busy(struct scsi_qla_host *vha,
> 
> 	qlt_incr_num_pend_cmds(vha);
> 	INIT_LIST_HEAD(&cmd->cmd_list);
> +	init_completion(&cmd->write_pending_abort_comp);
> +
> 	memcpy(&cmd->atio, atio, sizeof(*atio));
> 
> 	cmd->tgt = vha->vha_tgt.qla_tgt;
> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
> index d644202..939e93c5 100644
> --- a/drivers/scsi/qla2xxx/qla_target.h
> +++ b/drivers/scsi/qla2xxx/qla_target.h
> @@ -868,6 +868,7 @@ struct qla_tgt_cmd {
> 	unsigned char sense_buffer[TRANSPORT_SENSE_BUFFER];
> 
> 	spinlock_t cmd_lock;
> +	struct completion write_pending_abort_comp;
> 	/* to save extra sess dereferences */
> 	unsigned int conf_compl_supported:1;
> 	unsigned int sg_mapped:1;
> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> index 75aeb9f..8c1bf9b 100644
> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> @@ -415,6 +415,7 @@ static int tcm_qla2xxx_write_pending(struct se_cmd *se_cmd)
> 
> static int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
> {
> +	struct qla_tgt_cmd *cmd = container_of(se_cmd, struct qla_tgt_cmd, se_cmd);
> 	unsigned long flags;
> 	/*
> 	 * Check for WRITE_PENDING status to determine if we need to wait for
> @@ -424,8 +425,7 @@ static int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
> 	if (se_cmd->t_state == TRANSPORT_WRITE_PENDING ||
> 	    se_cmd->t_state == TRANSPORT_COMPLETE_QF_WP) {
> 		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> -		wait_for_completion_timeout(&se_cmd->t_transport_stop_comp,
> -						50);
> +		wait_for_completion(&cmd->write_pending_abort_comp);
> 		return 0;
> 	}
> 	spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> @@ -501,7 +501,6 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t *vha, struct qla_tgt_cmd *cmd,
> static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
> {
> 	struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd, work);
> -	unsigned long flags;
> 
> 	/*
> 	 * Ensure that the complete FCP WRITE payload has been received.
> @@ -509,17 +508,6 @@ static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
> 	 */
> 	cmd->cmd_in_wq = 0;
> 
> -	spin_lock_irqsave(&cmd->cmd_lock, flags);
> -	cmd->data_work = 1;
> -	if (cmd->aborted) {
> -		cmd->data_work_free = 1;
> -		spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -
> -		tcm_qla2xxx_free_cmd(cmd);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -
> 	cmd->vha->tgt_counters.qla_core_ret_ctio++;
> 	if (!cmd->write_data_transferred) {
> 		/*
> @@ -527,7 +515,7 @@ static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
> 		 * waiting upon completion in tcm_qla2xxx_write_pending_status()
> 		 */
> 		if (cmd->se_cmd.transport_state & CMD_T_ABORTED) {
> -			complete(&cmd->se_cmd.t_transport_stop_comp);
> +			complete(&cmd->write_pending_abort_comp);
> 			return;
> 		}
> 
> @@ -753,31 +741,13 @@ static void tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd)
> 	qlt_xmit_tm_rsp(mcmd);
> }
> 
> -#define DATA_WORK_NOT_FREE(_cmd) (_cmd->data_work && !_cmd->data_work_free)
> static void tcm_qla2xxx_aborted_task(struct se_cmd *se_cmd)
> {
> 	struct qla_tgt_cmd *cmd = container_of(se_cmd,
> 				struct qla_tgt_cmd, se_cmd);
> -	unsigned long flags;
> 
> 	if (qlt_abort_cmd(cmd))
> 		return;
> -
> -	spin_lock_irqsave(&cmd->cmd_lock, flags);
> -	if ((cmd->state == QLA_TGT_STATE_NEW)||
> -	    ((cmd->state == QLA_TGT_STATE_DATA_IN) &&
> -		DATA_WORK_NOT_FREE(cmd))) {
> -		cmd->data_work_free = 1;
> -		spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -		/*
> -		 * cmd has not reached fw, Use this trigger to free it.
> -		 */
> -		tcm_qla2xxx_free_cmd(cmd);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -	return;
> -
> }
> 
> static void tcm_qla2xxx_clear_sess_lookup(struct tcm_qla2xxx_lport *,
> 

Patch looks good. 

Acked-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>

Thanks,
- Himanshu

--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html