Nic,

> On Jun 30, 2017, at 12:22 AM, Nicholas A. Bellinger <nab@xxxxxxxxxxxxxxx> wrote:
>
> Hi Pascal & Co,
>
> On Mon, 2017-06-26 at 15:20 +0200, Pascal de Bruijn wrote:
>
> <SNIP>
>
>> In the meanwhile I think we now know what's causing the circumstances
>> that let this issue surface.
>>
>> We can reliably reproduce the issue when we run 'iozone -a -I' in a VM
>> that's placed on the LIO-provided storage and kick off an md check:
>>
>> echo check > /sys/block/md0/md/sync_action
>>
>> This is also probably why we were experiencing these issues consistently
>> on the night between Saturday and Sunday, as an md check would be kicked
>> off from cron.
>>
>> Since we were experimenting with md tuning udev rules, our
>> sync_speed_min change in particular may have exacerbated things:
>>
>> SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change",
>> TEST=="md/sync_speed_min", ATTR{md/sync_speed_min}="100000"
>> SUBSYSTEM=="block", KERNEL=="md*", ACTION=="change",
>> TEST=="md/sync_speed_max", ATTR{md/sync_speed_max}="1000000"
>>
>
> So it looks like the backend storage device provided to LIO is reaching
> sufficiently high latency to cause ESX ABORT_TASK timeouts to
> fire..
>
>>> After staring at this, there are two scenarios during ABORT_TASK where
>>> this could occur.
>>>
>>> Namely, there are two cases where tcm_qla2xxx.c calls
>>> tcm_qla2xxx_free_cmd() internally during ABORT_TASK:
>>>
>>> - During tcm_qla2xxx_handle_data_work() when cmd->aborted is detected.
>>> - During tcm_qla2xxx_aborted_task() when QLA_TGT_STATE_NEW or
>>>   QLA_TGT_STATE_DATA_IN are set.
>>>
>>> My guess is the latter case is happening, which results in
>>> tcm_qla2xxx_free_cmd() queuing up qla_tgt_cmd->work from
>>> transport_cmd_finish_abort() -> tcm_qla2xxx_aborted_task(), immediately
>>> before target_put_sess_cmd() is called from core_tmr_abort_task() to
>>> perform the final kref_put(&cmd->kref, ...).
>>>
>
> <SNIP>
>
>>> The second is the following patch to catch this scenario in progress
>>> during qlt_free_cmd(), and clear ->cmd_in_wq + dump the state before
>>> doing a cancel_work_sync().
>>>
>>> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
>>> index d644202..65752af 100644
>>> --- a/drivers/scsi/qla2xxx/qla_target.h
>>> +++ b/drivers/scsi/qla2xxx/qla_target.h
>>> @@ -876,7 +876,7 @@ struct qla_tgt_cmd {
>>>  	unsigned int q_full:1;
>>>  	unsigned int term_exchg:1;
>>>  	unsigned int cmd_sent_to_fw:1;
>>> -	unsigned int cmd_in_wq:1;
>>> +	unsigned int cmd_in_wq;
>>>  	unsigned int aborted:1;
>>>  	unsigned int data_work:1;
>>>  	unsigned int data_work_free:1;
>>> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> index 75aeb9f..3ceb371 100644
>>> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
>>> @@ -336,6 +336,17 @@ static void tcm_qla2xxx_release_cmd(struct se_cmd *se_cmd)
>>>  	}
>>>
>>>  	cmd = container_of(se_cmd, struct qla_tgt_cmd, se_cmd);
>>> +
>>> +	if (cmpxchg(&cmd->cmd_in_wq, 1, 0) == 1) {
>>> +		printk("Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1"
>>> +			" state: %d data_work: %d data_work_free: %d trc_flags: 0x%08x"
>>> +			" se_cmd->se_cmd_flags: 0x%08x se_cmd->transport_state: 0x%08x\n"
>>> +			" se_cmd->t_task_cdb: 0x%02x\n", cmd->state, cmd->data_work,
>>> +			cmd->data_work_free, cmd->trc_flags, cmd->se_cmd.se_cmd_flags,
>>> +			cmd->se_cmd.transport_state, cmd->se_cmd.t_task_cdb[0]);
>>> +		cancel_work_sync(&cmd->work);
>>> +	}
>>> +
>>>  	qlt_free_cmd(cmd);
>>> }
>>
>> This patch does apply cleanly to v4.9.34; however, it did not compile, as
>> data_work, data_work_free and trc_flags are not yet present there. After
>> removing those references it compiled fine.
>>
>> Here's the relevant part of dmesg:
>>
>> [ 103.740391] md: data-check of RAID array md0
>> [ 103.740395] md: minimum _guaranteed_ speed: 100000 KB/sec/disk.
>> [ 103.740397] md: using maximum available idle IO bandwidth (but not more than 1000000 KB/sec) for data-check.
>> [ 103.740416] md: using 128k window, over a total of 405798912k.
>> [ 435.673247] ABORT_TASK: Found referenced qla2xxx task_tag: 1222180
>> [ 435.927330] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 435.927351] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1222180
>> [ 565.785280] qla2xxx/21:00:00:24:ff:08:c4:8a: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>> [ 702.818402] ABORT_TASK: Found referenced qla2xxx task_tag: 1230964
>> [ 703.090441] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 703.090457] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1230964
>> [ 888.912452] ABORT_TASK: Found referenced qla2xxx task_tag: 1215316
>> [ 889.133709] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 889.133713] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1215316
>> [ 1741.356248] ABORT_TASK: Found referenced qla2xxx task_tag: 1222900
>> [ 1741.609044] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 1741.609047] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1222900
>> [ 2365.885579] qla2xxx/21:00:00:24:ff:01:fa:9d: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>> [ 2425.729887] ABORT_TASK: Found referenced qla2xxx task_tag: 1242772
>> [ 2425.901292] Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1 state: 2 se_cmd->se_cmd_flags: 0x01c00109 se_cmd->transport_state: 0x00000091 se_cmd->t_task_cdb: 0x2a
>> [ 2425.901296] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 1242772
>> [ 2778.797321] md: md0: data-check done.
>> [ 4165.989290] qla2xxx/21:00:00:24:ff:01:fa:9d: Unsupported SCSI Opcode 0x85, sending CHECK_CONDITION.
>>
>> Presumably every "Canceling..." would have been a lockup?
>
> Fantastic, this is exactly the info I was looking for. :)
>
> So decoding the 'Canceling outstanding qla_tgt_cmd->cmd_in_wq = 1' bits:
>
> *) qla_tgt_cmd->state: 2
>
> qla_tgt_cmd->state = 2 is QLA_TGT_STATE_DATA_IN, which means FCP
> data-out was successfully received, and the WRITE CDB has been submitted
> into the LIO backend storage.
>
> *) se_cmd->se_cmd_flags: 0x01c00109
>
> This bitmask means the following flags are set:
>
>    SCF_SUPPORTED_SAM_OPCODE
>    SCF_SCSI_DATA_CDB
>    SCF_SE_LUN_CMD
>    SCF_ACK_KREF
>    SCF_USE_CPUID
>    SCF_TASK_ATTR_SET
>
> Nothing out of the ordinary here..
>
> *) se_cmd->transport_state: 0x00000091
>
>    CMD_T_ABORTED
>    CMD_T_SENT
>    CMD_T_DEV_ACTIVE
>
> This means the WRITE command was sent to the backend device, but did not
> complete before it was explicitly aborted by the ESX host.
>
> Based on what's in the logs above, this also looks as expected.
>
> *) se_cmd->t_task_cdb: 0x2a
>
> So this command is a WRITE_10 = 0x2a.
>
> This would indicate WRITE latency for the LIO backend storage is getting
> sufficiently high to trigger ESX host side ABORT_TASK timeouts, which
> are much more aggressive than normal Linux host I/O timeouts.
>
> So separate from the qla2xxx BUG_ON discussed below, the high
> backend device latency is worth looking into, as usually once an ESX host
> starts triggering ABORT_TASK timeouts it ends up taking the datastore
> offline.
>
> That said, I'm certain you're hitting the bogus tcm_qla2xxx_free_cmd()
> in tcm_qla2xxx_aborted_task(), which based on the initial analysis is
> definitely the wrong thing to do.
>
> So here's a patch against v4.12-rc that addresses the scenario you were
> able to trigger, and fixes the second scenario as well.
>
> As you noted above, it will take a little massaging to apply to v4.9.x
> due to qla_tgt_cmd->cmd_flags becoming qla_tgt_cmd->data_work and
> ->data_work_free. But that shouldn't be too hard. :)
>
> So please test without the initial debug patch on your v4.9.y based
> setup.
>
> Himanshu + Quinn, please review, as after further grokking of what Pascal
> has observed, these are two obvious bugs.
>
> From 4db6a8145940d0bbd10265020d681961ce2d3238 Mon Sep 17 00:00:00 2001
> From: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> Date: Fri, 30 Jun 2017 00:08:13 -0700
> Subject: [PATCH] qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR
>  ABORT
>
> This patch drops two incorrect usages of tcm_qla2xxx_free_cmd()
> during TMR ABORT within tcm_qla2xxx_handle_data_work() and
> tcm_qla2xxx_aborted_task(), which were attempting to dispatch
> into workqueue context to do tcm_qla2xxx_complete_free() and
> subsequently invoke transport_generic_free_cmd().
>
> This is incorrect because during TMR ABORT target-core will
> drop the outstanding se_cmd->cmd_kref references once it has
> quiesced the se_cmd via transport_wait_for_tasks(), and in
> the case of qla2xxx it should not attempt to do its own
> transport_generic_free_cmd().
>
> As reported by Pascal, this was originally manifesting as a
> BUG_ON(cmd->cmd_in_wq) in qlt_free_cmd() during TMR ABORT
> with a LIO backend with sufficiently high WRITE latency to
> trigger a host side TMR ABORT_TASK to occur.
>
> In addition, for the case in tcm_qla2xxx_write_pending_status()
> and tcm_qla2xxx_handle_data_work() that waits for an outstanding
> FCP WRITE data transfer to complete before proceeding with a
> TMR ABORT, avoid using se_cmd->t_transport_stop_comp, which is
> already used by transport_wait_for_tasks(), and use a qla2xxx
> internal struct completion instead.
>
> Reported-by: Pascal de Bruijn <p.debruijn@xxxxxxxxxxx>
> Cc: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>
> Cc: Quinn Tran <quinn.tran@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # 3.10+
> Signed-off-by: Nicholas Bellinger <nab@xxxxxxxxxxxxxxx>
> ---
>  drivers/scsi/qla2xxx/qla_target.c  |  4 ++++
>  drivers/scsi/qla2xxx/qla_target.h  |  1 +
>  drivers/scsi/qla2xxx/tcm_qla2xxx.c | 36 +++---------------------------------
>  3 files changed, 8 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/scsi/qla2xxx/qla_target.c b/drivers/scsi/qla2xxx/qla_target.c
> index 401e245..8f8ece9 100644
> --- a/drivers/scsi/qla2xxx/qla_target.c
> +++ b/drivers/scsi/qla2xxx/qla_target.c
> @@ -4079,6 +4079,8 @@ static struct qla_tgt_cmd *qlt_get_tag(scsi_qla_host_t *vha,
>  	cmd = &((struct qla_tgt_cmd *)se_sess->sess_cmd_map)[tag];
>  	memset(cmd, 0, sizeof(struct qla_tgt_cmd));
>
> +	init_completion(&cmd->write_pending_abort_comp);
> +
>  	memcpy(&cmd->atio, atio, sizeof(*atio));
>  	cmd->state = QLA_TGT_STATE_NEW;
>  	cmd->tgt = vha->vha_tgt.qla_tgt;
> @@ -5083,6 +5085,8 @@ static int __qlt_send_busy(struct scsi_qla_host *vha,
>
>  	qlt_incr_num_pend_cmds(vha);
>  	INIT_LIST_HEAD(&cmd->cmd_list);
> +	init_completion(&cmd->write_pending_abort_comp);
> +
>  	memcpy(&cmd->atio, atio, sizeof(*atio));
>
>  	cmd->tgt = vha->vha_tgt.qla_tgt;
> diff --git a/drivers/scsi/qla2xxx/qla_target.h b/drivers/scsi/qla2xxx/qla_target.h
> index d644202..939e93c5 100644
> --- a/drivers/scsi/qla2xxx/qla_target.h
> +++ b/drivers/scsi/qla2xxx/qla_target.h
> @@ -868,6 +868,7 @@ struct qla_tgt_cmd {
>  	unsigned char sense_buffer[TRANSPORT_SENSE_BUFFER];
>
>  	spinlock_t cmd_lock;
> +	struct completion write_pending_abort_comp;
>  	/* to save extra sess dereferences */
>  	unsigned int conf_compl_supported:1;
>  	unsigned int sg_mapped:1;
> diff --git a/drivers/scsi/qla2xxx/tcm_qla2xxx.c b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> index 75aeb9f..8c1bf9b 100644
> --- a/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> +++ b/drivers/scsi/qla2xxx/tcm_qla2xxx.c
> @@ -415,6 +415,7 @@ static int tcm_qla2xxx_write_pending(struct se_cmd *se_cmd)
>
>  static int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
>  {
> +	struct qla_tgt_cmd *cmd = container_of(se_cmd, struct qla_tgt_cmd, se_cmd);
>  	unsigned long flags;
>  	/*
>  	 * Check for WRITE_PENDING status to determine if we need to wait for
> @@ -424,8 +425,7 @@ static int tcm_qla2xxx_write_pending_status(struct se_cmd *se_cmd)
>  	if (se_cmd->t_state == TRANSPORT_WRITE_PENDING ||
>  	    se_cmd->t_state == TRANSPORT_COMPLETE_QF_WP) {
>  		spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> -		wait_for_completion_timeout(&se_cmd->t_transport_stop_comp,
> -					    50);
> +		wait_for_completion(&cmd->write_pending_abort_comp);
>  		return 0;
>  	}
>  	spin_unlock_irqrestore(&se_cmd->t_state_lock, flags);
> @@ -501,7 +501,6 @@ static int tcm_qla2xxx_handle_cmd(scsi_qla_host_t *vha, struct qla_tgt_cmd *cmd,
>  static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
>  {
>  	struct qla_tgt_cmd *cmd = container_of(work, struct qla_tgt_cmd, work);
> -	unsigned long flags;
>
>  	/*
>  	 * Ensure that the complete FCP WRITE payload has been received.
> @@ -509,17 +508,6 @@ static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
>  	 */
>  	cmd->cmd_in_wq = 0;
>
> -	spin_lock_irqsave(&cmd->cmd_lock, flags);
> -	cmd->data_work = 1;
> -	if (cmd->aborted) {
> -		cmd->data_work_free = 1;
> -		spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -
> -		tcm_qla2xxx_free_cmd(cmd);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -
>  	cmd->vha->tgt_counters.qla_core_ret_ctio++;
>  	if (!cmd->write_data_transferred) {
>  		/*
> @@ -527,7 +515,7 @@ static void tcm_qla2xxx_handle_data_work(struct work_struct *work)
>  	 * waiting upon completion in tcm_qla2xxx_write_pending_status()
>  	 */
>  	if (cmd->se_cmd.transport_state & CMD_T_ABORTED) {
> -		complete(&cmd->se_cmd.t_transport_stop_comp);
> +		complete(&cmd->write_pending_abort_comp);
>  		return;
>  	}
>
> @@ -753,31 +741,13 @@ static void tcm_qla2xxx_queue_tm_rsp(struct se_cmd *se_cmd)
>  	qlt_xmit_tm_rsp(mcmd);
>  }
>
> -#define DATA_WORK_NOT_FREE(_cmd) (_cmd->data_work && !_cmd->data_work_free)
>  static void tcm_qla2xxx_aborted_task(struct se_cmd *se_cmd)
>  {
>  	struct qla_tgt_cmd *cmd = container_of(se_cmd,
>  				struct qla_tgt_cmd, se_cmd);
> -	unsigned long flags;
>
>  	if (qlt_abort_cmd(cmd))
>  		return;
> -
> -	spin_lock_irqsave(&cmd->cmd_lock, flags);
> -	if ((cmd->state == QLA_TGT_STATE_NEW) ||
> -	    ((cmd->state == QLA_TGT_STATE_DATA_IN) &&
> -	     DATA_WORK_NOT_FREE(cmd))) {
> -		cmd->data_work_free = 1;
> -		spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -		/*
> -		 * cmd has not reached fw, Use this trigger to free it.
> -		 */
> -		tcm_qla2xxx_free_cmd(cmd);
> -		return;
> -	}
> -	spin_unlock_irqrestore(&cmd->cmd_lock, flags);
> -	return;
> -
>  }
>
>  static void tcm_qla2xxx_clear_sess_lookup(struct tcm_qla2xxx_lport *,
>

Patch looks good.
Acked-by: Himanshu Madhani <himanshu.madhani@xxxxxxxxxx>

Thanks,
- Himanshu
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html