Hi Bart, On Mon, 2019-12-09 at 10:02 -0800, Bart Van Assche wrote: > Since the SCSI core does not reuse the tag of the SCSI command that > is > being aborted by .eh_abort() before .eh_abort() has finished it is > not > necessary to check from inside that callback whether or not the SCSI > command > has already completed. Instead, rely on the firmware to return an > error code > when attempting to abort a command that has already completed. > Additionally, > rely on the firmware to return an error code when attempting to abort > an > already aborted command. Do we have evidence that the firmware can truly be relied upon in this respect? It's at least not impossible that the FW (or some version of it) relies on the driver not trying to abort commands that have been aborted or completed already, and crashes if that assumption is violated. > > In qla2x00_abort_srb(), use blk_mq_request_started() instead of > sp->completed and sp->aborted. See below. > This patch eliminates several race conditions triggered by the > removed member > variables. > > Cc: Quinn Tran <qutran@xxxxxxxxxxx> > Cc: Martin Wilck <mwilck@xxxxxxxx> > Cc: Daniel Wagner <dwagner@xxxxxxx> > Cc: Roman Bolshakov <r.bolshakov@xxxxxxxxx> > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> > --- > drivers/scsi/qla2xxx/qla_def.h | 3 --- > drivers/scsi/qla2xxx/qla_isr.c | 5 ----- > drivers/scsi/qla2xxx/qla_os.c | 15 ++------------- > 3 files changed, 2 insertions(+), 21 deletions(-) > > diff --git a/drivers/scsi/qla2xxx/qla_def.h > b/drivers/scsi/qla2xxx/qla_def.h > index 460f443f6471..ab7424318ee8 100644 > --- a/drivers/scsi/qla2xxx/qla_def.h > +++ b/drivers/scsi/qla2xxx/qla_def.h > @@ -597,9 +597,6 @@ typedef struct srb { > struct fc_port *fcport; > struct scsi_qla_host *vha; > unsigned int start_timer:1; > - unsigned int abort:1; > - unsigned int aborted:1; > - unsigned int completed:1; > > uint32_t handle; > uint16_t flags; > diff --git a/drivers/scsi/qla2xxx/qla_isr.c > b/drivers/scsi/qla2xxx/qla_isr.c > index 2601d7673c37..721a8d83e350 100644 > --- a/drivers/scsi/qla2xxx/qla_isr.c > +++ b/drivers/scsi/qla2xxx/qla_isr.c > @@ -2487,11 +2487,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha, > struct rsp_que *rsp, void *pkt) > return; > } > > - if (sp->abort) > - sp->aborted = 1; > - else > - sp->completed = 1; > - > if (sp->cmd_type != TYPE_SRB) { > req->outstanding_cmds[handle] = NULL; > ql_dbg(ql_dbg_io, vha, 0x3015, > diff --git a/drivers/scsi/qla2xxx/qla_os.c > b/drivers/scsi/qla2xxx/qla_os.c > index 145ea93206f0..2231d99d311b 100644 > --- a/drivers/scsi/qla2xxx/qla_os.c > +++ b/drivers/scsi/qla2xxx/qla_os.c > @@ -1253,17 +1253,6 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd) > return SUCCESS; > > spin_lock_irqsave(qpair->qp_lock_ptr, flags); > - if (sp->completed) { > - spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); > - return SUCCESS; > - } > - > - if (sp->abort || sp->aborted) { > - spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); > - return FAILED; > - } > - > - sp->abort = 1; > sp->comp = ∁ > spin_unlock_irqrestore(qpair->qp_lock_ptr, flags); > > @@ -1696,6 +1685,7 @@ static void qla2x00_abort_srb(struct qla_qpair > *qp, srb_t *sp, const int res, > DECLARE_COMPLETION_ONSTACK(comp); > scsi_qla_host_t *vha = qp->vha; > struct qla_hw_data *ha = vha->hw; > + struct scsi_cmnd *cmd = GET_CMD_SP(sp); > int rval; > bool ret_cmd; > uint32_t ratov_j; > @@ -1717,7 +1707,6 @@ static void qla2x00_abort_srb(struct qla_qpair > *qp, srb_t *sp, const int res, > } > > sp->comp = ∁ > - sp->abort = 1; > spin_unlock_irqrestore(qp->qp_lock_ptr, *flags); > > rval = ha->isp_ops->abort_command(sp); > @@ -1741,7 +1730,7 @@ static void qla2x00_abort_srb(struct qla_qpair > *qp, srb_t *sp, const int res, > } > > spin_lock_irqsave(qp->qp_lock_ptr, *flags); > - if (ret_cmd && (!sp->completed || !sp->aborted)) > + if (ret_cmd && blk_mq_request_started(cmd->request)) > sp->done(sp, res); > } else { > sp->done(sp, res); blk_mq_request_started() returns true for requests in MQ_RQ_COMPLETE state. Is this really an equivalent condition? That said, the condition in the current code is sort of strange, as it's equivalent to !(sp->completed && sp->aborted). I'm wondering what it means if a command is both completed and aborted. Naïvely thinking (and inferring from the current code) this condition could never be met, and thus its negation would hold for every command. Perhaps Quinn meant "!(sp->completed || sp->aborted)" ? Regards, Martin