On 1/24/17, 6:59 AM, "Mauricio Faria de Oliveira" <mauricfo@xxxxxxxxxxxxxxxxxx> wrote: >Hi Bart, > >First of all, sorry for the new bug; I didn't realize the pointer could >be NULL at this scenario. > >On 01/23/2017 02:34 PM, Bart Van Assche wrote: >> @@ -1624,7 +1627,8 @@ qla2x00_abort_all_cmds(scsi_qla_host_t *vha, int res) >> */ >> sp_get(sp); >> spin_unlock_irqrestore(&ha->hardware_lock, flags); >> - qla2xxx_eh_abort(GET_CMD_SP(sp)); >> + if (scmd) >> + qla2xxx_eh_abort(scmd); >> spin_lock_irqsave(&ha->hardware_lock, flags); >> } > >Now, this chunk has a problem with reference counting (and unnecessary >spin-locking), which we can avoid by simply moving up this NULL check. > >The call to sp_get() increments the sp->ref_count, but if you skip the >call to qla2xxx_eh_abort() you don't get the decrement from the call to >sp->done() at abort handling from ISR, e.g., qla24xx_abort_iocb_entry(). >[or if the command completed successfully between issue/complete abort, >at the completion from ISR, e.g., qla2x00_process_completed_request().] > >The sp->done() call just below this chunk was supposed to drop the >initial reference [set at qla2xxx_queuecommand()] at a time we did >not call qla2xxx_eh_abort() yet... but now that we __may__ call it >(and get that sp->done() call from the ISR abort handling), we need >to only increment it if we're going to drop it. > >That should be resolved with this slight change to your patch >(which also helps w/ the spin-locking). What do you/others think? > >diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c >index 0a000ecf0881..a17cb63b3fd5 100644 >--- a/drivers/scsi/qla2xxx/qla_os.c >+++ b/drivers/scsi/qla2xxx/qla_os.c >@@ -1600,6 +1600,7 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha) > srb_t *sp; > struct qla_hw_data *ha = vha->hw; > struct req_que *req; >+ struct scsi_cmnd *scmd; > > qlt_host_reset_handler(ha); > >@@ -1613,10 +1614,12 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data >*ha) > for (cnt = 1; cnt < req->num_outstanding_cmds; cnt++) { > sp = req->outstanding_cmds[cnt]; > if (sp) { >+ scmd = GET_CMD_SP(sp); >+ > /* Don't abort commands in adapter >during EEH > * recovery as it's not >accessible/responding. > */ >- if (!ha->flags.eeh_busy) { >+ if (scmd && !ha->flags.eeh_busy) { > /* Get a reference to the sp >and drop the lock. > * The reference ensures this >sp->done() call > * - and not the call in >qla2xxx_eh_abort() - >@@ -1624,7 +1627,7 @@ uint32_t qla2x00_isp_reg_stat(struct qla_hw_data *ha) > */ > sp_get(sp); > >spin_unlock_irqrestore(&ha->hardware_lock, flags); >- qla2xxx_eh_abort(GET_CMD_SP(sp)); >+ qla2xxx_eh_abort(scmd); > >spin_lock_irqsave(&ha->hardware_lock, flags); > } > req->outstanding_cmds[cnt] = NULL; > > >Signed-off-by: Mauricio Faria de Oliveira <mauricfo@xxxxxxxxxxxxxxxxxx> > > >-- >Mauricio Faria de Oliveira >IBM Linux Technology Center This is more appropriate fix. Looks good. ��.n��������+%������w��{.n�����������ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f