qla2xxx: shoud ensure no *io done* after qla2xxx_eh_abort returning SUCCESS to avoid race between *io timeout* and *io done*. LLDD driver should ensure that there is no *io done* after eh_abort (*qla2xxx_eh_abort*) returning SUCCESS to avoid the race between *io timeout* and *io done*. But qla2xxx driver can not guarantee that. If *io done* occurs when qla2xxx_eh_abort is in process, it may return SUCCESS without actually aborting the cmd. In that case, the race between *io timeout* and *io done* appears. Although there is REQ_ATOM_COMPLETE flag trying to avoid the race, there is also race possibility that may result in crash. The following is one of the race cases, CPU 1 CPU2 scsi_try_to_abort_cmd()-> qla2xxx_eh_abort() *io done*->qla2x00_sp_compl()-> qla2x00_sp_free_dma() return SUCCESS-> *retry*->scsi_queue_insert()-> blk_clear_rq_complete() scsi_done()-> blk_complete_request() In this race case, because the REQ_ATOM_COMPLETE flag has been clear in blk_clear_rq_complete, the *io done* will try to complete the IO request that has already been requeued. When the requeued IO completes again, another *io done* of the same IO request will start again. That means the same IO request will be finished twice, which will result in double releasing the resources of the request, then lead to crash. So when *io done* happens during qla2xxx_eh_abort, we should wait there till *io done* finishs, ensuring no *io done* after qla2xxx_eh_abort returning SUCCESS, which avoids race between *io timeout* and *io done*. Signed-off-by: Jiang Biao <jiang.biao2@xxxxxxxxxx> Signed-off-by: Tan Hu <tan.hu@xxxxxxxxxx> Reviewed-by: Cai Qu <cai.qu@xxxxxxxxxx> diff -uprN scsi/qla_os.c scsi_new/qla_os.c --- scsi/qla_os.c 2015-07-31 19:39:06.000000000 +0800 +++ scsi_new/qla_os.c 2015-07-31 19:46:20.000000000 +0800 @@ -939,8 +939,15 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd) int rval, wait = 0; struct qla_hw_data *ha = vha->hw; - if (!CMD_SP(cmd)) + if (!CMD_SP(cmd)) { + /* + * Wait until io done finishs to avoid race between + * io timeout and io done. + */ + spin_lock_irqsave(&ha->hardware_lock, flags); + spin_unlock_irqrestore(&ha->hardware_lock, flags); return SUCCESS; + } ret = fc_block_scsi_eh(cmd); if (ret != 0) @@ -996,8 +1003,15 @@ qla2xxx_eh_abort(struct scsi_cmnd *cmd) spin_unlock_irqrestore(&ha->hardware_lock, flags); /* Did the command return during mailbox execution? */ - if (ret == FAILED && !CMD_SP(cmd)) + if (ret == FAILED && !CMD_SP(cmd)) { + /* + * Wait until io done finishs to avoid race between + * io timeout and io done. + */ + spin_lock_irqsave(&ha->hardware_lock, flags); + spin_unlock_irqrestore(&ha->hardware_lock, flags); ret = SUCCESS; + } /* Wait for the command to be returned. */ if (wait) { ��.n��������+%������w��{.n�����{������ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f