On Wed, Nov 14, 2018 at 11:00:18AM -0700, Keith Busch wrote: > On Wed, Nov 14, 2018 at 09:51:36AM -0800, Bart Van Assche wrote: > > Regarding this patch: I think this patch introduces a subtle but severe bug > > in the SCSI core, namely that if an abort is processed concurrently with > > request completion with "fake timeout" enabled that the abort is ignored. > > That requires the following occur concurrently: > > 1. A real completion > 2. A real timeout > 3. A fake timeout > > That can't happen on a production kernel, and highly improbable > on the fake one. We can still fix it by having scsi timeout return > BLK_EH_RESET_TIMER in this case. I didn't like adding code just to work > around error injection, but there isn't a good alternative at the moment. So do this instead: --8<--- diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index ff372b335ced..d343024e732a 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -199,9 +199,6 @@ scsi_abort_command(struct scsi_cmnd *scmd) return FAILED; } - if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags)) - return SUCCESS; - spin_lock_irqsave(shost->host_lock, flags); if (shost->eh_deadline != -1 && !shost->last_reset) shost->last_reset = jiffies; @@ -299,6 +296,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) rtn = host->hostt->eh_timed_out(scmd); if (rtn == BLK_EH_DONE) { + if (test_and_set_bit(__SCMD_COMPLETE, &scmd->flags)) + return BLK_EH_RESET_TIMER; if (scsi_abort_command(scmd) != SUCCESS) { set_host_byte(scmd, DID_TIME_OUT); scsi_eh_scmd_add(scmd); -->8---