On Fri, Jul 20, 2018 at 11:24:45AM -0600, Keith Busch wrote: > My patch restores the state that scsi had in 4.17. It still has that > gap that may lose requests forever when the scsi LLD always returns > BLK_EH_RESET_TIMER (see virtio-scsi, for example). That gap existed prior, > so that's not new with my patch. Maybe we can fix that with a slight > modification to my previous patch. It looks like SCSI really wants to > block completions only when it hands off the command to the error handler, > so we don't need to have the inflight -> compete -> inflight transition, > and the following is all that's needed: Btw, one thing we should do in blk-mq and scsi is to make the time optional. If the blk_mq driver doesn't even have a timeout structure there is no point in timing out requests and enter the timeout handler ever. Same for those scsi drivers always returning BLK_EH_RESET_TIMER. Whether never having timeouts is a good idea is a different discussion, but as long as we have such drivers we should handle them somewhat sane. > --- > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 8932ae81a15a..902c30d3c0ed 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -296,6 +296,8 @@ enum blk_eh_timer_return scsi_times_out(struct request *req) > rtn = host->hostt->eh_timed_out(scmd); > > if (rtn == BLK_EH_DONE) { > + if (req->q->mq_ops && blk_mq_mark_complete(req)) > + return rtn; This looks pretty sensible to me as a band-aid. It just needs a very detailed comment explaining what is going on here.