On 4/25/21 1:57 AM, Ming Lei wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c289991ffaed..7cbaee282b6d 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1568,7 +1568,11 @@ static void scsi_mq_done(struct scsi_cmnd *cmd)
>  	if (unlikely(test_and_set_bit(SCMD_STATE_COMPLETE, &cmd->state)))
>  		return;
>  	trace_scsi_dispatch_cmd_done(cmd);
> -	blk_mq_complete_request(cmd->request);
> +
> +	if (unlikely(host_byte(cmd->result) != DID_OK))
> +		blk_mq_complete_request_locally(cmd->request);
> +	else
> +		blk_mq_complete_request(cmd->request);
>  }

This change is so tricky that it deserves a comment.

An even better approach would be *not* to export
blk_mq_complete_request_locally() from the block layer to block drivers
and instead modify the block layer such that it completes a request on
the same CPU if request completion happens from inside the context of a
tag iteration function. That would save driver writers the trouble of
learning yet another block layer API.

Bart.