On 1/11/24 21:05, Niklas Cassel wrote: > When libata calls ata_link_abort() to abort all ata queued commands, > it calls blk_abort_request() on the SCSI command representing each QC. > > This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() > for each SCSI command. > > scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds > the command to shost->eh_cmd_q. > > This will wake up the SCSI EH, and eventually the libata EH strategy > handler will be called, which calls scsi_eh_flush_done_q() to either > flush retry or flush finish each failed command. > > The commands that are flush retried by scsi_eh_flush_done_q() are > done so using scsi_queue_insert(). > > Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if > necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with > the second argument set to true, indicating that it should always > kick/run the requeue list after inserting. > > After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if > necessary"), __scsi_queue_insert() does not kick/run the requeue list > after inserting, if the current SCSI host state is recovery (which is > the case in the libata example above). > > This optimization is probably fine in most cases, as I can only assume > that most often someone will eventually kick/run the queues. > > However, that is not the case for scsi_eh_flush_done_q(), where we can > see that the request gets inserted to the requeue list, but the queue is > never started after the request has been inserted, leading to the block > layer waiting for the completion of command that never gets to run. > > Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host > state is most likely always in recovery when this function is called. > > Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after > inserting a flush retry command, so that scsi_eh_flush_done_q() keeps > the same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick > the requeue list if necessary"). > > Simple reproducer for the libata example above: > $ hdparm -Y /dev/sda > $ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete > > Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary") > Reported-by: Kevin Locke <kevin@xxxxxxxxxxxxxxx> > Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@xxxxxxxxxxxxxxx/ > Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx> > --- > drivers/scsi/scsi_error.c | 9 ++++++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > index 1223d34c04da..d983f4a0e9f1 100644 > --- a/drivers/scsi/scsi_error.c > +++ b/drivers/scsi/scsi_error.c > @@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q) > struct scsi_cmnd *scmd, *next; > > list_for_each_entry_safe(scmd, next, done_q, eh_entry) { > + struct scsi_device *sdev = scmd->device; > + > list_del_init(&scmd->eh_entry); > - if (scsi_device_online(scmd->device) && > - !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) && > - scsi_eh_should_retry_cmd(scmd)) { > + if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) && > + scsi_cmd_retry_allowed(scmd) && > + scsi_eh_should_retry_cmd(scmd)) { > SCSI_LOG_ERROR_RECOVERY(3, > scmd_printk(KERN_INFO, scmd, > "%s: flush retry cmd\n", > current->comm)); > scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); > + blk_mq_kick_requeue_list(sdev->request_queue); > } else { > /* > * If just we got sense for the device (called Looks good to me. Reviewed-by: Damien Le Moal <dlemoal@xxxxxxxxxx> -- Damien Le Moal Western Digital Research