Patch "scsi: core: Kick the requeue list after inserting when flushing" has been added to the 6.7-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Fri, 26 Jan 2024 09:05:39 -0500

This is a note to let you know that I've just added the patch titled

    scsi: core: Kick the requeue list after inserting when flushing

to the 6.7-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     scsi-core-kick-the-requeue-list-after-inserting-when.patch
and it can be found in the queue-6.7 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit df6aaddf93455da5a777f3a0c2a4b74cd2e5bd72
Author: Niklas Cassel <cassel@xxxxxxxxxx>
Date:   Thu Jan 11 13:05:32 2024 +0100

    scsi: core: Kick the requeue list after inserting when flushing
    
    [ Upstream commit 6df0e077d76bd144c533b61d6182676aae6b0a85 ]
    
    When libata calls ata_link_abort() to abort all ata queued commands, it
    calls blk_abort_request() on the SCSI command representing each QC.
    
    This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
    each SCSI command.
    
    scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
    command to shost->eh_cmd_q.
    
    This will wake up the SCSI EH, and eventually the libata EH strategy
    handler will be called, which calls scsi_eh_flush_done_q() to either flush
    retry or flush finish each failed command.
    
    The commands that are flush retried by scsi_eh_flush_done_q() are done so
    using scsi_queue_insert().
    
    Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
    necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
    second argument set to true, indicating that it should always kick/run the
    requeue list after inserting.
    
    After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if
    necessary"), __scsi_queue_insert() does not kick/run the requeue list after
    inserting, if the current SCSI host state is recovery (which is the case in
    the libata example above).
    
    This optimization is probably fine in most cases, as I can only assume that
    most often someone will eventually kick/run the queues.
    
    However, that is not the case for scsi_eh_flush_done_q(), where we can see
    that the request gets inserted to the requeue list, but the queue is never
    started after the request has been inserted, leading to the block layer
    waiting for the completion of command that never gets to run.
    
    Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
    state is most likely always in recovery when this function is called.
    
    Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
    inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
    same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the
    requeue list if necessary").
    
    Simple reproducer for the libata example above:
    $ hdparm -Y /dev/sda
    $ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete
    
    Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
    Reported-by: Kevin Locke <kevin@xxxxxxxxxxxxxxx>
    Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@xxxxxxxxxxxxxxx/
    Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx>
    Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@xxxxxxxxxx
    Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>
    Reviewed-by: Damien Le Moal <dlemoal@xxxxxxxxxx>
    Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 1223d34c04da..d983f4a0e9f1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 	struct scsi_cmnd *scmd, *next;
 
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
+		struct scsi_device *sdev = scmd->device;
+
 		list_del_init(&scmd->eh_entry);
-		if (scsi_device_online(scmd->device) &&
-		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
-			scsi_eh_should_retry_cmd(scmd)) {
+		if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) &&
+		    scsi_cmd_retry_allowed(scmd) &&
+		    scsi_eh_should_retry_cmd(scmd)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				scmd_printk(KERN_INFO, scmd,
 					     "%s: flush retry cmd\n",
 					     current->comm));
 				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+				blk_mq_kick_requeue_list(sdev->request_queue);
 		} else {
 			/*
 			 * If just we got sense for the device (called