Patch "scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()" has been added to the 5.16-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 23 Jan 2022 10:08:57 -0500

This is a note to let you know that I've just added the patch titled

    scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()

to the 5.16-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     scsi-block-pm-always-set-request-queue-runtime-activ.patch
and it can be found in the queue-5.16 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 045ae3b8a69bbbff825e250b80f3a8b86403e54e
Author: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
Date:   Mon Dec 20 19:21:26 2021 +0800

    scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()
    
    [ Upstream commit 6e1fcab00a23f7fe9f4fe9704905a790efa1eeab ]
    
    John Garry reported a deadlock that occurs when trying to access a
    runtime-suspended SATA device.  For obscure reasons, the rescan procedure
    causes the link to be hard-reset, which disconnects the device.
    
    The rescan tries to carry out a runtime resume when accessing the device.
    scsi_rescan_device() holds the SCSI device lock and won't release it until
    it can put commands onto the device's block queue.  This can't happen until
    the queue is successfully runtime-resumed or the device is unregistered.
    But the runtime resume fails because the device is disconnected, and
    __scsi_remove_device() can't do the unregistration because it can't get the
    device lock.
    
    The best way to resolve this deadlock appears to be to allow the block
    queue to start running again even after an unsuccessful runtime resume.
    The idea is that the driver or the SCSI error handler will need to be able
    to use the queue to resolve the runtime resume failure.
    
    This patch removes the err argument to blk_post_runtime_resume() and makes
    the routine always act as though the resume was successful.  This fixes the
    deadlock.
    
    Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@xxxxxxxxxxxxx
    Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
    Reported-and-tested-by: John Garry <john.garry@xxxxxxxxxx>
    Reviewed-by: Bart Van Assche <bvanassche@xxxxxxx>
    Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
    Signed-off-by: Xiang Chen <chenxiang66@xxxxxxxxxxxxx>
    Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
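
For readers who want to see the behavioural change in isolation, here is a
minimal user-space sketch of the before/after logic described in the commit
message. It is not kernel code: the model_* names, the model_queue struct and
the model_queue_can_dispatch() helper are hypothetical stand-ins invented for
illustration. The locking, the q->dev check and the real work done by
blk_set_runtime_active() are deliberately left out, and treating "status is
active" as "requests can be dispatched" is a simplifying assumption.

```c
/* User-space model only. Every model_* identifier here is hypothetical. */
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the queue's runtime PM status. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED
};

/* Simplified stand-in for struct request_queue (status field only). */
struct model_queue {
	enum model_rpm_status rpm_status;
};

/* Models the step before the driver's runtime_resume callback runs. */
void model_pre_runtime_resume(struct model_queue *q)
{
	assert(q != NULL);
	q->rpm_status = MODEL_RPM_RESUMING;
}

/* Old behaviour: a failed resume (err != 0) leaves the queue suspended. */
void model_post_runtime_resume_old(struct model_queue *q, int err)
{
	assert(q != NULL);
	q->rpm_status = err ? MODEL_RPM_SUSPENDED : MODEL_RPM_ACTIVE;
}

/* New behaviour: the queue is marked active whatever the resume result. */
void model_post_runtime_resume_new(struct model_queue *q)
{
	assert(q != NULL);
	q->rpm_status = MODEL_RPM_ACTIVE;
}

/* Assumption of this sketch: only an active queue accepts new commands. */
int model_queue_can_dispatch(const struct model_queue *q)
{
	assert(q != NULL);
	return q->rpm_status == MODEL_RPM_ACTIVE;
}
```

In this model, a failed resume under the old function leaves the queue unable
to accept commands, which corresponds to the state in which the rescan path
waits while holding the device lock. Under the new function the queue becomes
usable again, so the driver or error handler can deal with the failure.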

diff --git a/block/blk-pm.c b/block/blk-pm.c
index 17bd020268d42..2dad62cc15727 100644
--- a/block/blk-pm.c
+++ b/block/blk-pm.c
@@ -163,27 +163,19 @@ EXPORT_SYMBOL(blk_pre_runtime_resume);
 /**
  * blk_post_runtime_resume - Post runtime resume processing
  * @q: the queue of the device
- * @err: return value of the device's runtime_resume function
  *
  * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime_resume function. If the resume was successful, call
- *    blk_set_runtime_active() to do the real work of restarting the queue.
+ *    For historical reasons, this routine merely calls blk_set_runtime_active()
+ *    to do the real work of restarting the queue.  It does this regardless of
+ *    whether the device's runtime-resume succeeded; even if it failed the
+ *    driver or error handler will need to communicate with the device.
  *
  *    This function should be called near the end of the device's
  *    runtime_resume callback.
  */
-void blk_post_runtime_resume(struct request_queue *q, int err)
+void blk_post_runtime_resume(struct request_queue *q)
 {
-	if (!q->dev)
-		return;
-	if (!err) {
-		blk_set_runtime_active(q);
-	} else {
-		spin_lock_irq(&q->queue_lock);
-		q->rpm_status = RPM_SUSPENDED;
-		spin_unlock_irq(&q->queue_lock);
-	}
+	blk_set_runtime_active(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
@@ -201,7 +193,7 @@ EXPORT_SYMBOL(blk_post_runtime_resume);
  * runtime PM status and re-enable peeking requests from the queue. It
  * should be called before first request is added to the queue.
  *
- * This function is also called by blk_post_runtime_resume() for successful
+ * This function is also called by blk_post_runtime_resume() for
  * runtime resumes.  It does everything necessary to restart the queue.
  */
 void blk_set_runtime_active(struct request_queue *q)
diff --git a/drivers/scsi/scsi_pm.c b/drivers/scsi/scsi_pm.c
index b5a858c29488a..f06ca9d2a597d 100644
--- a/drivers/scsi/scsi_pm.c
+++ b/drivers/scsi/scsi_pm.c
@@ -181,7 +181,7 @@ static int sdev_runtime_resume(struct device *dev)
 	blk_pre_runtime_resume(sdev->request_queue);
 	if (pm && pm->runtime_resume)
 		err = pm->runtime_resume(dev);
-	blk_post_runtime_resume(sdev->request_queue, err);
+	blk_post_runtime_resume(sdev->request_queue);
 
 	return err;
 }
diff --git a/include/linux/blk-pm.h b/include/linux/blk-pm.h
index b80c65aba2493..2580e05a8ab67 100644
--- a/include/linux/blk-pm.h
+++ b/include/linux/blk-pm.h
@@ -14,7 +14,7 @@ extern void blk_pm_runtime_init(struct request_queue *q, struct device *dev);
 extern int blk_pre_runtime_suspend(struct request_queue *q);
 extern void blk_post_runtime_suspend(struct request_queue *q, int err);
 extern void blk_pre_runtime_resume(struct request_queue *q);
-extern void blk_post_runtime_resume(struct request_queue *q, int err);
+extern void blk_post_runtime_resume(struct request_queue *q);
 extern void blk_set_runtime_active(struct request_queue *q);
 #else
 static inline void blk_pm_runtime_init(struct request_queue *q,