Patch "scsi: core: sysfs: Fix hang when device state is set via sysfs" has been added to the 5.4-stable tree

Sasha Levin <sashal@xxxxxxxxxx> · Sun, 21 Nov 2021 18:20:15 -0500

This is a note to let you know that I've just added the patch titled

    scsi: core: sysfs: Fix hang when device state is set via sysfs

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     scsi-core-sysfs-fix-hang-when-device-state-is-set-vi.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feel it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 33d6411e11d4515ad5d88e13187e0b8fe86981c0
Author: Mike Christie <michael.christie@xxxxxxxxxx>
Date:   Fri Nov 5 17:10:48 2021 -0500

    scsi: core: sysfs: Fix hang when device state is set via sysfs
    
    [ Upstream commit 4edd8cd4e86dd3047e5294bbefcc0a08f66a430f ]
    
    This fixes a regression added with:
    
    commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
    offlinining device")
    
    The problem is that after iSCSI recovery, iscsid will call into the kernel
    to set the dev's state to running, and with that patch we now call
    scsi_rescan_device() with the state_mutex held. If the SCSI error handler
    thread is just starting to test the device in scsi_send_eh_cmnd() then it's
    going to try to grab the state_mutex.
    
    We are then stuck, because when scsi_rescan_device() tries to send its I/O,
    scsi_queue_rq() calls scsi_host_queue_ready() -> scsi_host_in_recovery(),
    which returns true (the host state is still in recovery), so the I/O is
    just requeued. scsi_send_eh_cmnd() will then never be able to grab the
    state_mutex to finish error handling.
    
    To prevent the deadlock, move the rescan-related code to after we drop the
    state_mutex.
    
    This also adds a check for whether the device is already in the running
    state. This prevents extra scans and helps the iscsid case: if the
    transport class has already onlined the device during its recovery
    process, userspace does not need to do it again and possibly block that
    daemon.
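
As a rough userspace illustration (not the kernel code), the two changes described above can be sketched with pthreads: decide under the lock whether follow-up work is needed, drop the lock, and only then do the work that may block. All names here (sketch_dev, sketch_rescan, sketch_set_state) are hypothetical stand-ins, and the "rescan" is just a counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's device states. */
enum sketch_state { SKETCH_OFFLINE, SKETCH_RUNNING };

struct sketch_dev {
	pthread_mutex_t state_mutex;
	enum sketch_state state;
	int rescans;	/* how many times the slow path ran */
};

/*
 * Stand-in for slow work that may block (the rescan in the patch).
 * It checks that the caller did not leave state_mutex held.
 */
static void sketch_rescan(struct sketch_dev *d)
{
	int rc = pthread_mutex_trylock(&d->state_mutex);

	assert(rc == 0);	/* the lock must be free here */
	if (rc == 0)
		pthread_mutex_unlock(&d->state_mutex);
	d->rescans++;
}

static void sketch_set_state(struct sketch_dev *d, enum sketch_state new_state)
{
	bool rescan_dev = false;

	pthread_mutex_lock(&d->state_mutex);
	if (d->state == SKETCH_RUNNING && new_state == SKETCH_RUNNING) {
		/* Already running: nothing to change, no rescan. */
	} else {
		d->state = new_state;
		if (new_state == SKETCH_RUNNING)
			rescan_dev = true;
	}
	pthread_mutex_unlock(&d->state_mutex);

	/* The slow work runs only after the lock is dropped. */
	if (rescan_dev)
		sketch_rescan(d);
}
```

Calling sketch_set_state() with SKETCH_RUNNING on an offline device triggers one rescan outside the lock; a second identical call is a no-op, which mirrors the "already running" check in the patch.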
    
    Link: https://lore.kernel.org/r/20211105221048.6541-3-michael.christie@xxxxxxxxxx
    Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after offlinining device")
    Cc: Bart Van Assche <bvanassche@xxxxxxx>
    Cc: lijinlin <lijinlin3@xxxxxxxxxx>
    Cc: Wu Bo <wubo40@xxxxxxxxxx>
    Reviewed-by: Lee Duncan <lduncan@xxxxxxxx>
    Reviewed-by: Wu Bo <wubo40@xxxxxxxxxx>
    Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
    Signed-off-by: Martin K. Petersen <martin.petersen@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 12064ce76777d..16432d42a50aa 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -776,6 +776,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 	int i, ret;
 	struct scsi_device *sdev = to_scsi_device(dev);
 	enum scsi_device_state state = 0;
+	bool rescan_dev = false;
 
 	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
 		const int len = strlen(sdev_states[i].name);
@@ -794,20 +795,27 @@ store_state_field(struct device *dev, struct device_attribute *attr,
 	}
 
 	mutex_lock(&sdev->state_mutex);
-	ret = scsi_device_set_state(sdev, state);
-	/*
-	 * If the device state changes to SDEV_RUNNING, we need to
-	 * run the queue to avoid I/O hang, and rescan the device
-	 * to revalidate it. Running the queue first is necessary
-	 * because another thread may be waiting inside
-	 * blk_mq_freeze_queue_wait() and because that call may be
-	 * waiting for pending I/O to finish.
-	 */
-	if (ret == 0 && state == SDEV_RUNNING) {
+	if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING) {
+		ret = count;
+	} else {
+		ret = scsi_device_set_state(sdev, state);
+		if (ret == 0 && state == SDEV_RUNNING)
+			rescan_dev = true;
+	}
+	mutex_unlock(&sdev->state_mutex);
+
+	if (rescan_dev) {
+		/*
+		 * If the device state changes to SDEV_RUNNING, we need to
+		 * run the queue to avoid I/O hang, and rescan the device
+		 * to revalidate it. Running the queue first is necessary
+		 * because another thread may be waiting inside
+		 * blk_mq_freeze_queue_wait() and because that call may be
+		 * waiting for pending I/O to finish.
+		 */
 		blk_mq_run_hw_queues(sdev->request_queue, true);
 		scsi_rescan_device(dev);
 	}
-	mutex_unlock(&sdev->state_mutex);
 
 	return ret == 0 ? count : -EINVAL;
 }