Re: [PATCH 2/2] scsi: Fix hang when device state is set via sysfs

James Bottomley <jejb@xxxxxxxxxxxxx> · Fri, 19 Nov 2021 08:35:06 -0500

On Fri, 2021-11-05 at 17:10 -0500, Mike Christie wrote:
> This fixes a regression added with:
> 
> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> 
> The problem is that after iSCSI recovery, iscsid will call into the
> kernel
> to set the dev's state to running, and with that patch we now call
> scsi_rescan_device with the state_mutex held. If the scsi error
> handler
> thread is just starting to test the device in scsi_send_eh_cmnd then
> it's
> going to try to grab the state_mutex.
> 
> We are then stuck, because when scsi_rescan_device tries to send its
> IO
> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
> which will return true (the host state is still in recovery) and IO
> will
> just be requeued. scsi_send_eh_cmnd will then never be able to grab
> the
> state_mutex to finish error handling.
> 
> To prevent the deadlock this moves the rescan related code to after
> we
> drop the state_mutex.
> 
> This also adds a check for if we are already in the running state.
> This
> prevents extra scans and helps the iscsid case where if the transport
> class has already onlined the device during it's recovery process
> then we
> don't need userspace to do it again plus possibly block that daemon.
> 
> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
> offlinining device")
> Cc: Bart Van Assche <bvanassche@xxxxxxx>
> Cc: lijinlin <lijinlin3@xxxxxxxxxx>
> Cc: Wu Bo <wubo40@xxxxxxxxxx>
> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
> ---
>  drivers/scsi/scsi_sysfs.c | 30 +++++++++++++++++++-----------
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index a35841b34bfd..53e23a7bc0d3 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -797,6 +797,7 @@ store_state_field(struct device *dev, struct
> device_attribute *attr,
>  	int i, ret;
>  	struct scsi_device *sdev = to_scsi_device(dev);
>  	enum scsi_device_state state = 0;
> +	bool rescan_dev = false;
>  
>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>  		const int len = strlen(sdev_states[i].name);
> @@ -815,20 +816,27 @@ store_state_field(struct device *dev, struct
> device_attribute *attr,
>  	}
>  
>  	mutex_lock(&sdev->state_mutex);
> -	ret = scsi_device_set_state(sdev, state);
> -	/*
> -	 * If the device state changes to SDEV_RUNNING, we need to
> -	 * run the queue to avoid I/O hang, and rescan the device
> -	 * to revalidate it. Running the queue first is necessary
> -	 * because another thread may be waiting inside
> -	 * blk_mq_freeze_queue_wait() and because that call may be
> -	 * waiting for pending I/O to finish.
> -	 */
> -	if (ret == 0 && state == SDEV_RUNNING) {
> +	if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING)
> {
> +		ret = count;

This looks wrong because of this

[...]
>  	return ret == 0 ? count : -EINVAL;

Don't we now return EINVAL on idempotent set state running because the
count is always non-zero?

I think the first statement should be 'ret = 0;' instead to cause
idempotent state setting to succeed as a nop.

James