Re: [PATCH 2/2] scsi: Fix hang when device state is set via sysfs

Mike Christie <michael.christie@xxxxxxxxxx> · Fri, 19 Nov 2021 11:13:11 -0600

On 11/19/21 7:35 AM, James Bottomley wrote:
> On Fri, 2021-11-05 at 17:10 -0500, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the
>> kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error
>> handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then
>> it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its
>> IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> which will return true (the host state is still in recovery) and IO
>> will
>> just be requeued. scsi_send_eh_cmnd will then never be able to grab
>> the
>> state_mutex to finish error handling.
>>
>> To prevent the deadlock this moves the rescan related code to after
>> we
>> drop the state_mutex.
>>
>> This also adds a check for if we are already in the running state.
>> This
>> prevents extra scans and helps the iscsid case where if the transport
>> class has already onlined the device during it's recovery process
>> then we
>> don't need userspace to do it again plus possibly block that daemon.
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Cc: Bart Van Assche <bvanassche@xxxxxxx>
>> Cc: lijinlin <lijinlin3@xxxxxxxxxx>
>> Cc: Wu Bo <wubo40@xxxxxxxxxx>
>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 30 +++++++++++++++++++-----------
>>  1 file changed, 19 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index a35841b34bfd..53e23a7bc0d3 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -797,6 +797,7 @@ store_state_field(struct device *dev, struct
>> device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -815,20 +816,27 @@ store_state_field(struct device *dev, struct
>> device_attribute *attr,
>>  	}
>>  
>>  	mutex_lock(&sdev->state_mutex);
>> -	ret = scsi_device_set_state(sdev, state);
>> -	/*
>> -	 * If the device state changes to SDEV_RUNNING, we need to
>> -	 * run the queue to avoid I/O hang, and rescan the device
>> -	 * to revalidate it. Running the queue first is necessary
>> -	 * because another thread may be waiting inside
>> -	 * blk_mq_freeze_queue_wait() and because that call may be
>> -	 * waiting for pending I/O to finish.
>> -	 */
>> -	if (ret == 0 && state == SDEV_RUNNING) {
>> +	if (sdev->sdev_state == SDEV_RUNNING && state == SDEV_RUNNING)
>> {
>> +		ret = count;
> 
> This looks wrong because of this
> 
> [...]
>>  	return ret == 0 ? count : -EINVAL;
> 
> Don't we now return EINVAL on idempotent set state running because the
> count is always non-zero?
> 
> I think the first statement should be 'ret = 0;' instead to cause
> idempotent state setting to succeed as a nop.
> 

You're right.

I'll resend this patchset with James's comment handled.