Re: [PATCH 1/1] scsi: fix hang when device state is set via sysfs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/5/21 9:45 PM, Mike Christie wrote:
> Cc'ing lee.
> 
> On 10/5/21 11:31 PM, Mike Christie wrote:
>> This fixes a regression added with:
>>
>> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>>
>> The problem is that after iSCSI recovery, iscsid will call into the kernel
>> to set the dev's state to running, and with that patch we now call
>> scsi_rescan_device with the state_mutex held. If the scsi error handler
>> thread is just starting to test the device in scsi_send_eh_cmnd then it's
>> going to try to grab the state_mutex.
>>
>> We are then stuck, because when scsi_rescan_device tries to send its IO
>> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery
>> will return true (the host state is still in recovery) and IO will just be
>> requeued. scsi_send_eh_cmnd will then never be able to grab the
>> state_mutex to finish error handling.
>>
>> This just moves the scsi_rescan_device call to after we drop the
>> state_mutex.
> 
> 
> I want to maybe nak my own patch. There is still a problem where if one
> of the rescan IOs hits an issue then userspace is stuck waiting for
> however long it takes to perform recovery. For iscsid, this will cause
> problems because it sets the device state from its main thread. So
> while scsi_rescan_device is hung then iscsid can't do anything for
> any session.
> 
> I think we either want to:
> 
> 1. Do the patch below, but Lee will need to change iscsid so it sets
> the dev state from a worker thread.
> 
> 2. Have the kernel kick off the rescan from a workqueue. This seems
> easiest but I'm not sure if it will cause issues for lijinlin's use
> case.

I vote for #2, if possible.

> 
> 
>>
>> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after
>> offlinining device")
>> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx>
>> ---
>>  drivers/scsi/scsi_sysfs.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index 86793259e541..5b63407c3a3f 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	int i, ret;
>>  	struct scsi_device *sdev = to_scsi_device(dev);
>>  	enum scsi_device_state state = 0;
>> +	bool rescan_dev = false;
>>  
>>  	for (i = 0; i < ARRAY_SIZE(sdev_states); i++) {
>>  		const int len = strlen(sdev_states[i].name);
>> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr,
>>  	 */
>>  	if (ret == 0 && state == SDEV_RUNNING) {
>>  		blk_mq_run_hw_queues(sdev->request_queue, true);
>> -		scsi_rescan_device(dev);
>> +		rescan_dev = true;
>>  	}
>>  	mutex_unlock(&sdev->state_mutex);
>>  
>> +	if (rescan_dev)
>> +		scsi_rescan_device(dev);
>> +
>>  	return ret == 0 ? count : -EINVAL;
>>  }
>>  
>>
> 




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux