On 10/5/21 11:45 PM, Mike Christie wrote: > Cc'ing lee. > > On 10/5/21 11:31 PM, Mike Christie wrote: >> This fixes a regression added with: >> >> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after >> offlinining device") >> >> The problem is that after iSCSI recovery, iscsid will call into the kernel >> to set the dev's state to running, and with that patch we now call >> scsi_rescan_device with the state_mutex held. If the scsi error handler >> thread is just starting to test the device in scsi_send_eh_cmnd then it's >> going to try to grab the state_mutex. >> >> We are then stuck, because when scsi_rescan_device tries to send its IO >> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery >> will return true (the host state is still in recovery) and IO will just be >> requeued. scsi_send_eh_cmnd will then never be able to grab the >> state_mutex to finish error handling. >> >> This just moves the scsi_rescan_device call to after we drop the >> state_mutex. > > > I want to maybe nak my own patch. There is still a problem where if one > of the rescan IOs hits an issue then userspace is stuck waiting for > however long it takes to perform recovery. For iscsid, this will cause > problems because it sets the device state from its main thread. So > while scsi_rescan_device is hung then iscsid can't do anything for > any session. > > I think we either want to: > > 1. Do the patch below, but Lee will need to change iscsid so it sets > the dev state from a worker thread. > > 2. Have the kernel kick off the rescan from a workqueue. This seems > easiest but I'm not sure if it will cause issues for lijinlin's use > case. I have not heard from huawei, but I don't think we can do 2. The problem is that I think userspace will not assume once the write returns that the device is ready to go. So we can't: 1. just kick off an async rescan, then return. There are issues with this assumption though. See below. 2. 2.A kick off rescan 2.B wait for rescan to complete or device/host to change state this could hang iscsid until the scsi cmd timer fires or until the transport timer fires. I think the options are: 1. Revert the patch. The problem is that the rescan can still fail, and if it didn't cause a deadlock due to the bug in this thread, the device could go back to offline, but we would still return success. Why didn't userspace just rescan from sysfs? 2. Do the patch below so we at least don't deadlock and fix iscsid. Maybe iscsid was a little too smart for its own good, and should not have assumed that writing to the state file could not block for long periods. 3. ? > > >> >> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after >> offlinining device") >> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> >> --- >> drivers/scsi/scsi_sysfs.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c >> index 86793259e541..5b63407c3a3f 100644 >> --- a/drivers/scsi/scsi_sysfs.c >> +++ b/drivers/scsi/scsi_sysfs.c >> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr, >> int i, ret; >> struct scsi_device *sdev = to_scsi_device(dev); >> enum scsi_device_state state = 0; >> + bool rescan_dev = false; >> >> for (i = 0; i < ARRAY_SIZE(sdev_states); i++) { >> const int len = strlen(sdev_states[i].name); >> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr, >> */ >> if (ret == 0 && state == SDEV_RUNNING) { >> blk_mq_run_hw_queues(sdev->request_queue, true); >> - scsi_rescan_device(dev); >> + rescan_dev = true; >> } >> mutex_unlock(&sdev->state_mutex); >> >> + if (rescan_dev) >> + scsi_rescan_device(dev); >> + >> return ret == 0 ? count : -EINVAL; >> } >> >> >