On 10/5/21 9:45 PM, Mike Christie wrote: > Cc'ing lee. > > On 10/5/21 11:31 PM, Mike Christie wrote: >> This fixes a regression added with: >> >> commit f0f82e2476f6 ("scsi: core: Fix capacity set to zero after >> offlinining device") >> >> The problem is that after iSCSI recovery, iscsid will call into the kernel >> to set the dev's state to running, and with that patch we now call >> scsi_rescan_device with the state_mutex held. If the scsi error handler >> thread is just starting to test the device in scsi_send_eh_cmnd then it's >> going to try to grab the state_mutex. >> >> We are then stuck, because when scsi_rescan_device tries to send its IO >> scsi_queue_rq calls -> scsi_host_queue_ready -> scsi_host_in_recovery >> will return true (the host state is still in recovery) and IO will just be >> requeued. scsi_send_eh_cmnd will then never be able to grab the >> state_mutex to finish error handling. >> >> This just moves the scsi_rescan_device call to after we drop the >> state_mutex. > > > I want to maybe nak my own patch. There is still a problem where if one > of the rescan IOs hits an issue then userspace is stuck waiting for > however long it takes to perform recovery. For iscsid, this will cause > problems because it sets the device state from its main thread. So > while scsi_rescan_device is hung then iscsid can't do anything for > any session. > > I think we either want to: > > 1. Do the patch below, but Lee will need to change iscsid so it sets > the dev state from a worker thread. > > 2. Have the kernel kick off the rescan from a workqueue. This seems > easiest but I'm not sure if it will cause issues for lijinlin's use > case. I vote for #2, if possible. > > >> >> Fixes: f0f82e2476f6 ("scsi: core: Fix capacity set to zero after >> offlinining device") >> Signed-off-by: Mike Christie <michael.christie@xxxxxxxxxx> >> --- >> drivers/scsi/scsi_sysfs.c | 6 +++++- >> 1 file changed, 5 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c >> index 86793259e541..5b63407c3a3f 100644 >> --- a/drivers/scsi/scsi_sysfs.c >> +++ b/drivers/scsi/scsi_sysfs.c >> @@ -788,6 +788,7 @@ store_state_field(struct device *dev, struct device_attribute *attr, >> int i, ret; >> struct scsi_device *sdev = to_scsi_device(dev); >> enum scsi_device_state state = 0; >> + bool rescan_dev = false; >> >> for (i = 0; i < ARRAY_SIZE(sdev_states); i++) { >> const int len = strlen(sdev_states[i].name); >> @@ -817,10 +818,13 @@ store_state_field(struct device *dev, struct device_attribute *attr, >> */ >> if (ret == 0 && state == SDEV_RUNNING) { >> blk_mq_run_hw_queues(sdev->request_queue, true); >> - scsi_rescan_device(dev); >> + rescan_dev = true; >> } >> mutex_unlock(&sdev->state_mutex); >> >> + if (rescan_dev) >> + scsi_rescan_device(dev); >> + >> return ret == 0 ? count : -EINVAL; >> } >> >> >