On 2024/6/13 14:27, Hannes Reinecke wrote:
On 6/12/24 17:06, Wenchao Hao wrote:
On 6/12/24 4:33 PM, Hannes Reinecke wrote:
On 6/5/24 11:17, Wenchao Hao wrote:
shost_for_each_device() would skip devices which is in SDEV_CANCEL or
SDEV_DEL state, for some scenarios, we donot want to skip these
devices,
so add a new macro shost_for_each_device_include_deleted() to handle
it.
Following changes are introduced:
1. Rework scsi_device_get(), add new helper __scsi_device_get() which
determine if skip deleted scsi_device by parameter "skip_deleted".
2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which
is used when calling __scsi_device_get()
3. Update shost_for_each_device() to call __scsi_iterate_devices() with
"skip_deleted" true
4. Add new macro shost_for_each_device_include_deleted() which call
__scsi_iterate_devices() with "skip_deleted" false
Signed-off-by: Wenchao Hao <haowenchao22@xxxxxxxxx>
---
drivers/scsi/scsi.c | 46
++++++++++++++++++++++++++------------
include/scsi/scsi_device.h | 25 ++++++++++++++++++---
2 files changed, 54 insertions(+), 17 deletions(-)
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 3e0c0381277a..5913de543d93 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev,
bool enable)
return 0;
}
-/**
- * scsi_device_get - get an additional reference to a scsi_device
+/*
+ * __scsi_device_get - get an additional reference to a scsi_device
* @sdev: device to get a reference to
- *
- * Description: Gets a reference to the scsi_device and increments
the use count
- * of the underlying LLDD module. You must hold host_lock of the
- * parent Scsi_Host or already have a reference when calling this.
- *
- * This will fail if a device is deleted or cancelled, or when the
LLD module
- * is in the process of being unloaded.
+ * @skip_deleted: when true, would return failed if device is deleted
*/
-int scsi_device_get(struct scsi_device *sdev)
+static int __scsi_device_get(struct scsi_device *sdev, bool
skip_deleted)
{
- if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state ==
SDEV_CANCEL)
+ /*
+ * if skip_deleted is true and device is in removing, return
failed
+ */
+ if (skip_deleted &&
+ (sdev->sdev_state == SDEV_DEL || sdev->sdev_state ==
SDEV_CANCEL))
goto fail;
Nack.
SDEV_DEL means the device is about to be deleted, so we _must not_
access it at all.
Sorry I added SDEV_DEL here at hand without understanding what it means.
Actually, just include scsi_device which is in SDEV_CANCEL would fix the
issues I described.
The issues are because device removing concurrent with error handle.
Normally, error handle would not be triggered when scsi_device is in
SDEV_DEL. Below is my analysis, if it is wrong, please correct me.
If there are scsi_cmnd remain unfinished when removing scsi_device,
the removing process would waiting for all commands to be finished.
If commands error happened and trigger error handle, the removing
process would be blocked until error handle finished, because
__scsi_remove_device called del_gendisk() which would wait all
requests to be finished. So now scsi_device is in SDEV_CANCEL.
If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been
dispatched to this scsi_device, then error handle would never triggered.
I want to change the new function __scsi_device_get() as following,
please help to review.
/*
* __scsi_device_get - get an additional reference to a scsi_device
* @sdev: device to get a reference to
* @skip_canceled: when true, would return failed if device is deleted
*/
static int __scsi_device_get(struct scsi_device *sdev, bool
skip_canceled)
{
/*
* if skip_canceled is true and device is in removing, return failed
*/
if (sdev->sdev_state == SDEV_DEL ||
(sdev->sdev_state == SDEV_CANCEL && skip_canceled))
goto fail;
if (!try_module_get(sdev->host->hostt->module))
goto fail;
if (!get_device(&sdev->sdev_gendev))
goto fail_put_module;
return 0;
fail_put_module:
module_put(sdev->host->hostt->module);
fail:
return -ENXIO;
}
I don't think that's required.
With your above analysis, wouldn't the problem be solved with:
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 775df00021e4..911fcfa80d69 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1470,6 +1470,8 @@ void __scsi_remove_device(struct scsi_device *sdev)
if (sdev->sdev_state == SDEV_DEL)
return;
+ scsi_block_when_processing_errors(sdev);
+
if (sdev->is_visible) {
/*
* If scsi_internal_target_block() is running
concurrently,
Hmm?
We can not make sure no scsi_cmnd remain unfinished after
scsi_block_when_processing_errors(). For example, there is a
command has been dispatched but it's not timeouted when removing
device, no error happened. After scsi_device is set to SDEV_CANCEL,
the removing process would be blocked by del_gendisk() because there
is still a request.
Then the request timeout and abort failed, error handle would be
triggered, now scsi_device is SDEV_CANCEL.
The error handle would skip this scsi_device when doing device reset.
Cheers,
Hannes