On 6/12/24 4:33 PM, Hannes Reinecke wrote: > On 6/5/24 11:17, Wenchao Hao wrote: >> shost_for_each_device() would skip devices which is in SDEV_CANCEL or >> SDEV_DEL state, for some scenarios, we donot want to skip these devices, >> so add a new macro shost_for_each_device_include_deleted() to handle it. >> >> Following changes are introduced: >> >> 1. Rework scsi_device_get(), add new helper __scsi_device_get() which >> determine if skip deleted scsi_device by parameter "skip_deleted". >> 2. Add new parameter "skip_deleted" to __scsi_iterate_devices() which >> is used when calling __scsi_device_get() >> 3. Update shost_for_each_device() to call __scsi_iterate_devices() with >> "skip_deleted" true >> 4. Add new macro shost_for_each_device_include_deleted() which call >> __scsi_iterate_devices() with "skip_deleted" false >> >> Signed-off-by: Wenchao Hao <haowenchao22@xxxxxxxxx> >> --- >> drivers/scsi/scsi.c | 46 ++++++++++++++++++++++++++------------ >> include/scsi/scsi_device.h | 25 ++++++++++++++++++--- >> 2 files changed, 54 insertions(+), 17 deletions(-) >> >> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c >> index 3e0c0381277a..5913de543d93 100644 >> --- a/drivers/scsi/scsi.c >> +++ b/drivers/scsi/scsi.c >> @@ -735,20 +735,18 @@ int scsi_cdl_enable(struct scsi_device *sdev, bool enable) >> return 0; >> } >> -/** >> - * scsi_device_get - get an additional reference to a scsi_device >> +/* >> + * __scsi_device_get - get an additional reference to a scsi_device >> * @sdev: device to get a reference to >> - * >> - * Description: Gets a reference to the scsi_device and increments the use count >> - * of the underlying LLDD module. You must hold host_lock of the >> - * parent Scsi_Host or already have a reference when calling this. >> - * >> - * This will fail if a device is deleted or cancelled, or when the LLD module >> - * is in the process of being unloaded. >> + * @skip_deleted: when true, would return failed if device is deleted >> */ >> -int scsi_device_get(struct scsi_device *sdev) >> +static int __scsi_device_get(struct scsi_device *sdev, bool skip_deleted) >> { >> - if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL) >> + /* >> + * if skip_deleted is true and device is in removing, return failed >> + */ >> + if (skip_deleted && >> + (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)) >> goto fail; > > Nack. > SDEV_DEL means the device is about to be deleted, so we _must not_ access it at all. > Sorry I added SDEV_DEL here at hand without understanding what it means. Actually, just include scsi_device which is in SDEV_CANCEL would fix the issues I described. The issues are because device removing concurrent with error handle. Normally, error handle would not be triggered when scsi_device is in SDEV_DEL. Below is my analysis, if it is wrong, please correct me. If there are scsi_cmnd remain unfinished when removing scsi_device, the removing process would waiting for all commands to be finished. If commands error happened and trigger error handle, the removing process would be blocked until error handle finished, because __scsi_remove_device called del_gendisk() which would wait all requests to be finished. So now scsi_device is in SDEV_CANCEL. If the scsi_device is already in SDEV_DEL, then no scsi_cmnd has been dispatched to this scsi_device, then error handle would never triggered. I want to change the new function __scsi_device_get() as following, please help to review. /* * __scsi_device_get - get an additional reference to a scsi_device * @sdev: device to get a reference to * @skip_canceled: when true, would return failed if device is deleted */ static int __scsi_device_get(struct scsi_device *sdev, bool skip_canceled) { /* * if skip_canceled is true and device is in removing, return failed */ if (sdev->sdev_state == SDEV_DEL || (sdev->sdev_state == SDEV_CANCEL && skip_canceled)) goto fail; if (!try_module_get(sdev->host->hostt->module)) goto fail; if (!get_device(&sdev->sdev_gendev)) goto fail_put_module; return 0; fail_put_module: module_put(sdev->host->hostt->module); fail: return -ENXIO; } > Cheers, > > Hannes