Re: [PATCH v4 2/4] scsi: core: Make sure that hosts outlive targets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/12/22 5:19 PM, Bart Van Assche wrote:
> From: Ming Lei <ming.lei@xxxxxxxxxx>
> 
> Fix the race conditions between SCSI LLD kernel module unloading and SCSI
> device and target removal by making sure that SCSI hosts are destroyed after
> all associated target and device objects have been freed.
> 
> Cc: Christoph Hellwig <hch@xxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Mike Christie <michael.christie@xxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxx>
> Cc: John Garry <john.garry@xxxxxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> [ bvanassche: Reworked Ming's patch and split it ]
> ---
>  drivers/scsi/hosts.c     | 8 ++++++++
>  drivers/scsi/scsi_scan.c | 7 +++++++
>  include/scsi/scsi_host.h | 3 +++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
> index ef6c0e37acce..8fa98c8d0ee0 100644
> --- a/drivers/scsi/hosts.c
> +++ b/drivers/scsi/hosts.c
> @@ -190,6 +190,13 @@ void scsi_remove_host(struct Scsi_Host *shost)
>  	transport_unregister_device(&shost->shost_gendev);
>  	device_unregister(&shost->shost_dev);
>  	device_del(&shost->shost_gendev);
> +
> +	/*
> +	 * After scsi_remove_host() has returned the scsi LLD module can be
> +	 * unloaded and/or the host resources can be released. Hence wait until
> +	 * the dependent SCSI targets and devices are gone before returning.
> +	 */
> +	wait_event(shost->targets_wq, atomic_read(&shost->target_count) == 0);
>  }

If we only wait here we can still hit the race I described right?

Is the issue where we might be misunderstanding each other that the target
removal is slightly different from the host removal? For host removal we call
scsi_forget_host with the scan_mutex already held. So when scsi_forget_host
loops over the devices we know that there is no thread doing:

sdev_store_delete -> scsi_remove_device -> __scsi_remove_device -> blk_cleanup_queue

Since the sdev_store_delete call to scsi_remove_device call also grabs the scan_mutex,
we can't call scsi_forget_host until sdev_store_delete -> scsi_remove_device has returned.

For target removal,__scsi_remove_target doesn't take the scan_mutex when checking the
device state. So, we have this race:

1. syfs deletion runs  sdev_store_delete -> scsi_remove_device and
that takes the scan_mutex.

It then sets the state to SDEV_DEL.

2. fc/iscsi thread does __scsi_remove_target and it sees the device is in
the SDEV_DEL state. It skips the device and then we return from
__scsi_remove_target without having waited on that device's removal like is done
in other cases.

If the only issue we are concerned with is blk_cleanup_queue completing when we
remove the host or target, then for the target race above I think we can just use
the scan_mutex in __scsi_remove_target (that function then would use __scsi_remove_device).

If the issue is that there would be other threads holding a ref to the scsi_device
and they can call into the driver. and we want to make sure those refs are dropped
when scsi_remove_target returns then we need to do what I described in the other thread.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux