> On Apr 25, 2017, at 10:23 AM, Bart Van Assche <Bart.VanAssche@xxxxxxxxxxx> wrote: > > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote: >> When a device is deleted through sysfs handle "delete", the code >> locks shost->scan_mutex. If multiple devices are deleted at the >> same time, these deletes will be handled in series. >> >> On the other hand, some devices do long latency IO during deletion, >> for example, sd_shutdown() may do sync cache and/or start_stop. >> It is not necessary for these commands to run in series. >> >> To reduce latency of parallel "delete" requests, this patch reduces >> the protection of scan_mutex. The only function with Scsi_Host >> called in __scsi_remove_device() is the optional slave_destroy(). >> Therefore, the protection of scan_mutex is only necessary for this >> function. > > Hello Song, > > What you wrote above is wrong. It is necessary to serialize SCSI > scanning against SCSI device removal. That is why scan_mutex is held > around the __scsi_remove_device() call. When adding a LUN the following > (and several other) actions are performed: > * Allocation of memory for struct scsi_device. > * Allocation of a block layer request queue. > * Initialization of the .sdev_gendev and .sdev_dev device structures. > * Association of the transport layer driver with the SCSI device. > * Association of a device handler with the SCSI device (e.g. ALUA). > * Making the .sdev_gendev and .sdev_dev devices visible in sysfs. > * Making the transport layer attributes visible in sysfs. > * Creating a bsg device node in sysfs. > * Association of an upper layer driver > (e.g. sd or sr) with the SCSI LUN. > > Removal of a LUN means undoing all of the above. If adding and removing > a LUN would not be not serialized then there would be a risk that removal > and immediate reregistration of a LUN will fail due to the reregistration > process trying to add sysfs attributes that were not yet removed by the > removal step. > > Bart. Hello Bart, Thanks a lot for looking into this. I have been studying the code recently. I am wondering whether the following would work: 1. Introduce a new mutex for scsi_device to protect most operations in the list you gathered above; 2. For operations like host->slave_destroy(), ensure they access scsi_host data with host_lock (or another spin lock). I looked into all instances of slave_destroy, only 2 of them: dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data without protection of spin lock. 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex for the scsi_device. scan_mutex is no longer required. Is this a valid path? Thanks again, Song