2015-10-15 16:22 GMT+08:00 Zhengping Zhou <johnzzpcrystal@xxxxxxxxx>: > Yes > > But it's interesting ,when you continue to add this device (with the > same LUN, target, bus, and host number!) to the list,scsi_add_device > will call scsi_probe_and_add_lun,and it will call > scsi_device_lookup_by_target(starget, lun) to recheck its existence > finally.but scsi_device_lookup_by_target doesn't has the problem what > I mentioned in this patch .So it's OK when you add this device again. > > Forgive my poor English! > 2015-10-15 15:53 GMT+08:00 Hannes Reinecke <hare@xxxxxxx>: >> On 10/15/2015 09:38 AM, Zhengping Zhou wrote: >>> when a scsi_device is unpluged from scsi controller, if the >>> scsi_device is still be used by application layer,it won't be >>> released until users release it. In this case, scsi_device_remove just set >>> the scsi_device's state to be SDEV_DEL. But if you plug the disk >>> just before the old scsi_device is released, then there will be two >>> scsi_device structures in scsi_host->__devices. when the next unpluging >>> event happens,some low-level drivers will check whether the scsi_device >>> has been added to host (for example, the megaraid sas series controller) >>> by calling scsi_device_lookup(call __scsi_device_lookup). >>> __scsi_device_lookup will return the first scsi_device. Because its >>> state is SDEV_DEL, the scsi_device_lookup will return NULL finally, >>> making the low-level driver assume that the scsi_device has been >>> removed,and won't call scsi_device_remove,which will lead the >>> failure of hot swap. >>> Signed-off-by: Zhengping Zhou <johnzzpcrystal@xxxxxxxxx> >>> --- >>> Hi all: >>> I'm sorry to bother again,that's my second time to send >>> this patch. >>> I find a bug about the failure of hot swap when I am using >>> megaraid sas series controller. Finally I have found that >>> when controller receives the event of hot swap, it will firstly >>> check whether the device is added to the system/host by calling >>> scsi_device_lookup.The logics in function megasas_aen_polling >>> is as follows: >>> case MR_EVT_PD_REMOVED: >>> if (megasas_get_pd_list(instance) == 0) { >>> for (i = 0; i < MEGASAS_MAX_PD_CHANNELS; i++) { >>> for (j = 0; >>> j < MEGASAS_MAX_DEV_PER_CHANNEL; >>> j++) { >>> >>> pd_index = >>> (i * MEGASAS_MAX_DEV_PER_CHANNEL) + j; >>> >>> sdev1 = scsi_device_lookup(host, i, j, 0); >>> >>> if (instance->pd_list[pd_index].driveState >>> == MR_PD_STATE_SYSTEM) { >>> if (sdev1) >>> scsi_device_put(sdev1); >>> } else { >>> if (sdev1) { >>> scsi_remove_device(sdev1); >>> scsi_device_put(sdev1); >>> } >>> } >>> } >>> } >>> } >>> If the previous scsi_device is not released, this will lead the >>> appearance of two scsi_devices which correspond with the same disk. >>> And when the disk is unpluged afterwards, the controller will assume >>> that this disk has never been added into the system/host. Thus it won't >>> call scsi_device_remove. When I finish this modification, this problem >>> is fixed.So far, I have successfully test PCI_DEVICE_ID_LSI_SAS0073SKINNY >>> and PCI_DEVICE_ID_LSI_FURY. >>> Thanks >>> Zhengping >>> --- >>> drivers/scsi/scsi.c | 2 ++ >>> 1 file changed, 2 insertions(+) >>> >>> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c >>> index 207d6a7..5251d6d 100644 >>> --- a/drivers/scsi/scsi.c >>> +++ b/drivers/scsi/scsi.c >>> @@ -1118,6 +1118,8 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost, >>> struct scsi_device *sdev; >>> >>> list_for_each_entry(sdev, &shost->__devices, siblings) { >>> + if (sdev->sdev_state == SDEV_DEL) >>> + continue; >>> if (sdev->channel == channel && sdev->id == id && >>> sdev->lun ==lun) >>> return sdev; >>> >> Ho-hum. >> >> So lookup will return NULL, which then will cause the subsequent >> functions to assume the scsi_device is not present, right? >> >> And if you're _really_ unlucky it'll continue to add this device >> (with the same LUN, target, bus, and host number!) to the list, >> resulting in us having _two_ devices with the same number on the list. >> >> Happy lookup. >> >> I guess this calls for the lock rework from Johannes ... >> >> Cheers, >> >> Hannes >> -- >> Dr. Hannes Reinecke zSeries & Storage >> hare@xxxxxxx +49 911 74053 688 >> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg >> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton >> HRB 21284 (AG Nürnberg) You can compare the function __scsi_device_lookup_by_target and the function __scsi_device_lookup. It will be more easy to figure out this patch. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html