On Fri, 12 Mar 2010, James Bottomley wrote: > > You're right that a duplicate target wouldn't get created. Still, > > there really is a bug. Suppose two different threads call > > scsi_alloc_target() for the same target ID at about the same time. The > > first thread won't find an existing target on the list, so it will > > allocate and initialize a new target structure. However, it releases > > the host lock before calling the host template's target_alloc method. > > > > Meanwhile, the second thread will find the new target structure on the > > list and will therefore feel free to start using it. If the timing > > works out, it could start using the new target before the first thread > > calls target_alloc. > > So currently, that can't happen because of the scan mutex. True. Okay, that's not a bug. However I still have the feeling that the concurrency issues associated with targets haven't been thought out sufficiently. Here's another example. Suppose that two threads probe two different LUNs under the same target at the same time, and suppose that scsi_target_add() fails: Thread 0 Thread 1 -------- -------- scsi_alloc_target() creates a new target Lock scan_mutex Scan LUN1 scsi_alloc_target() finds the existing target and increments its reap_ref Blocks waiting for scan_mutex scsi_target_add() fails Destroy the sdev for LUN1 (the target isn't destroyed because thread 1 has incremented reap_ref) Unlock scan_mutex Scan LUN2 Call scsi_target_add() again The code ends up calling device_add() twice for the same target structure. Even though the first call fails, the driver core does not guarantee this will work properly. AFAIK, nobody has ever audited that code to see if there are any fields assumed to be 0 which may end up being nonzero after a failed registration. Also, notice that in its failure path, scsi_alloc_target() calls scsi_target_reap(). This doesn't seem right; it's unbalanced. Here's what can happen: __scsi_add_device() calls scsi_alloc_target(), creating a new target with reap_ref = 1. It then calls scsi_probe_and_add_lun(), which calls scsi_alloc_sdev(), increasing reap_ref to 2. Next we call scsi_add_lun(), but scsi_target_add() fails decreasing reap_ref back to 1. scsi_add_lun() returns SCSI_SCAN_NO_RESPONSE because of the failure, so scsi_probe_and_add_lun() goes on to call __scsi_remove_device(). This causes scsi_device_dev_release_usercontext() to call scsi_target_reap(), decreasing reap_ref to 0 and deallocating the target structure. Finally, __scsi_add_device() calls scsi_target_reap(), passing a pointer to a structure that no longer exists. The last two steps may occur in the opposite order because of the use of a workqueue, but the end result is the same: scsi_target_reap() gets called with a stale pointer. It certainly looks as though scsi_target_add() should not include the get_device(&starget->dev); scsi_target_reap(starget); put_device(&starget->dev); sequence. (Not to mention that the get_device and put_device calls are totally gratuitous anyway.) Alan Stern -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html