Re: [PATCH 12/12] scsi_transport_sas: fix delete vs scan race

On Sat, May 5, 2012 at 2:52 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Sun, Apr 22, 2012 at 10:15 AM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> Async scan here means any scan in a different thread, right ... it just
>> has to be asynchronous relative to us?  So that includes the manually
>> initiated ones and hotplug ones, doesn't it?
>
> [ resend since I notice this never hit the lists ]
>
> Hmm, well no I don't think so.  This literally means the initial async
> scan, and the failure window is between when we skip the call to
> scsi_sysfs_add_sdev() (in scsi_add_lun() under the scan_mutex) and
> finally call scsi_sysfs_add_sdev() again via scsi_finish_async_scan().
> I don't see how that fixes it because when we fail the sequence goes:
>
> mutex_lock(scan_mutex)
> starget->parent = end_device;
> scsi_add_lun()
> mutex_unlock(scan_mutex)
>
> device_del(end_device)
>
> mutex_lock(scan_mutex)
> device_add(starget)
> <crash>
>
> As far as I can see taking the scan_mutex in sas_rphy_remove() does
> not change this failure window.  Unless I missed something?
>
> I am going to re-submit this patch as is with the proposed libsas batch for 3.5.

It turns out this patch can cause a deadlock in the scenario where we
have two hosts scanning and the "previous" host (according to the
async scan queue) experiences a device removal event.  I think the
following should be all we need:

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 01b0374..8906557 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1714,6 +1714,9 @@ static void scsi_sysfs_add_devices(struct Scsi_Host *shost)
 {
        struct scsi_device *sdev;
        shost_for_each_device(sdev, shost) {
+               /* target removed before the device could be added */
+               if (sdev->sdev_state == SDEV_DEL)
+                       continue;
                if (!scsi_host_scan_allowed(shost) ||
                    scsi_sysfs_add_sdev(sdev) != 0)
                        __scsi_remove_device(sdev);

...since starget removal will mark the sdevs as deleted under
scan_mutex, scsi_sysfs_add_devices() can simply ignore deleted devices.
I'll post this patch after Darek has a chance to try it out.
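
To illustrate the idea outside the kernel, here is a minimal user-space
sketch of the skip-deleted-devices check.  It is not kernel code: every
model_* name below is hypothetical and only stands in for the sdev state
handling discussed above, and the locking is described in comments
rather than implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sdev states that matter in this race. */
enum model_sdev_state {
	MODEL_SDEV_CREATED,	/* allocated by the scan, not yet in sysfs */
	MODEL_SDEV_RUNNING,	/* registered and usable */
	MODEL_SDEV_DEL		/* its target was removed */
};

struct model_sdev {
	enum model_sdev_state state;
	int in_sysfs;		/* 1 once the device has been "added" */
};

struct model_host {
	struct model_sdev *sdevs;
	size_t nr_sdevs;
};

/*
 * Models target removal: mark a range of devices as deleted.  In the
 * real code this is assumed to run with scan_mutex held.
 */
void model_remove_target(struct model_host *host, size_t first, size_t count)
{
	size_t i;

	assert(host != NULL);
	for (i = first; i < first + count && i < host->nr_sdevs; i++)
		host->sdevs[i].state = MODEL_SDEV_DEL;
}

/*
 * Models the end-of-async-scan pass with the proposed check: devices
 * whose target went away before they could be added are skipped.
 * Returns the number of devices actually added.
 */
size_t model_sysfs_add_devices(struct model_host *host)
{
	size_t i, added = 0;

	assert(host != NULL);
	for (i = 0; i < host->nr_sdevs; i++) {
		struct model_sdev *sdev = &host->sdevs[i];

		/* target removed before the device could be added */
		if (sdev->state == MODEL_SDEV_DEL)
			continue;

		sdev->in_sysfs = 1;
		sdev->state = MODEL_SDEV_RUNNING;
		added++;
	}
	return added;
}
```

With three devices scanned and the middle one's target removed before
the final pass, only the two surviving devices get added and the deleted
one is never touched.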

--
Dan