On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote: > In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), > we > removed scsi_device_get() and directly called get_device() to > increase > the refcount of the device. But actullay scsi_device_get() will fail > in > three cases: > 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state > 2. get_device() fail > 3. the module is not alive > > The intended purpose was to remove the check of the module alive. > Unfortunately the check of the device state was droped too. And this > introduced a race condition like this: > > CPU0 CPU1 > __scsi_remove_target() > ->iterate shost->__devices > ->scsi_remove_device() > ->put_device() > someone still hold a refcount > sd_release() > - > >scsi_disk_put() > ->put_device() > last put and trigger the device release > > ->goto restart > ->iterate shost->__devices and got the same device > ->get_device() while refcount is 0 This analysis fails here: get_device() on something with refcount 0 returns NULL. That triggers the if clause to ignore this device. We may have a more complex way of triggering a dual put race as the trace implies, but I don't think this is it. [...] > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index 50e7d7e..d398894 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device > *sdev) > } > EXPORT_SYMBOL(scsi_remove_device); > > +static int scsi_device_get_not_deleted(struct scsi_device *sdev) > +{ > + if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == > SDEV_CANCEL) > + return -ENXIO; > + if (!get_device(&sdev->sdev_gendev)) > + return -ENXIO; > + return 0; > +} This is pretty much scsi_device_get() without the try_module get, so they should probably be combined. James > static void __scsi_remove_target(struct scsi_target *starget) > { > struct Scsi_Host *shost = dev_to_shost(starget->dev.parent); > @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct > scsi_target *starget) > */ > if (sdev->channel != starget->channel || > sdev->id != starget->id || > - !get_device(&sdev->sdev_gendev)) > + scsi_device_get_not_deleted(sdev)) > continue; > spin_unlock_irqrestore(shost->host_lock, flags); > scsi_remove_device(sdev);