On 08/04/2012 04:01 AM, Bart Van Assche wrote:
> On 08/02/12 08:41, Chanho Min wrote:
>> This patch is to fix a oops from a torn down device. When
>> scsi_run_queue process starved queues, scsi_request_fn can race with
>> scsi_remove_device. In this case, rarely, scsi_request_fn release the
>> last reference and set sdev->request_queue to NULL. It result in
>> NULL-pointer dereference when spin_unlock is tried with (NULL)->
>> queue_lock. We need to add an extra reference to the device on both
>> sides of the __blk_run_queue to hold reference until scsi_request_fn
>> is finished.
>
> Good catch. So far I haven't been able to trigger this issue in my
> tests. So it would be appreciated if you could verify whether the patch
> below helps (patch is based on 3.6-rc1):
>
> ---
> drivers/scsi/scsi_sysfs.c | 8 +++++++-
> 1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 093d4f6..59e523c 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -348,7 +348,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> starget->reap_ref++;
> list_del(&sdev->siblings);
> list_del(&sdev->same_target_siblings);
> - list_del(&sdev->starved_entry);
> spin_unlock_irqrestore(sdev->host->host_lock, flags);
>
> cancel_work_sync(&sdev->event_work);
> @@ -956,6 +955,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> void __scsi_remove_device(struct scsi_device *sdev)
> {
> struct device *dev = &sdev->sdev_gendev;
> + struct Scsi_Host *shost = sdev->host;
> + unsigned long flags;
>
> if (sdev->is_visible) {
> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> @@ -977,6 +978,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
> blk_cleanup_queue(sdev->request_queue);
> cancel_work_sync(&sdev->requeue_work);
>
> + spin_lock_irqsave(shost->host_lock, flags);
> + if (!list_empty(&sdev->starved_entry))
> + list_del(&sdev->starved_entry);
> + spin_unlock_irqrestore(shost->host_lock, flags);
> +

I do not think it's that simple. Suppose scsi_run_queue is running right now, has already deleted the starved entry, and is about to access the sdev or its queue. Then the code above does not help: __scsi_remove_device could just continue on, end up calling scsi_device_dev_release_usercontext, and free the device from under scsi_run_queue.

I think scsi-ml has to do a get_device when an sdev is added to the starved list and a put_device when it is removed from it (this must be done under the host lock for the starved-entry case too). I am not sure whether that just papers over the problem, and there may be more issues like this.
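To make the proposed get_device/put_device rule concrete, here is a minimal userspace sketch. It is not kernel code and not the actual fix. It assumes that the starved list owns one reference while a device is on it, and that whoever unlinks the device under the host lock takes over that reference. Every name here (toy_sdev, toy_host, toy_get, toy_put, toy_mark_starved, toy_pop_starved, toy_remove_device) is a hypothetical stand-in: a pthread mutex plays the role of shost->host_lock, and a C11 atomic counter plays the role of the device refcount.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct scsi_device / struct Scsi_Host. */
struct toy_sdev {
	atomic_int refcount;           /* stands in for the device refcount */
	bool on_starved_list;          /* protected by host_lock */
	struct toy_sdev *next_starved; /* protected by host_lock */
};

struct toy_host {
	pthread_mutex_t host_lock;     /* stands in for shost->host_lock */
	struct toy_sdev *starved_head; /* protected by host_lock */
};

static atomic_int released_count; /* observable stand-in for the release callback */

static void toy_host_init(struct toy_host *shost)
{
	pthread_mutex_init(&shost->host_lock, NULL);
	shost->starved_head = NULL;
}

static struct toy_sdev *toy_sdev_alloc(void)
{
	struct toy_sdev *sdev = calloc(1, sizeof(*sdev));

	if (sdev)
		atomic_init(&sdev->refcount, 1); /* initial reference */
	return sdev;
}

static void toy_get(struct toy_sdev *sdev)
{
	atomic_fetch_add(&sdev->refcount, 1);
}

static void toy_put(struct toy_sdev *sdev)
{
	int old = atomic_fetch_sub(&sdev->refcount, 1);

	assert(old > 0); /* catches an unbalanced put */
	if (old == 1) {
		atomic_fetch_add(&released_count, 1);
		free(sdev);
	}
}

/* Adding to the starved list takes a reference, under the host lock. */
static void toy_mark_starved(struct toy_host *shost, struct toy_sdev *sdev)
{
	pthread_mutex_lock(&shost->host_lock);
	if (!sdev->on_starved_list) {
		toy_get(sdev);
		sdev->on_starved_list = true;
		sdev->next_starved = shost->starved_head;
		shost->starved_head = sdev;
	}
	pthread_mutex_unlock(&shost->host_lock);
}

/* run_queue side: unlink one starved device. The list's reference moves
 * to the caller, who must toy_put() it after it is done with the device. */
static struct toy_sdev *toy_pop_starved(struct toy_host *shost)
{
	struct toy_sdev *sdev;

	pthread_mutex_lock(&shost->host_lock);
	sdev = shost->starved_head;
	if (sdev) {
		shost->starved_head = sdev->next_starved;
		sdev->next_starved = NULL;
		sdev->on_starved_list = false;
	}
	pthread_mutex_unlock(&shost->host_lock);
	return sdev;
}

/* Removal side: drop the list's reference only if the device was still
 * on the list, then drop the initial reference. */
static void toy_remove_device(struct toy_host *shost, struct toy_sdev *sdev)
{
	bool was_starved = false;

	pthread_mutex_lock(&shost->host_lock);
	if (sdev->on_starved_list) {
		struct toy_sdev **pp = &shost->starved_head;

		while (*pp != sdev)
			pp = &(*pp)->next_starved;
		*pp = sdev->next_starved;
		sdev->next_starved = NULL;
		sdev->on_starved_list = false;
		was_starved = true;
	}
	pthread_mutex_unlock(&shost->host_lock);

	if (was_starved)
		toy_put(sdev); /* reference owned by the starved list */
	toy_put(sdev);         /* initial reference */
}
```

Replaying the interleaving described above works as follows. The run_queue side pops the device first, and removal runs afterwards. The device then survives until the run_queue side drops the reference it inherited from the list. In this sketch the puts happen after the host lock is released, so the release path never runs with the lock held. Whether the real code should drop the reference inside or outside host_lock is a separate design question.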