On Mon, 2013-06-24 at 12:24 -0500, Mike Christie wrote:
> On 06/24/2013 10:38 AM, James Bottomley wrote:
> > On Wed, 2013-06-12 at 14:49 +0200, Bart Van Assche wrote:
> >> scsi_run_queue() examines all SCSI devices that are present on
> >> the starved list. Since scsi_run_queue() unlocks the SCSI host
> >> lock before running a queue a SCSI device can get removed after
> >> it has been removed from the starved list and before its queue
> >> is run. Protect against that race condition by increasing the
> >> sdev refcount before running the sdev queue. Also, remove a
> >> SCSI device from the starved list before __scsi_remove_device()
> >> decreases the sdev refcount such that the get_device() call
> >> added in scsi_run_queue() is guaranteed to succeed.
> >>
> >> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> >> Reported-and-tested-by: Chanho Min <chanho.min@xxxxxxx>
> >> Reference: http://lkml.org/lkml/2012/8/2/96
> >> Acked-by: Tejun Heo <tj@xxxxxxxxxx>
> >> Reviewed-by: Mike Christie <michaelc@xxxxxxxxxxx>
> >> Cc: Hannes Reinecke <hare@xxxxxxx>
> >> Cc: <stable@xxxxxxxxxxxxxxx>
> >> ---
> >>  drivers/scsi/scsi_lib.c   | 16 +++++++++++-----
> >>  drivers/scsi/scsi_sysfs.c | 14 +++++++++++++-
> >>  2 files changed, 24 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 86d5220..d6ca072 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -456,11 +456,17 @@ static void scsi_run_queue(struct request_queue *q)
> >>  			continue;
> >>  		}
> >>
> >> -		spin_unlock(shost->host_lock);
> >> -		spin_lock(sdev->request_queue->queue_lock);
> >> -		__blk_run_queue(sdev->request_queue);
> >> -		spin_unlock(sdev->request_queue->queue_lock);
> >> -		spin_lock(shost->host_lock);
> >> +		/*
> >> +		 * Obtain a reference before unlocking the host_lock such that
> >> +		 * the sdev can't disappear before blk_run_queue() is invoked.
> >> +		 */
> >> +		get_device(&sdev->sdev_gendev);
> >> +		spin_unlock_irqrestore(shost->host_lock, flags);
> >> +
> >> +		blk_run_queue(sdev->request_queue);
> >> +		put_device(&sdev->sdev_gendev);
> >> +
> >> +		spin_lock_irqsave(shost->host_lock, flags);
> >>  	}
> >>  	/* put any unprocessed entries back */
> >>  	list_splice(&starved_list, &shost->starved_list);
> >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> >> index 931a7d9..34f1b39 100644
> >> --- a/drivers/scsi/scsi_sysfs.c
> >> +++ b/drivers/scsi/scsi_sysfs.c
> >> @@ -345,7 +345,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
> >>  	starget->reap_ref++;
> >>  	list_del(&sdev->siblings);
> >>  	list_del(&sdev->same_target_siblings);
> >> -	list_del(&sdev->starved_entry);
> >>  	spin_unlock_irqrestore(sdev->host->host_lock, flags);
> >>
> >>  	cancel_work_sync(&sdev->event_work);
> >> @@ -953,6 +952,8 @@ int scsi_sysfs_add_sdev(struct scsi_device *sdev)
> >>  void __scsi_remove_device(struct scsi_device *sdev)
> >>  {
> >>  	struct device *dev = &sdev->sdev_gendev;
> >> +	struct Scsi_Host *shost = sdev->host;
> >> +	unsigned long flags;
> >>
> >>  	if (sdev->is_visible) {
> >>  		if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> >> @@ -974,6 +975,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
> >>  	blk_cleanup_queue(sdev->request_queue);
> >>  	cancel_work_sync(&sdev->requeue_work);
> >>
> >> +	/*
> >> +	 * Remove a SCSI device from the starved list after blk_cleanup_queue()
> >> +	 * finished such that scsi_request_fn() can't add it back to that list.
> >> +	 * Also remove an sdev from the starved list before invoking
> >> +	 * put_device() such that get_device() is guaranteed to succeed for an
> >> +	 * sdev present on the starved list.
> >> +	 */
> >> +	spin_lock_irqsave(shost->host_lock, flags);
> >> +	list_del(&sdev->starved_entry);
> >> +	spin_unlock_irqrestore(shost->host_lock, flags);
> >> +
> >>  	if (sdev->host->hostt->slave_destroy)
> >>  		sdev->host->hostt->slave_destroy(sdev);
> >>  	transport_destroy_device(dev);
> >
> > I really don't like this because it's shuffling potentially fragile
> > lifetime rules since you now have to have the sdev deleted from the
> > starved list before final put. That becomes an unstated assumption
> > within the code.
> >
> > The theory is that the starved list processing may be racing with a
> > scsi_remove_device, so when we unlock the host lock, the device (and the
> > queue) may be destroyed. OK, so I agree with this, remote a possibility
> > though it may be. The easy way of fixing it without making assumptions
> > is this, isn't it? All it requires is that the queue be destroyed after
> > the starved list entry is deleted in the sdev release code.
> >
> > James
> >
> > ---
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index 86d5220..f294cc6 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -434,6 +434,7 @@ static void scsi_run_queue(struct request_queue *q)
> >  	list_splice_init(&shost->starved_list, &starved_list);
> >
> >  	while (!list_empty(&starved_list)) {
> > +		struct request_queue *slq;
> >  		/*
> >  		 * As long as shost is accepting commands and we have
> >  		 * starved queues, call blk_run_queue. scsi_request_fn
> > @@ -456,10 +457,21 @@ static void scsi_run_queue(struct request_queue *q)
> >  			continue;
> >  		}
> >
> > +		/*
> > +		 * once we drop the host lock, a racing scsi_remove_device may
> > +		 * remove the sdev from the starved list and destroy it and
> > +		 * the queue. Mitigate by taking a reference to the queue and
> > +		 * never touching the sdev again after we drop the host lock.
> > +		 */
> > +		slq = sdev->request_queue;
> > +		if (!blk_get_queue(slq))
> > +			continue;
> > +
> >  		spin_unlock(shost->host_lock);
> > -		spin_lock(sdev->request_queue->queue_lock);
> > -		__blk_run_queue(sdev->request_queue);
> > -		spin_unlock(sdev->request_queue->queue_lock);
> > +
> > +		blk_run_queue(slq);
> > +		blk_put_queue(slq);
> > +
> >  		spin_lock(shost->host_lock);
> >  	}
> >  	/* put any unprocessed entries back */
> >
>
> I think we will hit issues with the scsi_device being freed too soon still.
>
> 1. In thread 1, __scsi_remove_device runs. It cleans up the commands and
> it does the last put_device. It left the sdev on the starved list though.
>
> 2. In thread 2, scsi_run_queue runs and takes the dev off the starved
> list and calls into block layer __blk_run_queue.
>
> 3. scsi_device_dev_release_usercontext runs and frees the scsi_device.
>
> 4. __blk_run_queue from #2 runs and calls into scsi_request_fn. We now
> reference the freed sdev at the top of scsi_request_fn.

This last can't happen because in 1. and before 3. the queue has
transitioned to DEAD. That means you can't get a reference to the queue,
but if you do, it won't run its request function. If it's already
running its request function in 1. then blk_cleanup_queue() will wait
for that to complete before transitioning the queue to DEAD.

James