On 09/06/12 20:14, Mike Christie wrote: > On 09/06/2012 12:58 PM, Bart Van Assche wrote: >> _suOn 09/06/12 18:27, Michael Christie wrote: >>> On Sep 3, 2012, at 9:12 AM, Bart Van Assche <bvanassche@xxxxxxx> wrote: >>>> If the put_device() call in scsi_request_fn() drops the sdev refcount >>>> to zero then the spin_lock() call after the put_device() call triggers >>>> a use-after-free. Avoid that by making sure that blk_cleanup_queue() >>>> can only finish after all active scsi_request_fn() calls have returned. >>> >>> If we have this patch http://marc.info/?l=linux-scsi&m=134453905402413&w=2 >>> it seems we have all the scsi layer callers of the request_fn/ >>> *blk_run_queue holding a reference to the device when they make the call. >>> Right, or are there some other places missing? >>> >>> What are the other places we can call the request_fn without already >>> holding a reference to the device? Is it the block layer? Is that why we >>> need this patch? >> >> The purpose of this patch is indeed to make *blk_run_queue() calls from >> the block layer safe. There are several direct or indirect >> *blk_run_queue() calls in the block layer where a reference on the queue >> is held but not on the sdev, e.g. in the md, dm and bsg drivers. > > Is there a race still? If some blk code is calling blk_run_queue > (waiting on the queue lock) but no IO is queued, > blk_drain_queue/blk_cleanup_queue could complete since the drain > counters are zero. Then blk_run_queue could grab the queue lock and call > the request_fn on a freed scsi_device (sdev pointed to by q->queuedata > would be freed so scsi_reuqest_fn would be freed). > > Do we need a check in __blk_run_queue for QUEUE_FLAG_DEAD (if dead then > fail IO?), or do we need a check in scsi_request_fn for this. A dead > queue check or maybe null q->queuedata in > scsi_device_dev_release_usercontext (do this with the queue lock held), > then check for null q->queuedata in scsi_request_fn? Yet another scenario is that scsi_remove_host() gets invoked and finishes after scsi_request_fn() has unlocked the queue and before it locks the queue again. That's a scenario that can't be handled by adding more checks at the start of __blk_run_queue() or scsi_request_fn(). Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html