On Sun, Mar 31, 2019 at 08:27:35AM -0700, Bart Van Assche wrote: > On 3/30/19 8:09 PM, Ming Lei wrote: > > Since 45a9c9d909b2 ("blk-mq: Fix a use-after-free"), run queue isn't > > allowed during cleanup queue even though queue refcount is held. > > > > This change has caused lots of kernel oops triggered in run queue path, > > turns out it isn't easy to fix them all. > > > > So move freeing of hw queue resources into queue's release handler, then > > the above issue is fixed. Meantime, this way is safe given freeing hw > > queue resource doesn't require to use tags. > > I'm not sure the approach of this patch series is really the direction we > should pursue. There are many block driver that free resources immediately Please see scsi_run_queue(), and the queue refcount is always held before run queue. That said this way has been working well for long time. However, 45a9c9d909b2 ("blk-mq: Fix a use-after-free") breaks this behaviour, and causes kernel oops. > after blk_cleanup_queue() returns. An example from the brd driver: > > static void brd_free(struct brd_device *brd) > { > put_disk(brd->brd_disk); > blk_cleanup_queue(brd->brd_queue); > brd_free_pages(brd); > kfree(brd); > } Once blk_cleanup_queue() is returned, especially queue is frozen, there isn't any request to be queued to driver, so it isn't a problem wrt. freeing driver resource because we won't call into driver any more under this situation. This patch fixes the race between cleanup queue and run queue: 1) run queue may happen before the queue is frozen, however blk_mq_run_hw_queues() may not be done when blk_freeze_queue() returns. 2) run queue may happen after blk_freeze_queue() returns > > I'd like to avoid having to modify all block drivers that free resources > immediately after blk_cleanup_queue() has returned. Have you considered to > modify blk_mq_run_hw_queues() such that it becomes safe to call that > function while blk_cleanup_queue() is in progress, e.g. by inserting a > percpu_ref_tryget_live(&q->q_usage_counter) / > percpu_ref_put(&q->q_usage_counter) pair? It can't work because blk_mq_run_hw_queues may happen after percpu_ref_exit() is done. However, if we move percpu_ref_exit() into queue's release handler, we don't need to grab q->q_usage_counter any more in blk_mq_run_hw_queues(), and we still have to free hw queue resources in queue's release handler, that is exactly what this patchset is doing. In short, getting q->q_usage_counter doesn't make a difference on this issue. Thanks, Ming