On Tue, Jun 14, 2022 at 09:48:24AM +0200, Christoph Hellwig wrote:
> The elevator is only used for file system requests, which are stopped in
> del_gendisk.  Move disabling the elevator and freeing the scheduler tags
> to the end of del_gendisk instead of doing that work in disk_release and
> blk_cleanup_queue to avoid a use after free on q->tag_set from
> disk_release as the tag_set might not be alive at that point.
>
> Move the blk_qos_exit call as well, as it just depends on the elevator
> exit and would be the only reason to keep the not exactly cheap queue
> freeze in disk_release.
>
> Fixes: e155b0c238b2 ("blk-mq: Use shared tags for shared sbitmap support")
> Reported-by: syzbot+3e3f419f4a7816471838@xxxxxxxxxxxxxxxxxxxxxxxxx
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> Tested-by: syzbot+3e3f419f4a7816471838@xxxxxxxxxxxxxxxxxxxxxxxxx
> ---
>  block/blk-core.c | 13 -------------
>  block/genhd.c    | 39 +++++++++++----------------------------
>  2 files changed, 11 insertions(+), 41 deletions(-)
>
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 06ff5bbfe8f66..27fb1357ad4b8 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -322,19 +322,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  		blk_mq_exit_queue(q);
>  	}
>
> -	/*
> -	 * In theory, request pool of sched_tags belongs to request queue.
> -	 * However, the current implementation requires tag_set for freeing
> -	 * requests, so free the pool now.
> -	 *
> -	 * Queue has become frozen, there can't be any in-queue requests, so
> -	 * it is safe to free requests now.
> -	 */
> -	mutex_lock(&q->sysfs_lock);
> -	if (q->elevator)
> -		blk_mq_sched_free_rqs(q);
> -	mutex_unlock(&q->sysfs_lock);
> -
>  	/* @q is and will stay empty, shutdown and put */
>  	blk_put_queue(q);
>  }
> diff --git a/block/genhd.c b/block/genhd.c
> index 27205ae47d593..e0675772178b0 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -652,6 +652,17 @@ void del_gendisk(struct gendisk *disk)
>
>  	blk_sync_queue(q);
>  	blk_flush_integrity();
> +	blk_mq_cancel_work_sync(q);
> +
> +	blk_mq_quiesce_queue(q);

Quiescing the queue adds a fairly long delay to del_gendisk(). I am not
sure whether this could cause a regression on big machines with lots of
disks.

> +	if (q->elevator) {
> +		mutex_lock(&q->sysfs_lock);
> +		elevator_exit(q);
> +		mutex_unlock(&q->sysfs_lock);
> +	}
> +	rq_qos_exit(q);
> +	blk_mq_unquiesce_queue(q);

Also, tearing down the elevator here has to be done carefully: any
elevator reference has to hold the RCU read lock or .q_usage_counter,
and meanwhile the elevator has to be checked, otherwise a
use-after-free may be caused.

Unfortunately, there are some cases which do not look safe, such as
__blk_mq_update_nr_hw_queues() and blk_mq_has_sqsched().

Another example is bfq_insert_request()<-bfq_insert_requests():

static void bfq_insert_request(struct blk_mq_hw_ctx *hctx, struct request *rq,
			       bool at_head)
{
	...
	spin_unlock_irq(&bfqd->lock);

	bfq_update_insert_stats(q, bfqq, idle_timer_disabled,
				cmd_flags);
}

If the last 'rq' completes between unlocking bfqd->lock and calling
bfq_update_insert_stats(), del_gendisk() may tear down the elevator, and
a UAF is caused.


Thanks,
Ming
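
For illustration only, the rule stated above (hold a usage reference
and check the elevator before touching it) can be sketched as a small
single-threaded userspace model. Everything named model_* is
hypothetical and is not a kernel API. The counter only stands in for
.q_usage_counter, and the teardown simply fails where real code would
presumably wait for the counter to drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_elevator {
	int inserts;		/* stats updated by elevator users */
};

struct model_queue {
	int usage;		/* plays the role of .q_usage_counter */
	bool frozen;		/* set once teardown has started */
	struct model_elevator *elevator;
};

/* Take a usage reference; fails once teardown has started. */
bool model_queue_enter(struct model_queue *q)
{
	if (q->frozen)
		return false;
	q->usage++;
	return true;
}

/* Drop a usage reference taken by model_queue_enter(). */
void model_queue_exit(struct model_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

/*
 * Teardown: block new references, then detach the elevator only if
 * nobody holds a reference. This model reports failure instead of
 * waiting for the references to go away.
 */
bool model_elevator_exit(struct model_queue *q)
{
	q->frozen = true;
	if (q->usage != 0)
		return false;
	q->elevator = NULL;
	return true;
}

/* Guarded user: hold a reference and check the pointer before use. */
bool model_update_stats(struct model_queue *q)
{
	bool done = false;

	if (!model_queue_enter(q))
		return false;
	if (q->elevator) {
		q->elevator->inserts++;
		done = true;
	}
	model_queue_exit(q);
	return done;
}
```

In this model, a caller that skips model_queue_enter() and dereferences
q->elevator directly has the same shape as the bfq_insert_request()
window described above: nothing stops the teardown from running between
the unlock and the stats update.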