On 2022/02/23 22:30, Ming Lei wrote:
On Wed, Feb 23, 2022 at 07:26:01PM +0800, Yu Kuai wrote:
blk_mq_realloc_hw_ctxs() will free the 'queue_hw_ctx' (e.g. when submit_queues
is updated through configfs for null_blk), while it might still be used from
another context (e.g. switching the elevator to none):
t1:                                     t2:
elevator_switch
  blk_mq_unquiesce_queue
    blk_mq_run_hw_queues
      queue_for_each_hw_ctx
      // assembly code for hctx = (q)->queue_hw_ctx[i]
      mov    0x48(%rbp),%rdx    -> read old queue_hw_ctx
                                        __blk_mq_update_nr_hw_queues
                                          blk_mq_realloc_hw_ctxs
                                            hctxs = q->queue_hw_ctx
                                            q->queue_hw_ctx = new_hctxs
                                            kfree(hctxs)
      movslq %ebx,%rax
      mov    (%rdx,%rax,8),%rdi -> uaf
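(For context, the load shown in the assembly comes from queue_for_each_hw_ctx();
the macro in include/linux/blk-mq.h is, if I recall correctly, roughly:

#define queue_for_each_hw_ctx(q, hctx, i)                               \
        for ((i) = 0; (i) < (q)->nr_hw_queues &&                        \
             ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++)

so (q)->queue_hw_ctx[i] is a plain load with no lock or RCU protection, and
nothing prevents blk_mq_realloc_hw_ctxs() from freeing the old array under
the iterator.)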
This is not only a UAF on queue_hw_ctx; there are similar issues on other
structures as well, and I think the correct and easy fix is to quiesce the
request queue while updating nr_hw_queues, something like the following patch:
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a05ce7725031..d8e7c3cce0dd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4467,8 +4467,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
         if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues)
                 return;
 
-        list_for_each_entry(q, &set->tag_list, tag_set_list)
+        list_for_each_entry(q, &set->tag_list, tag_set_list) {
                 blk_mq_freeze_queue(q);
+                blk_mq_quiesce_queue(q);
+        }
         /*
          * Switch IO scheduler to 'none', cleaning up the data associated
          * with the previous scheduler. We will switch back once we are done
@@ -4518,8 +4520,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
         list_for_each_entry(q, &set->tag_list, tag_set_list)
                 blk_mq_elv_switch_back(&head, q);
 
-        list_for_each_entry(q, &set->tag_list, tag_set_list)
+        list_for_each_entry(q, &set->tag_list, tag_set_list) {
+                blk_mq_unquiesce_queue(q);
                 blk_mq_unfreeze_queue(q);
+        }
 }
 
 void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
Hi, Ming
If blk_mq_quiesce_queue() is called from __blk_mq_update_nr_hw_queues() first,
then switching the elevator to none won't trigger the problem. However, if
blk_mq_unquiesce_queue() from the elevator switch decreases quiesce_depth to 0
first, and blk_mq_quiesce_queue() is only called from
__blk_mq_update_nr_hw_queues() after that, it seems to me that the concurrent
scenario still exists.
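To spell that out: if I'm reading the code correctly, blk_mq_unquiesce_queue()
is roughly

void blk_mq_unquiesce_queue(struct request_queue *q)
{
        unsigned long flags;
        bool run_queue = false;

        spin_lock_irqsave(&q->queue_lock, flags);
        if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
                ;
        } else if (!--q->quiesce_depth) {
                blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
                run_queue = true;
        }
        spin_unlock_irqrestore(&q->queue_lock, flags);

        /* dispatch any requests that were inserted while quiesced */
        if (run_queue)
                blk_mq_run_hw_queues(q, true);
}

so once the elevator-switch path drops quiesce_depth to 0 here,
blk_mq_run_hw_queues() starts walking q->queue_hw_ctx again before
__blk_mq_update_nr_hw_queues() has taken its own quiesce, and that iteration
can still race with blk_mq_realloc_hw_ctxs().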
Thanks,
Kuai
Thanks,
Ming