Hi,

On 2022/11/03 9:39, Khazhismel Kumykov wrote:
This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL, but woken_list_node still being hashed. This would happen when bfq_init_rq() expects a brand new allocated queue to be returned from
From what I see, bfqq->waker_bfqq is updated in bfq_init_rq() only if 'new_queue' is false, but in that case the 'bfqq' returned from bfq_get_bfqq_handle_split() will never be oom_bfqq, so I'm confused here...
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq without resetting woken_list_node. Since we can always return oom_bfqq when attempting to allocate, we cannot assume waker_bfqq starts as NULL. We must either reset woken_list_node, or avoid setting woken_list at all for oom_bfqq - opt to do the former.
Once oom_bfqq is used, I think the I/O is treated as issued from the root group. Hence I don't think it's necessary to set woken_list or waker_bfqq for oom_bfqq.

Thanks,
Kuai
Crashes would have a stacktrace like:
[160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142]  bfq_add_request+0x6bc/0x980
[160595.666602]  bfq_insert_request+0x8ec/0x1240
[160595.671762]  bfq_insert_requests+0x58/0x9c
[160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
[160595.682107]  blk_mq_submit_bio+0x270/0x62c
[160595.686759]  __submit_bio_noacct_mq+0xec/0x178
[160595.691926]  submit_bio+0x120/0x184
[160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
[160595.701026]  ext4_readpage+0x60/0xb0
[160595.705158]  filemap_read_page+0x54/0x114
[160595.711961]  filemap_fault+0x228/0x5f4
[160595.716272]  do_read_fault+0xe0/0x1f0
[160595.720487]  do_fault+0x40/0x1c8

Tested by injecting random failures into bfq_get_queue, crashes go away completely.

Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@xxxxxxxxxx>
---
RFC mainly because it's not clear to me the best policy here - but the patch is tested and fixes a real crash we started seeing in 5.15

This is following up my ramble over at
https://lore.kernel.org/lkml/CACGdZYLMnfcqwbAXDx+x9vUOMn2cz55oc+8WySBS3J2Xd_q7Lg@xxxxxxxxxxxxxx/

 block/bfq-iosched.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..5d2861119d20 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6793,7 +6793,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				 * reset. So insert new_bfqq into the
 				 * woken_list of the waker. See
 				 * bfq_check_waker for details.
+				 *
+				 * Also, if we got oom_bfqq, we must check if
+				 * it's already in a woken_list
 				 */
+				if (unlikely(!hlist_unhashed(&bfqq->woken_list_node)))
+					hlist_del_init(&bfqq->woken_list_node);
 				if (bfqq->waker_bfqq)
 					hlist_add_head(&bfqq->woken_list_node,
 						       &bfqq->waker_bfqq->woken_list);
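
For reference, the unhash-before-add guard in the hunk above can be tried outside the kernel with a rough userspace sketch. This is only an illustration under assumptions: the hlist helpers below are simplified stand-ins for the ones in include/linux/list.h, and struct toy_queue, toy_set_waker(), hlist_count() and toy_demo() are hypothetical names that do not exist in block/bfq-iosched.c. It models one long-lived queue (as oom_bfqq would be) being handed a new waker more than once.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's hlist types and helpers. */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* A node is "unhashed" when it is not linked into any list. */
static int hlist_unhashed(const struct hlist_node *n)
{
	return !n->pprev;
}

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Unlink the node (if linked) and reset it to the unhashed state. */
static void hlist_del_init(struct hlist_node *n)
{
	if (!hlist_unhashed(n)) {
		*n->pprev = n->next;
		if (n->next)
			n->next->pprev = n->pprev;
		n->next = NULL;
		n->pprev = NULL;
	}
}

/* Hypothetical helper: number of nodes currently on a list. */
static size_t hlist_count(const struct hlist_head *h)
{
	size_t count = 0;

	for (const struct hlist_node *n = h->first; n; n = n->next)
		count++;
	return count;
}

/* Hypothetical, heavily reduced model of the fields involved. */
struct toy_queue {
	struct toy_queue *waker;           /* plays the role of waker_bfqq */
	struct hlist_node woken_list_node; /* link in the waker's list */
	struct hlist_head woken_list;      /* queues woken by this one */
};

/* Mirrors the patched sequence: drop any stale link, then add to the new waker. */
static void toy_set_waker(struct toy_queue *q, struct toy_queue *new_waker)
{
	q->waker = new_waker;
	if (!hlist_unhashed(&q->woken_list_node))
		hlist_del_init(&q->woken_list_node);
	if (q->waker)
		hlist_add_head(&q->woken_list_node, &q->waker->woken_list);
}

/* Returns 1 if the lists stay consistent while one queue is reused. */
static int toy_demo(void)
{
	struct toy_queue a = {0}, b = {0}, shared = {0};

	toy_set_waker(&shared, &a);
	if (hlist_count(&a.woken_list) != 1)
		return 0;

	/* Reuse: without the guard, shared would still be linked under a. */
	toy_set_waker(&shared, &b);
	if (hlist_count(&a.woken_list) != 0 || hlist_count(&b.woken_list) != 1)
		return 0;

	/* A NULL waker must leave the node unhashed, not on a stale list. */
	toy_set_waker(&shared, NULL);
	return hlist_unhashed(&shared.woken_list_node) &&
	       hlist_count(&b.woken_list) == 0;
}
```

Calling toy_demo() from a small main() is enough to try it. Removing the hlist_unhashed()/hlist_del_init() pair from toy_set_waker() leaves the node reachable from the old waker's list after the second call, which is roughly the inconsistent state the commit message describes; whether skipping the waker bookkeeping entirely for oom_bfqq would be the better policy is not something this sketch addresses.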