On Wed, Nov 6, 2024 at 11:52 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2024/11/06 11:36, Ming Lei 写道: > > On Wed, Nov 06, 2024 at 11:25:07AM +0800, Yu Kuai wrote: > >> Hi, > >> > >> 在 2024/11/06 11:11, Ming Lei 写道: > >>> On Wed, Nov 06, 2024 at 10:58:40AM +0800, Yu Kuai wrote: > >>>> Hi,Ming and Christoph > >>>> > >>>> 在 2024/11/05 19:19, Christoph Hellwig 写道: > >>>>> On Mon, Nov 04, 2024 at 07:00:05PM +0800, Yu Kuai wrote: > >>>>>> From: Yu Kuai <yukuai3@xxxxxxxxxx> > >>>>>> > >>>>>> blk_mq_clear_flush_rq_mapping() is not called during scsi probe, by > >>>>>> checking blk_queue_init_done(). However, QUEUE_FLAG_INIT_DONE is cleared > >>>>>> in del_gendisk by commit aec89dc5d421 ("block: keep q_usage_counter in > >>>>>> atomic mode after del_gendisk"), hence for disk like scsi, following > >>>>>> blk_mq_destroy_queue() will not clear flush rq from tags->rqs[] as well, > >>>>>> cause following uaf that is found by our syzkaller for v6.6: > >>>>> > >>>>> Which means we leave the flush request lingering after del_gendisk, > >>>>> which sounds like the real bug. I suspect we just need to move the > >>>>> call to blk_mq_clear_flush_rq_mapping so that it is called from > >>>>> del_gendisk and doesn't leave the flush tag lingering around. > >>>>> > >>>> > >>>> This remind me that del_gendisk is still too late to do that. Noted that > >>>> flush_rq can acquire different tags, so if the multiple flush_rq is done > >>>> and those tags are not reused, the flush_rq can exist in multiple > >>>> entries in tags->rqs[]. The consequence I can think of is that iterating > >>>> tags can found the same flush_rq multiple times, and the flush_rq can be > >>>> inflight. > >>> > >>> How can that be one problem? > >>> > >>> Please look at > >>> > >>> commit 364b61818f65 ("blk-mq: clearing flush request reference in tags->rqs[]") > >>> commit bd63141d585b ("blk-mq: clear stale request in tags->rq[] before freeing one request pool") > >>> > >>> and understand the motivation. > >>> > >>> That also means it is just fine to delay blk_mq_clear_flush_rq_mapping() > >>> after disk is deleted. > >> > >> I do understand what you mean. Let's see if you want this to be avoided, > >> for example(no disk is deleted): > > > > It is definitely another issue, and not supposed to be covered by > > blk_mq_clear_flush_rq_mapping(). > > > >> > >> 1) issue a flush, and tag 0 is used, after the flush is done, > >> tags->rqs[0] = flush_rq > >> 2) issue another flush, and tag 1 is used, after the flush is done, > >> tags->rqs[1] = flush_rq > >> 3) issue a flush again, and tag 2 is used, and the flush_rq is > >> dispatched to disk; > > > > Yes, this kind of thing exists since blk-mq begins, and you can't expect > > blk_mq_in_flight_rw() to get accurate inflight requests. > > > >> 4) Then in this case, blk_mq_in_flight_rw() will found the same flush_rq > >> 3 times and think there are 3 inflight request, same for > >> hctx_busy_show()... > > > > But we have tried to avoid it, such as in blk_mq_find_and_get_req() > > req->tag is checked against the current 'bitnr' of sbitmap when walking > > over tags. Then the same flush_rq won't be counted 3 times. > > Yes, I missed that req->tag is checked. Looks like there is no porblem > now. :) > > Just to make sure, we're still on the same page for this patch, right? Yeah, my reviewed-by still stands, and it is fine to delay to clear flush req wrt. the original motivation, what matters is that it is cleared before the flush req is freed. Thanks,