On Wed, Aug 18, 2021 at 10:02:09AM +0800, yukuai (C) wrote:
> On 2021/08/18 8:52, Ming Lei wrote:
> > On Tue, Aug 17, 2021 at 10:23:06AM +0800, Yu Kuai wrote:
> > > If the io scheduler is not none, hctx->tags->rq[tag] will point to
> > > hctx->sched_tags->static_rq[internal_tag] in blk_mq_get_driver_tag().
> > > However, static_rq of sched_tags might be freed through switching the
> > > elevator or increasing nr_requests. This leaves a window for some
> > > drivers to get the freed request through blk_mq_tag_to_rq(tags, tag).
> > 
> > I believe I have explained that it is a bug of the driver, which has
> > the knowledge of whether the passed tag is valid or not. We are clear
> > that drivers need to cover the race between normal completion and
> > timeout/error handling.
> > 
> > > It's difficult to fix this uaf from the driver side, I'm thinking about
> > 
> > So far I have not seen any analysis of why the uaf is triggered; care
> > to investigate the reason?
> 
> Hi, Ming
> 
> I'm sorry if I didn't explain the uaf clearly.
> 
> 1) At first, a normal io is submitted and completed with a scheduler:
> 
> internal_tag = blk_mq_get_tag	-> get tag from sched_tags
>  blk_mq_rq_ctx_init
>   sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
> ...
> blk_mq_get_driver_tag
>  __blk_mq_get_driver_tag	-> get tag from tags
>  tags->rq[tag] = sched_tags->static_rq[internal_tag]
> 
> So, both tags->rq[tag] and sched_tags->rq[internal_tag] are pointing
> to the request sched_tags->static_rq[internal_tag].
> 
> 2) Then, if sched_tags->static_rq is freed:
> 
> blk_mq_sched_free_requests
>  blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
>   blk_mq_clear_rq_mapping(set, tags, hctx_idx)
>    -> sched_tags->rq[internal_tag] is set to NULL here

Please take a look at blk_mq_clear_rq_mapping():

	//drv_tags points to set->tags[], which is shared host-wide
	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
	...
	//tags points to sched_tags
	list_for_each_entry(page, &tags->page_list, lru) {
		unsigned long start = (unsigned long)page_address(page);
		unsigned long end = start + order_to_size(page->private);
		int i;

		/* clear drv_tags->rqs[i] in case it is from this sched_tags */
		for (i = 0; i < set->queue_depth; i++) {
			struct request *rq = drv_tags->rqs[i];
			unsigned long rq_addr = (unsigned long)rq;

			if (rq_addr >= start && rq_addr < end) {
				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
				cmpxchg(&drv_tags->rqs[i], rq, NULL);
			}
		}
	}

So we do clear tags->rq[] instead of sched_tags->rq[].

> 
> After switching the elevator, tags->rq[tag] still points to the request
> that has just been freed.

No.

> 
> 3) The nbd server sends a reply with a random tag directly:
> 
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq(tags, tag)
>    rq = tags->rq[tag]	-> rq is freed
> 
> Usually, nbd will get a tag and send a request to the server first, and
> then handle the reply. However, if that request step is skipped, such a
> uaf problem can be triggered.

When or how is such a reply with a random tag sent to the nbd client?
Is it possible for the nbd client to detect such an unexpected/bad
situation?

What if blk_mq_tag_to_rq() is just called before/when we clear
tags->rq[]?

> > 
> > > The request reference has been cleared too in blk_mq_tag_update_depth():
> > 
> > blk_mq_tag_update_depth
> > 	blk_mq_free_rqs
> > 		blk_mq_clear_rq_mapping
> 
> What I'm trying to do is to clear the rq mapping in both
> hctx->sched_tags->rq and hctx->tags->rq when sched_tags->static_rq
> is freed. However, I forgot about the case where tags is shared by
> multiple devices, so what this patch does is clearly wrong...
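
To be clear about what the driver side sees here: blk_mq_tag_to_rq() is
just an array lookup. A simplified sketch (trimmed for illustration, not
a verbatim copy of the upstream code):

	struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
	{
		/*
		 * No validity check beyond the bounds test: whatever
		 * pointer currently sits in tags->rqs[tag] is handed
		 * straight back to the caller.
		 */
		if (tag < tags->nr_tags)
			return tags->rqs[tag];

		return NULL;
	}

So once a stale pointer is left in (or raced with in) tags->rqs[tag],
nothing on this path can tell a live request from a freed one; it is up
to the caller to know whether the tag is valid.
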
> 
> So, what do you think about adding a new interface to iterate the
> requests in tags->rq[], find the ones pointing to
> sched_tags->static_rq[], and use cmpxchg() to clear them?

See above, we already clear tags->rq[] if the rq is from the to-be-freed
sched_tags. (See the P.S. below on why cmpxchg() is used there.)


Thanks,
Ming
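
P.S. On the cmpxchg() part of the question: pulling the key line out of
the snippet above (the comment is added here for illustration),

	/* only clear the slot if it still holds the stale request */
	cmpxchg(&drv_tags->rqs[i], rq, NULL);

the conditional swap is what keeps the clearing safe against concurrent
re-use of the same driver tag: if drv_tags->rqs[i] has meanwhile been
updated to point at a newly allocated request, the old value no longer
matches and the slot is left alone, whereas a plain NULL store could
wipe out that freshly installed pointer.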