On Wed, Aug 18, 2021 at 10:02:09AM +0800, yukuai (C) wrote:
> On 2021/08/18 8:52, Ming Lei wrote:
> > On Tue, Aug 17, 2021 at 10:23:06AM +0800, Yu Kuai wrote:
> > > If the io scheduler is not none, hctx->tags->rq[tag] will point to
> > > hctx->sched_tags->static_rq[internal_tag] in blk_mq_get_driver_tag().
> > > However, static_rq of sched_tags might be freed through switching the
> > > elevator or increasing nr_requests. This leaves a window for some
> > > drivers to get the freed request through blk_mq_tag_to_rq(tags, tag).
> > 
> > I believe I have explained that it is a bug of the driver, which has
> > the knowledge of whether the passed tag is valid or not. We are clear
> > that drivers need to cover the race between normal completion and
> > timeout/error handling.
> > 
> > > It's difficult to fix this uaf from the driver side, I'm thinking about
> > 
> > So far I have not seen any analysis of why the uaf is triggered; care
> > to investigate the reason?
> 
> Hi, Ming
> 
> I'm sorry if I didn't explain the uaf clearly.
> 
> 1) At first, a normal io is submitted and completed with a scheduler:
> 
> internal_tag = blk_mq_get_tag	-> get tag from sched_tags
>  blk_mq_rq_ctx_init
>   sched_tags->rq[internal_tag] = sched_tags->static_rq[internal_tag]
> ...
> blk_mq_get_driver_tag
>  __blk_mq_get_driver_tag	-> get tag from tags
>  tags->rq[tag] = sched_tags->static_rq[internal_tag]
> 
> So, both tags->rq[tag] and sched_tags->rq[internal_tag] are pointing
> to the request sched_tags->static_rq[internal_tag].
> 
> 2) Then, if sched_tags->static_rq is freed:
> 
> blk_mq_sched_free_requests
>  blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i)
>   blk_mq_clear_rq_mapping(set, tags, hctx_idx)
>    -> sched_tags->rq[internal_tag] is set to NULL here

Please take a look at blk_mq_clear_rq_mapping():

	//drv_tags points to set->tags[], which is shared host-wide
	struct blk_mq_tags *drv_tags = set->tags[hctx_idx];
	...
	//tags points to sched_tags
	list_for_each_entry(page, &tags->page_list, lru) {
		unsigned long start = (unsigned long)page_address(page);
		unsigned long end = start + order_to_size(page->private);
		int i;

		/* clear drv_tags->rqs[i] in case it is from this sched_tags */
		for (i = 0; i < set->queue_depth; i++) {
			struct request *rq = drv_tags->rqs[i];
			unsigned long rq_addr = (unsigned long)rq;

			if (rq_addr >= start && rq_addr < end) {
				WARN_ON_ONCE(refcount_read(&rq->ref) != 0);
				cmpxchg(&drv_tags->rqs[i], rq, NULL);
			}
		}
	}

So we do clear tags->rq[] instead of sched_tags->rq[].

> 
> After switching the elevator, tags->rq[tag] still points to the request
> that has just been freed.

No.

> 
> 3) The nbd server sends a reply with a random tag directly:
> 
> recv_work
>  nbd_read_stat
>   blk_mq_tag_to_rq(tags, tag)
>    rq = tags->rq[tag]	-> rq is freed
> 
> Usually, nbd will get a tag and send a request to the server first, and
> then handle the reply. However, if that request step is skipped, such a
> uaf problem can be triggered.

When or how is such a reply with a random tag sent to the nbd client?
Is it possible for the nbd client to detect such an unexpected/bad
situation?

What if blk_mq_tag_to_rq() is just called before/when we clear
tags->rq[]?

> > 
> > > The request reference has been cleared too in blk_mq_tag_update_depth():
> > 
> > blk_mq_tag_update_depth
> > 	blk_mq_free_rqs
> > 		blk_mq_clear_rq_mapping
> 
> What I'm trying to do is to clear the rq mapping in both
> hctx->sched_tags->rq and hctx->tags->rq when sched_tags->static_rq
> is freed. However, I forgot about the case where tags is shared by
> multiple devices, so what this patch does is clearly wrong...
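
To be clear about what the driver side sees here: blk_mq_tag_to_rq() is
just an array lookup. A simplified sketch (trimmed for illustration, not
a verbatim copy of the upstream code):

	struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
	{
		/*
		 * No validity check beyond the bounds test: whatever
		 * pointer currently sits in tags->rqs[tag] is handed
		 * straight back to the caller.
		 */
		if (tag < tags->nr_tags)
			return tags->rqs[tag];

		return NULL;
	}

So once a stale pointer is left in (or raced with in) tags->rqs[tag],
nothing on this path can tell a live request from a freed one; it is up
to the caller to know whether the tag is valid.
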
> 
> So, what do you think about adding a new interface to iterate the
> requests in tags->rq[], find the ones pointing to
> sched_tags->static_rq[], and use cmpxchg() to clear them?

See above, we already clear tags->rq[] if the rq is from the to-be-freed
sched_tags. (See the P.S. below on why cmpxchg() is used there.)


Thanks,
Ming
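
P.S. On the cmpxchg() part of the question: pulling the key line out of
the snippet above (the comment is added here for illustration),

	/* only clear the slot if it still holds the stale request */
	cmpxchg(&drv_tags->rqs[i], rq, NULL);

the conditional swap is what keeps the clearing safe against concurrent
re-use of the same driver tag: if drv_tags->rqs[i] has meanwhile been
updated to point at a newly allocated request, the old value no longer
matches and the slot is left alone, whereas a plain NULL store could
wipe out that freshly installed pointer.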