On Sun, Apr 4, 2021 at 5:28 PM Bart Van Assche <bvanassche@xxxxxxx> wrote:
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index 116c3691b104..e7a6a114d216 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -209,7 +209,11 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
>
>  	if (!reserved)
>  		bitnr += tags->nr_reserved_tags;
> -	rq = tags->rqs[bitnr];
> +	/*
> +	 * Protected by rq->q->q_usage_counter. See also
> +	 * blk_mq_queue_tag_busy_iter().
> +	 */
> +	rq = rcu_dereference_check(tags->rqs[bitnr], true);

Maybe I'm missing something, but if this tags struct has a shared sbitmap, what guarantees do we have that while iterating we won't touch requests from a queue that is tearing down? The check a few lines below suggests that, at the least, we may touch requests from a different queue.

Say we enter blk_mq_queue_tag_busy_iter(): we are iterating with a raised hctx->q->q_usage_counter and get to bt_iter(). Say the tagset has 2 shared queues, hctx->q is q1, and rq->q is q2:

(thread 1) rq = rcu_dereference_check(...) (rq->q != hctx->q, but we don't know that yet)

(thread 2) elsewhere, blk_cleanup_queue(q2) runs (or elevator_exit()); since we only have a raised q_usage_counter on q1, this goes to completion and frees rq. On a preemptible kernel, thread 1 may be paused before it reads rq->q, so synchronize_rcu() passes happily by.

(thread 1) we check rq && rq->q == hctx->q: use-after-free, since rq was freed above.

I think John Garry mentioned observing a similar race in patch 2 of his series; perhaps his test case can verify this?

"Indeed, blk_mq_queue_tag_busy_iter() already does take a reference to its queue usage counter when called, and the queue cannot be frozen to switch IO scheduler until all refs are dropped. This ensures no stale references to IO scheduler requests will be seen by blk_mq_queue_tag_busy_iter().
However, there is nothing to stop blk_mq_queue_tag_busy_iter() being run for another queue associated with the same tagset, and it seeing a stale IO scheduler request from the other queue after they are freed."

So, to my understanding, we have a race between reading tags->rqs[bitnr] and verifying that rq->q == hctx->q, where if we are scheduled out in between we may have a use-after-free. If that's the case, perhaps we should hold rcu_read_lock() until we verify the queues match; then we can release it and run fn(), since by that point we have verified we no longer need it?

>
>  	/*
>  	 * We can hit rq == NULL here, because the tagging functions
> @@ -254,11 +258,17 @@ struct bt_tags_iter_data {
>  	unsigned int flags;
>  };
>

Thanks,
Khazhy
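P.S. To make the suggestion concrete, here is a rough, untested sketch of what I mean for bt_iter() (field and struct names are my guesses from the quoted hunk and surrounding code, not a compile-tested patch):

```c
static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
	struct bt_iter_data *iter_data = data;
	struct blk_mq_hw_ctx *hctx = iter_data->hctx;
	struct blk_mq_tags *tags = hctx->tags;
	bool reserved = iter_data->reserved;
	struct request *rq;
	bool ret = true;

	if (!reserved)
		bitnr += tags->nr_reserved_tags;

	/*
	 * Hold the RCU read lock across both the dereference and the
	 * rq->q check, so that a concurrent blk_cleanup_queue() /
	 * elevator_exit() on another queue sharing this tagset cannot
	 * free rq in the window between the two.
	 */
	rcu_read_lock();
	rq = rcu_dereference(tags->rqs[bitnr]);
	if (!rq || rq->q != hctx->queue || rq->mq_hctx != hctx) {
		rcu_read_unlock();
		return true;
	}
	rcu_read_unlock();

	/*
	 * rq belongs to hctx's queue, whose q_usage_counter we hold,
	 * so it is now safe to call fn() outside the RCU section.
	 */
	if (blk_mq_request_started(rq))
		ret = iter_data->fn(hctx, rq, iter_data->data, reserved);
	return ret;
}
```

The key point is only that the deref and the queue-match check sit inside one RCU read-side critical section; the rest is just the existing structure.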