On Thu, Aug 12, 2021 at 07:47:55PM +0800, Xiaoguang Wang wrote: > hi, > > > > For fixing use-after-free during iterating over requests, we grabbed > > request's refcount before calling ->fn in commit 2e315dc07df0 ("blk-mq: > > grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter"). > > Turns out this way may cause kernel panic when iterating over one flush > > request: > > > > 1) old flush request's tag is just released, and this tag is reused by > > one new request, but ->rqs[] isn't updated yet > > > > 2) the flush request can be re-used for submitting one new flush command, > > so blk_rq_init() is called at the same time > > > > 3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request > > is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called, > > flush_rq->end_io may not be updated yet, so NULL pointer dereference > > is triggered in blk_mq_put_rq_ref(). > > > > Fix the issue by calling refcount_set(&flush_rq->ref, 1) after > > flush_rq->end_io is set. So far the only other caller of blk_rq_init() is > > scsi_ioctl_reset() in which the request doesn't enter block IO stack and > > the request reference count isn't used, so the change is safe. > > > > Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in > > blk_mq_tagset_busy_iter") > > Reported-by: "Blank-Burian, Markus, Dr." <blankburian@xxxxxxxxxxxxxxx> > > Tested-by: "Blank-Burian, Markus, Dr." <blankburian@xxxxxxxxxxxxxxx> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx> > > --- > > block/blk-core.c | 1 - > > block/blk-flush.c | 8 ++++++++ > > 2 files changed, 8 insertions(+), 1 deletion(-) > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index 0874bc2fcdb4..b5098739f72a 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -121,7 +121,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq) > > rq->internal_tag = BLK_MQ_NO_TAG; > > rq->start_time_ns = ktime_get_ns(); > > rq->part = NULL; > > - refcount_set(&rq->ref, 1); > > blk_crypto_rq_set_defaults(rq); > > } > > EXPORT_SYMBOL(blk_rq_init); > > diff --git a/block/blk-flush.c b/block/blk-flush.c > > index 1002f6c58181..4912c8dbb1d8 100644 > > --- a/block/blk-flush.c > > +++ b/block/blk-flush.c > > @@ -329,6 +329,14 @@ static void blk_kick_flush(struct request_queue *q, struct blk_flush_queue *fq, > > flush_rq->rq_flags |= RQF_FLUSH_SEQ; > > flush_rq->rq_disk = first_rq->rq_disk; > > flush_rq->end_io = flush_end_io; > > + /* > > + * Order WRITE ->end_io and WRITE rq->ref, and its pair is the one > > + * implied in refcount_inc_not_zero() called from > > + * blk_mq_find_and_get_req(), which orders WRITE/READ flush_rq->ref > > + * and READ flush_rq->end_io > > Recently we run into similar panic which is a NULL dereference on > rq->mq_hctx in is_flush_rq(), we also > > guess there is race bug just what you have fixed. > > But I have one question here, for a blk-mq device, before issuing the first > flush req, flush_rq->end_io > > is NULL, and for following flush reqs on this blk-mq device, > flush_rq->end_io won't be NULL. I searched > > the codes and don't find any place that sets flush_rq->end_io to be NULL > once flush_rq has been completed. blk_kick_flush(): blk_rq_init memset(rq, 0, sizeof(*rq)); Thanks, Ming