Hi Bart,

On 4/3/19 6:32 AM, Bart Van Assche wrote:
> Since the callback function called by blk_mq_queue_tag_busy_iter()
> may sleep, calling synchronize_rcu() from __blk_mq_update_nr_hw_queues()
> is not sufficient to wait until blk_mq_queue_tag_busy_iter() has
> finished. Instead of making __blk_mq_update_nr_hw_queues() wait until
> q->q_usage_counter == 0 is globally visible, do not iterate over tags
> if the request queue is frozen.
>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Cc: Hannes Reinecke <hare@xxxxxxxx>
> Cc: James Smart <james.smart@xxxxxxxxxxxx>
> Cc: Ming Lei <ming.lei@xxxxxxxxxx>
> Cc: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
> Cc: Keith Busch <keith.busch@xxxxxxxxx>
> Cc: Dongli Zhang <dongli.zhang@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: 530ca2c9bd69 ("blk-mq: Allow blocking queue tag iter callbacks") # v4.19.
> Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx>
> ---
>  block/blk-mq-tag.c | 10 +++++-----
>  block/blk-mq.c     |  5 +----
>  2 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
> index a4931fc7be8a..89f479634b4d 100644
> --- a/block/blk-mq-tag.c
> +++ b/block/blk-mq-tag.c
> @@ -383,14 +383,13 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>
>          /*
>           * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
> -         * while the queue is frozen. So we can use q_usage_counter to avoid
> -         * racing with it. __blk_mq_update_nr_hw_queues() uses
> -         * synchronize_rcu() to ensure this function left the critical section
> -         * below.
> +         * while the queue is frozen. Hold q_usage_counter to serialize
> +         * __blk_mq_update_nr_hw_queues() against this function.
>           */
>          if (!percpu_ref_tryget(&q->q_usage_counter))
>                  return;
> -
> +        if (atomic_read(&q->mq_freeze_depth))
> +                goto out;
>          queue_for_each_hw_ctx(q, hctx, i) {
>                  struct blk_mq_tags *tags = hctx->tags;
>
> @@ -405,6 +404,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
>                  bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
>                  bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
>          }
> +out:
>          blk_queue_exit(q);

If the callback function goes to sleep, blk_queue_exit() has not been
reached yet, so q->q_usage_counter has not been decremented.

On the __blk_mq_update_nr_hw_queues() side, once blk_mq_freeze_queue()
has returned, __PERCPU_REF_DEAD is set and q->q_usage_counter must be
0; otherwise blk_mq_freeze_queue_wait() would not have made progress.

Therefore, I think we would not even be able to call the callback
(which might sleep) once blk_mq_freeze_queue() has completed, because
percpu_ref_tryget() would fail. Why do we need the extra check of
q->mq_freeze_depth? Is it an optimization? (I tried to make the
ordering concrete with a small userspace model at the end of this
mail.)

Please let me know if my understanding is incorrect. Thank you very
much!

>  }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 652d0c6d5945..f9fc1536408d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -3226,10 +3226,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>
>          list_for_each_entry(q, &set->tag_list, tag_set_list)
>                  blk_mq_freeze_queue(q);
> -        /*
> -         * Sync with blk_mq_queue_tag_busy_iter.
> -         */
> -        synchronize_rcu();
> +
>          /*
>           * Switch IO scheduler to 'none', cleaning up the data associated
>           * with the previous scheduler. We will switch back once we are done
>

Dongli Zhang
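
P.S. To make the ordering I am asking about concrete, here is a minimal
userspace model. This is only my own sketch: the names (ref_tryget(),
iter_thread(), freeze_thread()) are made up and the C11 atomics are a
stand-in for percpu_ref, not the kernel implementation. The kill + put
pair stands in for blk_mq_freeze_queue() setting __PERCPU_REF_DEAD and
dropping the initial reference, the spin on the counter stands in for
blk_mq_freeze_queue_wait(), and the tryget/sleep/put thread stands in
for blk_mq_queue_tag_busy_iter() with a sleeping callback. As with the
real percpu_ref_tryget() (unlike percpu_ref_tryget_live()), the tryget
here does not look at the dead flag and only fails once the counter has
reached zero.

/* ps_model.c - compile with: gcc -std=c11 -pthread ps_model.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int usage_counter = 1;    /* initial ref, as after percpu_ref_init() */
static atomic_bool dead = false;        /* stands in for __PERCPU_REF_DEAD */

/*
 * Stands in for percpu_ref_tryget(): it does not check the dead flag
 * and only fails once the counter has already hit zero.
 */
static bool ref_tryget(void)
{
        int old = atomic_load(&usage_counter);

        do {
                if (old == 0)
                        return false;
        } while (!atomic_compare_exchange_weak(&usage_counter, &old, old + 1));
        return true;
}

static void ref_put(void)
{
        atomic_fetch_sub(&usage_counter, 1);
}

/* blk_mq_queue_tag_busy_iter(): tryget, run a (sleeping) callback, put. */
static void *iter_thread(void *arg)
{
        (void)arg;
        if (!ref_tryget()) {
                printf("iter: tryget failed, queue already drained\n");
                return NULL;
        }
        printf("iter: got ref, callback sleeping...\n");
        sleep(1);               /* the sleeping busy_iter_fn callback */
        printf("iter: callback done, releasing ref\n");
        ref_put();              /* blk_queue_exit() */
        return NULL;
}

/* blk_mq_freeze_queue() followed by blk_mq_freeze_queue_wait(). */
static void *freeze_thread(void *arg)
{
        (void)arg;
        atomic_store(&dead, true);      /* percpu_ref_kill() sets DEAD ... */
        ref_put();                      /* ... and drops the initial ref */
        while (atomic_load(&usage_counter) != 0)
                ;                       /* blk_mq_freeze_queue_wait() */
        printf("freeze: counter is 0, safe to update nr_hw_queues\n");
        return NULL;
}

int main(void)
{
        pthread_t it, fr;

        pthread_create(&it, NULL, iter_thread, NULL);
        pthread_create(&fr, NULL, freeze_thread, NULL);
        pthread_join(it, NULL);
        pthread_join(fr, NULL);
        return 0;
}

In this model, whichever thread runs first, the sleeping callback never
overlaps the post-freeze update: a tryget taken before the kill keeps
the counter non-zero so the freeze side waits for the put, and a tryget
attempted after the counter has drained simply fails. This is the
behavior that makes me wonder which case the extra q->mq_freeze_depth
check is needed for.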