On Tue, Jun 14, 2022 at 09:15:36PM +0800, Yu Kuai wrote:
> On 2022/06/14 15:31, Ming Lei wrote:
> > On Tue, Jun 14, 2022 at 03:14:10PM +0800, Yu Kuai wrote:
> > > We found that boot time increased by about 8s after upgrading the kernel
> > > from v4.19 to v5.10 (megaraid-sas is used in the environment).
> >
> > But 'blk-mq: clearing flush request reference in tags->rqs[]' was merged
> > to v5.14, :-)
> Hi,
>
> Yes, but this patch was applied to 5.10 stable, so we backported it into our
> v5.10. Sorry that I didn't mention that.
> >
> > > Following is where the extra time is spent:
> > >
> > > __scsi_remove_device
> > >  blk_cleanup_queue
> > >   blk_mq_exit_queue
> > >    blk_mq_exit_hw_queues
> > >     blk_mq_exit_hctx
> > >      blk_mq_clear_flush_rq_mapping -> function latency is 0.1ms
> >
> > So queue_depth looks pretty large, is it 4k?
> No, in the environment it's just 32, and nr_hw_queues is 128, which
> means each blk_cleanup_queue() will cost about 10-20 ms.

So 32 * cmpxchg takes 100us, which is really crazy, what is the arch and
processor?

> >
> > But if it is 0.1ms, how can the 8sec delay be caused? That would require 80K
> > hw queues to take so long, so I guess there must be other delay added by the
> > feature of BLK_MQ_F_TAG_HCTX_SHARED.
>
> Please see the details in reason 2): the scsi scan will call
> __scsi_remove_device() a lot of times (each host, each channel, each
> target).

80K / 128 = 640 LUNs. OK, that can be true.

> >
> > >       cmpxchg
> > >
> > > There are three reasons:
> > > 1) megaraid-sas uses multiple hctxs in v5.10, thus blk_mq_exit_hctx()
> > > is called many more times in v5.10 compared to v4.19.
> > > 2) scsi scans each target, thus __scsi_remove_device() is called
> > > many times.
> > > 3) blk_mq_clear_flush_rq_mapping() was introduced after v4.19; it
> > > calls cmpxchg() for each request, and the function latency is about 0.1ms.
> > >
> > > Since blk_mq_clear_flush_rq_mapping() is only called while the
> > > queue is already frozen, which means there is no inflight request,
> > > it's safe to set 'tags->rqs[]' to NULL directly instead of using
> > > cmpxchg(). Tests show that with this change, the function latency of
> > > blk_mq_clear_flush_rq_mapping() is about 1us, and boot time is not
> > > increased.
> >
> > tags is shared among all LUNs attached to the host, so freezing a single
> > request queue here means nothing, so your patch doesn't work.
>
> You're right, I forgot that tags can be shared.
> >
> > Please test the following patch, and see if it can improve the boot delay
> > for your case.
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index e9bf950983c7..1463076a527c 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -3443,8 +3443,9 @@ static void blk_mq_exit_hctx(struct request_queue *q,
> >  	if (blk_mq_hw_queue_mapped(hctx))
> >  		blk_mq_tag_idle(hctx);
> >
> > -	blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
> > -			set->queue_depth, flush_rq);
> > +	if (blk_queue_init_done(q))
> > +		blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx],
> > +				set->queue_depth, flush_rq);
> >
> >  	if (set->ops->exit_request)
> >  		set->ops->exit_request(set, flush_rq, hctx_idx);
>
> Thanks for the patch, I tested it and the boot delay is fixed.

Thanks for the test, and I will send it out tomorrow.

Thanks,
Ming
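
For context on where the per-hctx 0.1ms goes, below is a minimal sketch of the
per-slot clearing pattern discussed above. It is not the exact mainline
blk_mq_clear_flush_rq_mapping() (the real function also checks the flush
request's reference count and cycles tags->lock to drain concurrent tag
iteration); the function name is only illustrative, and it assumes the code
sits in block/ next to blk-mq.c so that struct blk_mq_tags is visible.

#include <linux/atomic.h>
#include <linux/blk-mq.h>
#include "blk-mq-tag.h"	/* struct blk_mq_tags (block-layer private in v5.10) */

/*
 * Simplified sketch: one cmpxchg() per tag slot.  With queue_depth = 32
 * that is 32 atomics per hctx; with nr_hw_queues = 128 and ~640 LUNs, a
 * ~0.1ms per-call latency adds up to the reported ~8s of boot delay.
 */
static void clear_flush_rq_mapping_sketch(struct blk_mq_tags *tags,
					  unsigned int queue_depth,
					  struct request *flush_rq)
{
	unsigned int i;

	/* The hw queue may not be mapped yet */
	if (!tags)
		return;

	/* Clear any tags->rqs[] slot that still points at the flush request */
	for (i = 0; i < queue_depth; i++)
		cmpxchg(&tags->rqs[i], flush_rq, NULL);
}

The fix in the diff above avoids this loop entirely for queues that never
finished initialization (blk_queue_init_done(q) is false), which is the common
case during the SCSI scan, rather than making the loop itself cheaper.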