On 2021/11/17 16:06, Ming Lei wrote:
On Wed, Nov 17, 2021 at 11:37:13AM +0800, yangerkun wrote:
Nowadays we are seeing a boot regression while enabling lots of mtdblock devices.
What is your boot regression? Any dmesg log?
The result is that booting with a 5.10 kernel consumes about 1.6s more than
with 4.4. The main reason is that blk_mq_freeze_queue() in elevator_init_mq()
waits for an RCU grace period, which is meant to make sure no IO can happen
while blk_mq_init_sched() runs.
There isn't an RCU grace period implied in the blk_mq_freeze_queue() called
from elevator_init_mq(), because the .q_usage_counter still works in atomic
mode at that time.
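
To illustrate the atomic-mode point, here is a minimal sketch of the
percpu_ref behaviour involved (the example_* helpers are made up for
illustration; only the percpu_ref API calls are real): a ref initialized with
PERCPU_REF_INIT_ATOMIC, the way q->q_usage_counter is set up while the queue
is still being initialized, can be killed without waiting for an RCU grace
period.

#include <linux/gfp.h>
#include <linux/percpu-refcount.h>

static void example_release(struct percpu_ref *ref)
{
        /* in blk-mq this is where "queue is frozen" waiters get woken */
}

static int example_init(struct percpu_ref *ref)
{
        /*
         * Start in atomic mode, which is how q->q_usage_counter behaves
         * until queue initialization is done.
         */
        return percpu_ref_init(ref, example_release,
                               PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
}

static void example_freeze(struct percpu_ref *ref)
{
        /*
         * Killing a ref that is still in atomic mode does not involve an
         * RCU grace period; only after percpu_ref_switch_to_percpu()
         * (done once the queue is fully registered) would a later kill
         * have to wait for one.
         */
        percpu_ref_kill(ref);
}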
Other modules like loop meet this problem too; for loop it has been fixed with
Again, what is the problem?
commit 2112f5c1330a671fa852051d85cb9eadc05d7eb7
Author: Bart Van Assche <bvanassche@xxxxxxx>
Date: Thu Aug 5 10:42:00 2021 -0700
loop: Select I/O scheduler 'none' from inside add_disk()
...
e.g. via a udev rule. This approach has an advantage compared to changing the
I/O scheduler from userspace from 'mq-deadline' into 'none', namely that
synchronize_rcu() does not get called.
Actually, we have met this problem before
(https://www.spinics.net/lists/linux-block/msg70660.html).
The following patches:
2112f5c1330a loop: Select I/O scheduler 'none' from inside add_disk()
90b7198001f2 blk-mq: Introduce the BLK_MQ_F_NO_SCHED_BY_DEFAULT flag
They changed the default IO scheduler for loop to 'none', so there is no need
to call blk_mq_freeze_queue() and blk_mq_init_sched(). But that does not seem
appropriate for mtdblock: mtdblock devices can use 'mq-deadline' to optimize
random writes with the help of mtdblock's cache. Once it is changed to 'none',
we may see a regression for random writes.
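
For reference, here is a minimal sketch of what that opt-out looks like on
the driver side (the mydrv_* names and the queue depth are hypothetical,
modelled on what loop does after 2112f5c1330a): with
BLK_MQ_F_NO_SCHED_BY_DEFAULT set in the tag set, elevator_init_mq() picks no
default scheduler, so the freeze around blk_mq_init_sched() is skipped, at
the cost of losing 'mq-deadline' by default.

#include <linux/blk-mq.h>
#include <linux/numa.h>

static blk_status_t mydrv_queue_rq(struct blk_mq_hw_ctx *hctx,
                                   const struct blk_mq_queue_data *bd)
{
        blk_mq_start_request(bd->rq);
        /* real driver work would go here */
        blk_mq_end_request(bd->rq, BLK_STS_OK);
        return BLK_STS_OK;
}

static const struct blk_mq_ops mydrv_mq_ops = {
        .queue_rq = mydrv_queue_rq,
};

static int mydrv_init_tag_set(struct blk_mq_tag_set *set)
{
        set->ops          = &mydrv_mq_ops;
        set->nr_hw_queues = 1;
        set->queue_depth  = 128;
        set->numa_node    = NUMA_NO_NODE;
        /*
         * BLK_MQ_F_NO_SCHED_BY_DEFAULT (from 90b7198001f2) makes the
         * default scheduler 'none' instead of 'mq-deadline', so
         * elevator_init_mq() returns early without freezing the queue.
         */
        set->flags        = BLK_MQ_F_SHOULD_MERGE |
                            BLK_MQ_F_NO_SCHED_BY_DEFAULT;

        return blk_mq_alloc_tag_set(set);
}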
commit 737eb78e82d52d35df166d29af32bf61992de71d
Author: Damien Le Moal <damien.lemoal@xxxxxxx>
Date: Thu Sep 5 18:51:33 2019 +0900
block: Delay default elevator initialization
...
Additionally, to make sure that the elevator initialization is never
done while requests are in-flight (there should be none when the device
driver calls device_add_disk()), freeze and quiesce the device request
queue before calling blk_mq_init_sched() in elevator_init_mq().
...
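
Paraphrasing the flow that commit message describes (a sketch, not a verbatim
copy of block/elevator.c), the relevant part of elevator_init_mq() looks
roughly like this:

void elevator_init_mq(struct request_queue *q)
{
        struct elevator_type *e;
        int err;

        if (!elv_support_iosched(q))
                return;
        if (unlikely(q->elevator))
                return;

        /* 'mq-deadline' for single hw queue devices, NULL means 'none' */
        e = elevator_get_default(q);
        if (!e)
                return;

        /*
         * Make sure no request is in flight while the scheduler data is
         * set up: freezing waits for q_usage_counter to drop to zero,
         * quiescing waits for running ->queue_rq() calls to finish.
         */
        blk_mq_freeze_queue(q);
        blk_mq_quiesce_queue(q);

        err = blk_mq_init_sched(q, e);

        blk_mq_unquiesce_queue(q);
        blk_mq_unfreeze_queue(q);

        if (err)
                elevator_put(e);
}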
This commit added blk_mq_freeze_queue() in elevator_init_mq(), which tries to
make sure there are no in-flight requests while we go through
blk_mq_init_sched(). But are there any drivers that can leave IO alive while
we go through elevator_init_mq()? If not, maybe we can just remove this logic
to fix the regression...
SCSI should have passthrough requests at that moment.
Thanks,
Ming