On Fri, Aug 30, 2024 at 02:55:56PM +0300, Tero Kristo wrote: > On Thu, 2024-08-29 at 05:37 -0600, Jens Axboe wrote: > > On 8/29/24 1:18 AM, Tero Kristo wrote: > > > diff --git a/block/bio.c b/block/bio.c > > > index e9e809a63c59..6c46d75345d7 100644 > > > --- a/block/bio.c > > > +++ b/block/bio.c > > > @@ -282,6 +282,8 @@ void bio_init(struct bio *bio, struct > > > block_device *bdev, struct bio_vec *table, > > > bio->bi_max_vecs = max_vecs; > > > bio->bi_io_vec = table; > > > bio->bi_pool = NULL; > > > + > > > + bdev_update_cpu_latency_pm_qos(bio->bi_bdev); > > > } > > > EXPORT_SYMBOL(bio_init); > > > > This is entirely the wrong place to do this, presumably it should be > > done at IO dispatch time, not when something initializes a bio. > > > > And also feels like entirely the wrong way to go about this, adding > > overhead to potentially each IO dispatch, of which there can be > > millions > > per second. > > Any thoughts where it could/should be added? > > I moved the bdev_* callback from bio_init to the below location and it > seems to work also: > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index 3b4df8e5ac9e..d97a3a4252de 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -2706,6 +2706,7 @@ static void __blk_mq_flush_plug_list(struct > request_queue *q, > { > if (blk_queue_quiesced(q)) > return; > + bdev_update_cpu_latency_pm_qos(q->disk->part0); > q->mq_ops->queue_rqs(&plug->mq_list); IO submission CPU may not be same with the completion CPU, so this approach looks wrong. What you are trying to do is to avoid IO completion CPU to enter deep idle in case of inflight block IOs. Only fast device cares this CPU latency, maybe you just need to call some generic helper in driver(NVMe), and you may have to figure out the exact IO completion CPU for hardware queue with inflight IOs. Thanks, Ming