On Fri, 2024-08-30 at 22:26 +0800, Ming Lei wrote:
> On Fri, Aug 30, 2024 at 02:55:56PM +0300, Tero Kristo wrote:
> > On Thu, 2024-08-29 at 05:37 -0600, Jens Axboe wrote:
> > > On 8/29/24 1:18 AM, Tero Kristo wrote:
> > > > diff --git a/block/bio.c b/block/bio.c
> > > > index e9e809a63c59..6c46d75345d7 100644
> > > > --- a/block/bio.c
> > > > +++ b/block/bio.c
> > > > @@ -282,6 +282,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
> > > >  	bio->bi_max_vecs = max_vecs;
> > > >  	bio->bi_io_vec = table;
> > > >  	bio->bi_pool = NULL;
> > > > +
> > > > +	bdev_update_cpu_latency_pm_qos(bio->bi_bdev);
> > > >  }
> > > >  EXPORT_SYMBOL(bio_init);
> > >
> > > This is entirely the wrong place to do this, presumably it should be
> > > done at IO dispatch time, not when something initializes a bio.
> > >
> > > And also feels like entirely the wrong way to go about this, adding
> > > overhead to potentially each IO dispatch, of which there can be
> > > millions per second.
> >
> > Any thoughts where it could/should be added?
> >
> > I moved the bdev_* callback from bio_init to the below location and
> > it seems to work also:
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3b4df8e5ac9e..d97a3a4252de 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2706,6 +2706,7 @@ static void __blk_mq_flush_plug_list(struct request_queue *q,
> >  {
> >  	if (blk_queue_quiesced(q))
> >  		return;
> > +	bdev_update_cpu_latency_pm_qos(q->disk->part0);
> >  	q->mq_ops->queue_rqs(&plug->mq_list);
>
> The IO submission CPU may not be the same as the completion CPU, so
> this approach looks wrong.
>
> What you are trying to do is to prevent the IO completion CPU from
> entering deep idle while block IOs are inflight.
>
> Only fast devices care about this CPU latency, so maybe you just need
> to call some generic helper in the driver (NVMe), and you may have to
> figure out the exact IO completion CPU for each hardware queue with
> inflight IOs.
>
> Thanks,
> Ming

Thanks for the feedback, I've updated my patch to work in the NVMe driver
instead, taking the queue CPU affinity into account. I will send a
separate RFC of that out soonish to the corresponding list.

-Tero
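For reference, the direction Ming suggests could be sketched roughly as
below. This is only an illustrative fragment, not the posted patch:
dev_pm_qos_add_request()/dev_pm_qos_remove_request() and get_cpu_device()
are existing kernel APIs, but the helper names, the per-CPU request
storage, and the 20 us latency limit are assumptions for illustration.

```
/* Hypothetical sketch: cap idle-state resume latency on the CPUs that
 * service completions for a hardware queue while it has inflight IO,
 * and lift the cap again once the queue drains. The completion cpumask
 * would come from the queue's IRQ affinity.
 */
#include <linux/cpu.h>
#include <linux/pm_qos.h>

static DEFINE_PER_CPU(struct dev_pm_qos_request, nvme_cpu_lat_req);

/* Called when a hardware queue gains its first inflight request. */
static void nvme_cpu_latency_qos_add(const struct cpumask *comp_cpus)
{
	int cpu;

	for_each_cpu(cpu, comp_cpus)
		dev_pm_qos_add_request(get_cpu_device(cpu),
				       &per_cpu(nvme_cpu_lat_req, cpu),
				       DEV_PM_QOS_RESUME_LATENCY, 20);
}

/* Called when the last inflight request on the queue completes. */
static void nvme_cpu_latency_qos_remove(const struct cpumask *comp_cpus)
{
	int cpu;

	for_each_cpu(cpu, comp_cpus)
		dev_pm_qos_remove_request(&per_cpu(nvme_cpu_lat_req, cpu));
}
```

The per-CPU DEV_PM_QOS_RESUME_LATENCY constraint is what the cpuidle
governors consult when picking an idle state, so it only affects the
completion CPUs rather than imposing a system-wide cpu_latency_qos limit.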