On Fri, 2024-08-30 at 22:26 +0800, Ming Lei wrote:
> On Fri, Aug 30, 2024 at 02:55:56PM +0300, Tero Kristo wrote:
> > On Thu, 2024-08-29 at 05:37 -0600, Jens Axboe wrote:
> > > On 8/29/24 1:18 AM, Tero Kristo wrote:
> > > > diff --git a/block/bio.c b/block/bio.c
> > > > index e9e809a63c59..6c46d75345d7 100644
> > > > --- a/block/bio.c
> > > > +++ b/block/bio.c
> > > > @@ -282,6 +282,8 @@ void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table,
> > > >  	bio->bi_max_vecs = max_vecs;
> > > >  	bio->bi_io_vec = table;
> > > >  	bio->bi_pool = NULL;
> > > > +
> > > > +	bdev_update_cpu_latency_pm_qos(bio->bi_bdev);
> > > >  }
> > > >  EXPORT_SYMBOL(bio_init);
> > >
> > > This is entirely the wrong place to do this, presumably it should be
> > > done at IO dispatch time, not when something initializes a bio.
> > >
> > > And also feels like entirely the wrong way to go about this, adding
> > > overhead to potentially each IO dispatch, of which there can be
> > > millions per second.
> >
> > Any thoughts where it could/should be added?
> >
> > I moved the bdev_* callback from bio_init to the below location and
> > it seems to work also:
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3b4df8e5ac9e..d97a3a4252de 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2706,6 +2706,7 @@ static void __blk_mq_flush_plug_list(struct request_queue *q,
> >  {
> >  	if (blk_queue_quiesced(q))
> >  		return;
> > +	bdev_update_cpu_latency_pm_qos(q->disk->part0);
> >  	q->mq_ops->queue_rqs(&plug->mq_list);
>
> The IO submission CPU may not be the same as the completion CPU, so
> this approach looks wrong.
>
> What you are trying to do is to prevent the IO completion CPU from
> entering deep idle while block IOs are inflight.
>
> Only fast devices care about this CPU latency, so maybe you just need
> to call some generic helper in the driver (NVMe), and you may have to
> figure out the exact IO completion CPU for each hardware queue with
> inflight IOs.
>
> Thanks,
> Ming

Thanks for the feedback, I've updated my patch to work in the NVMe driver
instead, taking the queue CPU affinity into account. I will send a
separate RFC of that out soonish to the corresponding list.

-Tero
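For reference, the direction Ming suggests could be sketched roughly as
below. This is only an illustrative fragment, not the posted patch:
dev_pm_qos_add_request()/dev_pm_qos_remove_request() and get_cpu_device()
are existing kernel APIs, but the helper names, the per-CPU request
storage, and the 20 us latency limit are assumptions for illustration.

```
/* Hypothetical sketch: cap idle-state resume latency on the CPUs that
 * service completions for a hardware queue while it has inflight IO,
 * and lift the cap again once the queue drains. The completion cpumask
 * would come from the queue's IRQ affinity.
 */
#include <linux/cpu.h>
#include <linux/pm_qos.h>

static DEFINE_PER_CPU(struct dev_pm_qos_request, nvme_cpu_lat_req);

/* Called when a hardware queue gains its first inflight request. */
static void nvme_cpu_latency_qos_add(const struct cpumask *comp_cpus)
{
	int cpu;

	for_each_cpu(cpu, comp_cpus)
		dev_pm_qos_add_request(get_cpu_device(cpu),
				       &per_cpu(nvme_cpu_lat_req, cpu),
				       DEV_PM_QOS_RESUME_LATENCY, 20);
}

/* Called when the last inflight request on the queue completes. */
static void nvme_cpu_latency_qos_remove(const struct cpumask *comp_cpus)
{
	int cpu;

	for_each_cpu(cpu, comp_cpus)
		dev_pm_qos_remove_request(&per_cpu(nvme_cpu_lat_req, cpu));
}
```

The per-CPU DEV_PM_QOS_RESUME_LATENCY constraint is what the cpuidle
governors consult when picking an idle state, so it only affects the
completion CPUs rather than imposing a system-wide cpu_latency_qos limit.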