On 08/08/24 11:35, MANISH PANDEY wrote:
>
>
> On 8/5/2024 11:22 PM, Bart Van Assche wrote:
> > On 8/5/24 10:35 AM, MANISH PANDEY wrote:
> > > In our SoCs we manage power and perf balancing by dynamically
> > > changing the IRQ affinity based on the load. Say if we have more
> > > load, we assign the UFS IRQs to large-cluster CPUs, and if we have
> > > less load, we affine the IRQs to small-cluster CPUs.
> >
> > I don't think that this is compatible with the command completion code
> > in the block layer core. The blk-mq code is based on the assumption
> > that the association of a completion interrupt with a CPU core does
> > not change. See also the blk_mq_map_queues() function and its callers.
> >
> The IRQ <-> CPU binding is set up before the start of the operation, and
> that makes sure the completion interrupt CPU doesn't change.
>
> > Is this mechanism even useful? If completion interrupts are always
> > sent to the CPU core that submitted the I/O, no interrupts will be
> > sent to the large cluster if no code that submits I/O is running on
> > that cluster. Sending e.g. all completion interrupts to the large
> > cluster can be achieved by migrating all processes and threads to the
> > large cluster.
> >
> >> Sending e.g. all completion interrupts to the large cluster can
> >> be achieved by migrating all processes and threads to the large
> >> cluster.
>
> Agreed, this can be achieved, but then all the processes and threads have
> to be migrated to the large cluster, and that has power impacts. Hence,
> to balance power and perf, this is not the preferred way for vendors.

I don't get why rq_affinity=1 is compatible with this case. Isn't this
custom setup a fully managed system by you, which means you want
rq_affinity=0?

What do you lose if you move to rq_affinity=0?

>
> > > This issue mostly affects UFS MCQ devices, which use ESI/MSI IRQs
> > > and have the ESI IRQs distributed across the CQs.
> > > Mostly we bind the IRQs and CQs to large-cluster CPUs and hence
> > > complete more requests on the large cluster, which won't be a CPU of
> > > the same capacity as the submitter, since requests may come from the
> > > small/mid clusters.
> >
> > Please use an approach that is supported by the block layer. I don't
> > think that dynamically changing the IRQ affinity is compatible with
> > the block layer.
>
> For UFS with MCQ, the ESI IRQs are bound at initialization time.
> So basically I would like to use the high-performance cluster CPUs to
> migrate a few completions from the mid clusters and take advantage of
> the high-capacity CPUs. The new change takes this opportunity away from
> the driver.

It doesn't. From what I read, you want to fully customize where your
completions run without any interference from the block layer. Disable
rq_affinity and do what you want? Your description says you don't want the
block layer to interfere with your affinity setup.

> So basically we should be able to use the high-performance CPUs like
> below:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index e3c3c0c21b55..a4a2500c4ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1164,7 +1164,7 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
>  	if (cpu == rq->mq_ctx->cpu ||
>  	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
>  	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
> -	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
> +	     arch_scale_cpu_capacity(cpu) >= arch_scale_cpu_capacity(rq->mq_ctx->cpu)))
>  		return false;
>
> This way the driver can use the best possible CPUs for its use case.
>
> >
> > Thanks,
> >
> > Bart.
> >
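
To make the effect of the proposed hunk concrete, here is a small userspace
sketch (not kernel code, purely an illustration): it compares two CPUs the
way blk_mq_complete_need_ipi() would, using the
/sys/devices/system/cpu/cpuN/cpu_capacity attribute (exposed on arm64 and
other users of the generic arch topology code) as a stand-in for
arch_scale_cpu_capacity(). The "==" test mirrors what cpus_equal_capacity()
effectively checks today; the ">=" test mirrors the relaxation in the diff
above. The file name capcmp.c and the program itself are hypothetical and
only for demonstration.

/*
 * capcmp.c - userspace illustration only, not kernel code.
 *
 * Reads the relative capacity of two CPUs from
 * /sys/devices/system/cpu/cpuN/cpu_capacity and applies both predicates:
 * the "==" check that cpus_equal_capacity() effectively performs today,
 * and the ">=" check proposed in the diff above.
 *
 * Build: cc -o capcmp capcmp.c
 * Usage: ./capcmp <completion-cpu> <submission-cpu>
 */
#include <stdio.h>
#include <stdlib.h>

static long cpu_capacity(int cpu)
{
	char path[64];
	FILE *f;
	long cap = -1;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;		/* attribute not exposed on this system */
	if (fscanf(f, "%ld", &cap) != 1)
		cap = -1;
	fclose(f);
	return cap;
}

int main(int argc, char **argv)
{
	long completion_cap, submission_cap;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <completion-cpu> <submission-cpu>\n",
			argv[0]);
		return 1;
	}

	completion_cap = cpu_capacity(atoi(argv[1]));
	submission_cap = cpu_capacity(atoi(argv[2]));
	if (completion_cap < 0 || submission_cap < 0) {
		fprintf(stderr, "cpu_capacity not available\n");
		return 1;
	}

	/* Current behaviour: complete locally only on an equal-capacity CPU. */
	printf("equal capacity (current): %s\n",
	       completion_cap == submission_cap ?
	       "complete locally" : "IPI back to submitter");

	/* Proposed behaviour: also allow a higher-capacity completion CPU. */
	printf("capacity >= (proposed):   %s\n",
	       completion_cap >= submission_cap ?
	       "complete locally" : "IPI back to submitter");

	return 0;
}

Note that this only models the capacity part of the check; in the kernel
the same-cache and QUEUE_FLAG_SAME_FORCE conditions still apply first.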