Re: Regarding patch "block/blk-mq: Don't complete locally if capacities are different"

On 8/5/2024 11:22 PM, Bart Van Assche wrote:
> On 8/5/24 10:35 AM, MANISH PANDEY wrote:
>> In our SoCs we manage power/perf balancing by dynamically changing
>> the IRQ affinity based on load. If the load is high, we affine the
>> UFS IRQs to large-cluster CPUs; if the load is low, we affine the
>> IRQs to small-cluster CPUs.
>
> I don't think that this is compatible with the command completion code
> in the block layer core. The blk-mq code is based on the assumption that
> the association of a completion interrupt with a CPU core does not
> change. See also the blk_mq_map_queues() function and its callers.

The IRQ <-> CPU association is established before the start of operation, and we make sure that the CPU receiving the completion interrupt doesn't change while requests are in flight.
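
For reference, the static mapping being referred to is built roughly along these lines (a simplified sketch, not the exact code; recent kernels implement blk_mq_map_queues() on top of group_cpus_evenly() in block/blk-mq-cpumap.c):

/*
 * Simplified sketch: every possible CPU gets assigned to a hardware
 * queue once, at initialization time, and the mapping never changes
 * afterwards. The real code groups CPUs more carefully, but the
 * CPU -> hctx association is equally static.
 */
static void sketch_map_queues(unsigned int *mq_map, unsigned int nr_queues)
{
	unsigned int cpu;

	for_each_possible_cpu(cpu)
		mq_map[cpu] = cpu % nr_queues;
}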

> Is this mechanism even useful? If completion interrupts are always sent
> to the CPU core that submitted the I/O, no interrupts will be sent to
> the large cluster if no code that submits I/O is running on that
> cluster. Sending e.g. all completion interrupts to the large cluster can
> be achieved by migrating all processes and threads to the large cluster.

> migrating all completion interrupts to the large cluster can
> be achieved by migrating all processes and threads to the large
> cluster.

Agree, this can be achieved, but then all the processes and threads would have to be migrated to the large cluster, and that has a power impact. To balance power and perf, this is not the preferred approach for vendors.

This issue mostly affects UFS MCQ devices, which use ESI/MSI IRQs and distribute the ESI IRQs across the completion queues (CQs). We mostly use large-cluster CPUs for binding the IRQs and CQs, and hence complete most requests on the large cluster, which won't be a same-capacity CPU when the request was submitted from the small/mid clusters.
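
To illustrate the binding (a hypothetical sketch, not code from any in-tree driver; the names ufs_sketch_affine_esi and big_cluster_mask are made up for illustration), a driver could spread its per-CQ ESI vectors over the big cluster like this:

/*
 * Hypothetical sketch: pin each per-CQ ESI vector to a CPU in the
 * "big" cluster at initialization time. Illustrative only; real
 * drivers (e.g. ufs-qcom) have their own ESI setup paths.
 */
static void ufs_sketch_affine_esi(const int *esi_irqs, int nr_cqs,
				  const struct cpumask *big_cluster_mask)
{
	int i, cpu = cpumask_first(big_cluster_mask);

	for (i = 0; i < nr_cqs; i++) {
		irq_set_affinity_hint(esi_irqs[i], cpumask_of(cpu));
		cpu = cpumask_next(cpu, big_cluster_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(big_cluster_mask);
	}
}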

> Please use an approach that is supported by the block layer. I don't
> think that dynamically changing the IRQ affinity is compatible with the
> block layer.

For UFS with MCQ, the ESI IRQs are bound at initialization time.
So basically I would like to use the high-performance cluster CPUs to take over a few completions from the mid clusters and take advantage of the high-capacity CPUs. The new change takes this opportunity away from the driver.
So we should be able to use the high-performance CPUs like below:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index e3c3c0c21b55..a4a2500c4ef6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1164,7 +1164,7 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
        if (cpu == rq->mq_ctx->cpu ||
            (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
             cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
-            cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
+            arch_scale_cpu_capacity(cpu) >= arch_scale_cpu_capacity(rq->mq_ctx->cpu)))
                return false;

This way the driver can use the best possible CPUs for its use case.
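
For comparison, cpus_equal_capacity() is, as far as I remember, essentially the following in kernel/sched/core.c (paraphrased from memory, so details may differ); the strict equality there is exactly what the >= above relaxes:

/*
 * Paraphrased from memory, may differ from the in-tree version:
 * if asymmetric-capacity scheduling is not active, all CPUs are
 * treated as equal; otherwise the scaled capacities must match
 * exactly for the completion to stay local.
 */
bool cpus_equal_capacity(int cpu1, int cpu2)
{
	if (!sched_asym_cpucap_active())
		return true;

	if (cpu1 == cpu2)
		return true;

	return arch_scale_cpu_capacity(cpu1) == arch_scale_cpu_capacity(cpu2);
}

With the >= variant, a request submitted on a mid-cluster CPU could still be completed locally on a big-cluster CPU, while a completion landing on a lower-capacity CPU than the submitter would still take the IPI path back to the submitting CPU.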

> Thanks,
>
> Bart.




