Re: Regarding patch "block/blk-mq: Don't complete locally if capacities are different"

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 5 Aug 2024 10:52:50 -0700

On 8/5/24 10:35 AM, MANISH PANDEY wrote:
> In our SoCs we manage power and performance balancing by dynamically
> changing the IRQ affinity based on the load. If the load is high, we
> assign the UFS IRQs to the large cluster CPUs; if the load is low, we
> affine the IRQs to the small cluster CPUs.

I don't think that this is compatible with the command completion code
in the block layer core. The blk-mq code is based on the assumption that
the association of a completion interrupt with a CPU core does not
change. See also the blk_mq_map_queues() function and its callers.

Is this mechanism even useful? If completion interrupts are always sent 
to the CPU core that submitted the I/O, no interrupts will be sent to
the large cluster if no code that submits I/O is running on that
cluster. Sending e.g. all completion interrupts to the large cluster can
be achieved by migrating all processes and threads to the large cluster.

> This issue mostly affects UFS MCQ devices, which use ESI/MSI IRQs
> and have distributed ESI IRQs for the CQs.
> We mostly bind the IRQs and CQs to large cluster CPUs and hence
> handle most completions on the large cluster, on a CPU whose capacity
> differs from that of the submitting CPU, since requests may come from
> the small/medium clusters.

Please use an approach that is supported by the block layer. I don't
think that dynamically changing the IRQ affinity is compatible with the
block layer.

Thanks,

Bart.