+ linux-kernel@xxxxxxxxxxxxxxx

On 08/08/2024 08:05, MANISH PANDEY wrote:
>
> On 8/5/2024 11:22 PM, Bart Van Assche wrote:
>> On 8/5/24 10:35 AM, MANISH PANDEY wrote:

[...]

>> Please use an approach that is supported by the block layer. I don't
>> think that dynamically changing the IRQ affinity is compatible with
>> the block layer.
>
> For UFS with MCQ, ESI IRQs are bound at the time of initialization,
> so basically I would like to use the High Performance cluster CPUs to
> migrate a few completions from the Mid clusters and take advantage of
> the high-capacity CPUs. The new change takes away this opportunity
> from the driver. So basically we should be able to use the High
> Performance CPUs like below:
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index e3c3c0c21b55..a4a2500c4ef6 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1164,7 +1164,7 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
>  	if (cpu == rq->mq_ctx->cpu ||
>  	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
>  	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
> -	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
> +	     arch_scale_cpu_capacity(cpu) >= arch_scale_cpu_capacity(rq->mq_ctx->cpu)))
>  		return false;
>
> This way the driver can use the best possible CPUs for its use case.

So the issue for you with commit af550e4c9682 seems to be that those
completions don't happen on big CPUs (cpu_capacity = 1024) anymore,
since the condition (1) in blk_mq_complete_need_ipi():

    if (!QUEUE_FLAG_SAME_FORCE &&
        cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
        cpus_equal_capacity(cpu, rq->mq_ctx->cpu))              (1)

no longer holds when 'rq->mq_ctx->cpu != big CPU', so
blk_mq_complete_need_ipi() returns true and
blk_mq_complete_request_remote() sends an IPI to 'rq->mq_ctx->cpu'.

I tried to simulate this with a 6-CPU aarch64 QEMU tri-gear system
(3 different cpu_capacity values):

    # cat /sys/devices/system/cpu/online
    0-5
    # cat /sys/devices/system/cpu/cpu*/cpu_capacity
    446
    446
    871
    871
    1024
    1024
    # grep -i virtio /proc/interrupts | while read a b; do grep -aH . /proc/irq/${a%:}/smp_affinity; done
    /proc/irq/15/smp_affinity:3f    /* block device */
    /proc/irq/16/smp_affinity:3f    /* network device */

So you make the block device IRQ affine to the big CPUs (mask 0x30 =
CPUs 4-5):

    # echo 30 > /proc/irq/15/smp_affinity

And with the patch, you send IPIs in blk_mq_complete_request_remote()
in case 'rq->mq_ctx->cpu = [0-3]', whereas without the patch, or with
the change to:

    arch_scale_cpu_capacity(cpu) >= arch_scale_cpu_capacity(rq->mq_ctx->cpu)    (2)

you would complete the request locally (i.e. on CPU4/5):

    gic_handle_irq() -> ... -> handle_irq_event() -> ... ->
    vm_interrupt() -> ... -> virtblk_done() (callback) ->
    blk_mq_complete_request() -> blk_mq_complete_request_remote()
    (returns false) -> rq->q->mq_ops->complete(rq)

IMHO the patch was introduced to avoid completing locally when
'local = little CPU'. Since you use system knowledge and set the IRQ
affinity explicitly to the big CPUs in order to run completions
locally on them, maybe (2) is the way to allow both?
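To make the three variants easier to compare, below is a small
userspace C model (a sketch, not kernel code): the capacity table, the
two fixed IRQ CPUs, the assumption that all six CPUs share the LLC and
that QUEUE_FLAG_SAME_FORCE is clear, and the names need_ipi() / enum
policy are all mine, chosen to mirror the tri-gear experiment above.

/*
 * Userspace model (not kernel code) of the blk_mq_complete_need_ipi()
 * decision for the tri-gear topology above. Assumes all six CPUs share
 * the LLC and QUEUE_FLAG_SAME_FORCE is not set, so only the same-CPU
 * check and the capacity policy decide the outcome.
 */
#include <stdbool.h>
#include <stdio.h>

/* cpu_capacity values from the QEMU system above. */
static const unsigned long capacity[6] = { 446, 446, 871, 871, 1024, 1024 };

enum policy {
	SHARE_CACHE_ONLY,	/* before commit af550e4c9682 */
	EQUAL_CAPACITY,		/* cpus_equal_capacity(), the merged patch */
	GEQ_CAPACITY,		/* proposed relaxation (2) */
};

/* 'cpu' runs the completion IRQ, 'ctx_cpu' is rq->mq_ctx->cpu. */
static bool need_ipi(int cpu, int ctx_cpu, enum policy p)
{
	bool cap_ok = true;

	if (cpu == ctx_cpu)
		return false;

	if (p == EQUAL_CAPACITY)
		cap_ok = capacity[cpu] == capacity[ctx_cpu];
	else if (p == GEQ_CAPACITY)
		cap_ok = capacity[cpu] >= capacity[ctx_cpu];

	/* cpus_share_cache() is taken to be true for every pair here. */
	return !cap_ok;
}

int main(void)
{
	const int irq_cpus[] = { 4, 0 };	/* big CPU vs. little CPU */

	for (unsigned int i = 0; i < 2; i++) {
		int irq = irq_cpus[i];

		printf("IRQ on CPU%d:  ctx  pre-patch  ==capacity  >=capacity\n", irq);
		for (int ctx = 0; ctx < 6; ctx++)
			printf("             %3d  %-9s  %-10s  %s\n", ctx,
			       need_ipi(irq, ctx, SHARE_CACHE_ONLY) ? "IPI" : "local",
			       need_ipi(irq, ctx, EQUAL_CAPACITY) ? "IPI" : "local",
			       need_ipi(irq, ctx, GEQ_CAPACITY) ? "IPI" : "local");
	}
	return 0;
}

With the IRQ on CPU4, (2) keeps every completion local (the point of
pinning the IRQ to the big cluster), while cpus_equal_capacity() sends
an IPI for ctx 0-3. With the IRQ on CPU0, both capacity policies still
send an IPI for requests submitted on CPU2-5, which preserves what
af550e4c9682 was introduced for.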