Hi Qais Yousef,
We recently observed that the patch below has been merged:
https://lore.kernel.org/all/20240223155749.2958009-3-qyousef@xxxxxxxxxxx
This patch causes a ~20% performance degradation in random IO, along
with a significant drop in sequential IO performance. We would
therefore like to revert it, as it impacts MCQ UFS devices heavily;
non-MCQ devices are also affected.
We have several concerns with the patch:
1. It takes away the device driver's freedom to affine completions to
the best possible CPUs and limits the driver to the same capacity group
of CPUs (see the sketch after this list).
2. Why can't the device driver use IRQ affinity to choose the desired
CPUs for completing the IO request, instead of the block layer forcing
it?
3. CPUs are already grouped based on LLC; is a new capacity-based
categorization really required?
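
For reference, this is our reading of the merged change, condensed and
paraphrased from blk_mq_complete_need_ipi() in block/blk-mq.c (a
sketch, not a verbatim quote; early exits omitted). The
cpus_equal_capacity() test is the new restriction:

/*
 * Sketch: a completion now stays local only when the completing CPU
 * shares the submitter's cache domain *and* has equal capacity.
 */
static bool need_ipi_sketch(struct request *rq)
{
	int cpu = raw_smp_processor_id();

	if (cpu == rq->mq_ctx->cpu ||
	    (!test_bit(QUEUE_FLAG_SAME_FORCE, &rq->q->queue_flags) &&
	     cpus_share_cache(cpu, rq->mq_ctx->cpu) &&
	     cpus_equal_capacity(cpu, rq->mq_ctx->cpu)))
		return false;

	/* Otherwise the completion is punted back to the submitter. */
	return true;
}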
Regarding this claim from the patch's commit message:

> big performance impact if the IO request
> was done from a CPU with higher capacity but the interrupt is serviced
> on a lower capacity CPU.
This patch doesn't consider contention between the submission path and
the completion path. Also, what if we want to complete a request that
was submitted on a smaller-capacity CPU on a higher-capacity CPU?
Shouldn't the device driver take care of this, leaving vendors free to
use the best possible combination for their platform?
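
As a minimal sketch of what we mean by keeping this in the driver
(ufs_example_* is a hypothetical name; irq_set_affinity_and_hint() is
the existing genirq helper):

#include <linux/interrupt.h>
#include <linux/cpumask.h>

/*
 * Hypothetical driver-side steering: the vendor picks the CPUs that
 * should service a completion queue interrupt, instead of the block
 * layer second-guessing that choice by CPU capacity.
 */
static int ufs_example_set_cq_irq_affinity(unsigned int irq,
					   const struct cpumask *best_cpus)
{
	return irq_set_affinity_and_hint(irq, best_cpus);
}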
Does it consider MCQ devices and the different possible SQ<->CQ
mappings?
Regarding this observation from the commit message:

> Without the patch I see the BLOCK softirq always running on little
> cores (where the hardirq is serviced). With it I can see it running
> on all cores.
Why can't we use "echo 2 > rq_affinity" to force completions onto the
same group of CPUs from which the request was initiated?
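
Our understanding of that existing knob, paraphrased from
queue_rq_affinity_store() in block/blk-sysfs.c (a condensed sketch; the
example_* wrapper is ours):

#include <linux/blkdev.h>

/*
 * Sketch of rq_affinity semantics as we read them:
 * 0 - complete wherever the hardirq lands,
 * 1 - complete within the submitter's cache domain (default),
 * 2 - force completion on the exact submitting CPU.
 */
static void example_apply_rq_affinity(struct request_queue *q,
				      unsigned long val)
{
	if (val == 2) {
		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, q);
	} else if (val == 1) {
		blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
	} else {
		blk_queue_flag_clear(QUEUE_FLAG_SAME_COMP, q);
		blk_queue_flag_clear(QUEUE_FLAG_SAME_FORCE, q);
	}
}

This already lets userspace (or a vendor image) pick the completion
grouping per queue, without a new capacity-based rule in the hot path.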
Also, why force vendors to always use SOFTIRQ for completion? We should
keep the flexibility to complete the IO request via IPI, HARDIRQ, or
SOFTIRQ, as in the sketch below.
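
For instance, today a driver can complete in its hardirq handler and
let the block layer fall back to an IPI or softirq only when needed
(ufs_example_* names are hypothetical; blk_mq_complete_request_remote()
is the existing exported helper):

/*
 * Hypothetical completion handler: ask the block layer whether a
 * remote completion (IPI or softirq) is required; if not, finish the
 * request right here in hardirq context.
 */
static irqreturn_t ufs_example_cq_irq(int irq, void *data)
{
	struct request *rq = ufs_example_fetch_completed_rq(data);

	if (!blk_mq_complete_request_remote(rq))
		ufs_example_finish_rq(rq);	/* hardirq completion */

	return IRQ_HANDLED;
}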
An SoC can have many different CPU configurations, and this patch
forces a restriction on the completion path. The problem is even worse
on MCQ devices, where we can have different SQ<->CQ mappings.
So we would like to revert the patch. Please let us know if you have
any concerns.
Regards
Manish Pandey