On Sat, 19 May 2018 07:35:55 +0800
Shawn Lin <shawn.lin at rock-chips.com> wrote:

> Hi Marc,
>
> On 2018/5/18 18:05, Marc Zyngier wrote:
> > Hi Shawn,
> >
> > On 18/05/18 10:47, Shawn Lin wrote:
> >> gic-v3 seems only support distribute hwirq to one CPU in despite of
> >> setting it via /proc/irq/*/smp_affinity.
> >>
> >> My RK3399 platform has 6 CPUs and I was trying to bind the emmc
> >> irq, whose hwirq is 43 and virq is 30, to all cores
> >>
> >> echo 3f > /proc/irq/30/smp_affinity
> >>
> >> but the I/O test still shows the irq was fired to CPU0. For really
> >> user case, we may try to distribute different hwirqs to different cores,
> >> with the hope of distributing to a less irq-binded core as possible.
> >> Otherwise, as current implementation, gic-v3 always distribute it
> >> to the first masked cpu, which is what cpumask_any_and actually did in
> >> practice now on my platform.
> >
> > That's because GICv3 cannot broadcast the interrupt to all CPUs, and has
> > to pick one.
> >
>
> yep, that's what I got from the GIC-V3 TRM. Btw, IIRC, gic-400(belonging
> to GIC-V2) on RK3288 platform could support broadcast interrupts to all
> CPUs, so I was a bit surprised to know GIC-V3 cannot do it, as v3 sounds
> should be more powerful than v2 instinctively. :)))

The GICv2 1:N feature is really nasty, actually. It places an overhead
on all CPUs (they will all take an interrupt and only one will actually
service it, while the others may only see a spurious interrupt). So in
practice, you don't really gain anything, unless your CPUs are
completely idle. On a busy system, you see an actual reduction of your
overall throughput, for a very small benefit in latency.

That's why I refuse to support this feature in Linux. This may be
useful on latency sensitive systems where the software is too primitive
to do an effective balancing, but Linux is a bit better than that.

Thankfully, GICv3 got rid of this misfeature, and I'm making sure it
won't come back.

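As an illustration of placing interrupts in software rather than in the
GIC, here is a minimal Python sketch of a userspace policy: read the
per-CPU counters in /proc/interrupts, pick the least-loaded CPU, and
build the hex mask that /proc/irq/*/smp_affinity takes. The function
names and the sample counters are made up for this example, the device
names in the sample are placeholders, and a real balancer does
considerably more than this.

```python
# Made-up /proc/interrupts excerpt for a 6-CPU system (counters and
# device names are illustrative only; virq 30 / hwirq 43 as in the thread).
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
 30:       9000          0          0          0          0          0     GICv3  43 Level     mmc1
 31:        100        200         50        400        300        250     GICv3  44 Level     eth0
"""


def parse_interrupts(text):
    """Parse /proc/interrupts-style text into {irq_label: [count per CPU]}."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()[:ncpus]
        # Skip rows without a full set of per-CPU counters (e.g. "Err:").
        if len(fields) == ncpus and all(f.isdigit() for f in fields):
            counts[label.strip()] = [int(f) for f in fields]
    return counts


def least_loaded_cpu(counts, allowed):
    """Return the allowed CPU with the fewest interrupts summed over all IRQs."""
    ncpus = len(next(iter(counts.values())))
    totals = [sum(row[cpu] for row in counts.values()) for cpu in range(ncpus)]
    # Ties are broken in favour of the lowest CPU number.
    return min(allowed, key=lambda cpu: (totals[cpu], cpu))


def affinity_mask(cpus):
    """Format CPU numbers as a hex bitmask (assumes fewer than 32 CPUs)."""
    return format(sum(1 << cpu for cpu in set(cpus)), "x")


if __name__ == "__main__":
    counts = parse_interrupts(SAMPLE)
    target = least_loaded_cpu(counts, range(6))
    print(f"CPU{target}: mask {affinity_mask([target])}")
```

With the sample counters the policy lands on CPU2, so the mask is 4;
applying it to virq 30 would be "echo 4 > /proc/irq/30/smp_affinity" as
root. The mask for all six CPUs comes out as 3f, which matches the value
used earlier in the thread.
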
<rant>
Overall, interrupt affinity is too critical to be left to the hardware,
which has no knowledge of how busy the CPUs are. It is a bit like
implementing the scheduler in HW. It works very well if your SW is
minimal enough. Grow a bigger system, and HW scheduling becomes a total
pain. That's why you only see such a feature on small microcontrollers,
and not on larger CPUs.
</rant>

> >>
> >> So I was thinking to record how much hwirqs are distributed to each
> >> core and try to pick up the least used one.
> >>
> >> This patch is rather rough with slightly test on my board. Just for
> >> asking advice from wisdom of your. :)
> >
> > My advice is not to do this in the kernel. Why don't you run something
> > like irqbalance? It will watch the interrupt usage and move
> > interrupts around.
>
> I will take a look at how irqbalance work in practice. Thanks
> for your advice.

Let me know how it goes.

Thanks,

	M.
-- 
Without deviation from the norm, progress is not possible.