We've unfortunately started seeing a situation where percpu interrupts
are partitioned in the system: one arbitrary set of CPUs has an
interrupt connected to a type of device, while another disjoint set of
CPUs has the same interrupt connected to another type of device.

This makes it impossible for a device driver to request this interrupt
using the current percpu-interrupt abstraction, as the same interrupt
number is now potentially claimed by at least two drivers, and we
forbid interrupt sharing on per-cpu interrupts.

A potential solution to this has been proposed by Will Deacon,
expanding the handling in the core code:

http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/388800.html

followed by a counter-proposal from Thomas Gleixner, which Will tried
to implement, but ran into issues where the probing code was running
in preemptible context, making the percpu-ness of interrupts difficult
to guarantee.

Another approach to this is to turn things upside down. Let's assume
that our system describes all the possible partitions for a given
interrupt, and gives each of them a unique identifier. It is then
possible to create a namespace where the affinity identifier itself is
a form of interrupt number. At this point, it becomes easy to
implement a set of partitions as a cascaded irqchip, each affinity
identifier being the secondary HW irq, as outlined in the following
example:

Aff-0: { cpu0 cpu3 }
Aff-1: { cpu1 cpu2 }
Aff-2: { cpu4 cpu5 cpu6 cpu7 }

Let's assume that HW interrupt 1 is partitioned over these 3
affinities. When HW interrupt 1 fires on a given CPU, all it takes is
to find out which affinity this CPU belongs to, which gives us a new
HW interrupt number. Bingo. Of course, this only works as long as you
don't have overlapping affinities (but if you do, your system is broken
anyway).
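To make the lookup concrete, here is a minimal userspace sketch of the
CPU-to-affinity mapping for the three example affinities. It is not the
code from this series: struct partition, the bitmask layout and both
function names are made up for illustration, and the real library works
on cpumasks and irqdomains rather than plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8

/* Hypothetical description of one partition: a bitmask of its CPUs. */
struct partition {
	uint32_t cpu_mask;
};

/* The three affinities from the example; the index is the affinity id. */
const struct partition partitions[] = {
	{ .cpu_mask = (1u << 0) | (1u << 3) },	/* Aff-0: cpu0 cpu3 */
	{ .cpu_mask = (1u << 1) | (1u << 2) },	/* Aff-1: cpu1 cpu2 */
	{ .cpu_mask = 0xf0u },			/* Aff-2: cpu4..cpu7 */
};

#define NR_PARTITIONS (sizeof(partitions) / sizeof(partitions[0]))

/* Return true if no CPU appears in more than one partition. */
bool partitions_are_disjoint(const struct partition *p, size_t n)
{
	uint32_t seen = 0;

	for (size_t i = 0; i < n; i++) {
		if (seen & p[i].cpu_mask)
			return false;
		seen |= p[i].cpu_mask;
	}
	return true;
}

/*
 * Map the CPU that took the primary interrupt to the secondary HW irq,
 * which is simply the affinity id. Returns -1 if the CPU is out of
 * range or not covered by any partition.
 */
int cpu_to_secondary_hwirq(const struct partition *p, size_t n,
			   unsigned int cpu)
{
	if (cpu >= NR_CPUS)
		return -1;

	for (size_t i = 0; i < n; i++) {
		if (p[i].cpu_mask & (1u << cpu))
			return (int)i;
	}
	return -1;
}

/* Overlapping affinities mean the description itself is broken. */
void validate_example_partitions(void)
{
	assert(partitions_are_disjoint(partitions, NR_PARTITIONS));
}
```

With this table, the primary interrupt firing on cpu3 resolves to
secondary HW irq 0, and on cpu6 to secondary HW irq 2. The disjointness
check is what makes the lookup unambiguous.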

This allows us to keep a number of nice properties:

- Each partition results in a separate percpu-interrupt (with a
  restricted affinity), which keeps drivers happy. This alone
  guarantees that we do not have to change the programming model for
  per-cpu interrupts.

- Because the underlying interrupt is still per-cpu, the overhead of
  the indirection can be kept pretty minimal.

- The core code can ignore most of that crap.

For that purpose, we implement a small library that deals with some of
the boilerplate code, relying on platform-specific drivers to provide
a description of the affinity sets and a set of callbacks. This also
relies on a small change in the irqdomain layer, and now offers a way
for the affinity of a percpu interrupt to be retrieved by a driver.

As an example, the GICv3 driver has been adapted to use this new
feature. Patches on top of v4.6-rc3, tested on an arm64 FVP model.

Marc Zyngier (5):
  irqdomain: Allow domain matching on irq_fwspec
  genirq: Allow the affinity of a percpu interrupt to be set/retrieved
  irqchip: Add per-cpu interrupt partitioning library
  irqchip/gic-v3: Add support for partitioned PPIs
  DT: arm,gic-v3: Documment PPI partition support

 .../bindings/interrupt-controller/arm,gic-v3.txt   |  34 ++-
 drivers/irqchip/Kconfig                            |   4 +
 drivers/irqchip/Makefile                           |   1 +
 drivers/irqchip/irq-gic-v3.c                       | 176 +++++++++++++-
 drivers/irqchip/irq-partition-percpu.c             | 256 +++++++++++++++++++++
 include/linux/irq.h                                |   4 +
 include/linux/irqchip/irq-partition-percpu.h       |  59 +++++
 include/linux/irqdesc.h                            |   1 +
 include/linux/irqdomain.h                          |  15 +-
 kernel/irq/irqdesc.c                               |  26 ++-
 kernel/irq/irqdomain.c                             |  19 +-
 11 files changed, 580 insertions(+), 15 deletions(-)
 create mode 100644 drivers/irqchip/irq-partition-percpu.c
 create mode 100644 include/linux/irqchip/irq-partition-percpu.h

-- 
2.1.4