On Fri, Oct 20, 2023 at 2:17 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> Thanks for the quick reply!
>
> Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:
>
> > On Thu, Oct 19, 2023 at 7:13 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
> >>
> >> Hi Anup,
> >>
> >> Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:
> >>
> >> > The RISC-V AIA specification is ratified as per the RISC-V international
> >> > process. The latest ratified AIA specification can be found at:
> >> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >> >
> >> > At a high level, the AIA specification adds three things:
> >> > 1) AIA CSRs
> >> >    - Improved local interrupt support
> >> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >> >    - Per-HART MSI controller
> >> >    - Supports MSI virtualization
> >> >    - Supports IPI along with virtualization
> >> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >> >    - Wired interrupt controller
> >> >    - In MSI-mode, converts wired interrupts into MSIs (i.e. MSI generator)
> >> >    - In Direct-mode, injects external interrupts directly into HARTs
> >>
> >> Thanks for working on the AIA support! I had a look at the series, and
> >> have some concerns about the interrupt ID abstraction.
> >>
> >> A bit of background, for readers not familiar with the AIA details.
> >>
> >> IMSIC allows for 2047 unique MSI ("msi-irq") sources per hart, and
> >> each MSI is dedicated to a certain hart. The series takes the approach
> >> of saying that there are, e.g., 2047 interrupts ("lnx-irq") globally.
> >> Each lnx-irq consists of #harts * msi-irq -- a slice -- and in the
> >> slice only *one* msi-irq is actually used.
> >>
> >> This scheme makes affinity changes more robust, because the interrupt
> >> sources on "other" harts are pre-allocated. On the other hand, it
> >> requires propagating irq masking to other harts via IPIs (this is
> >> mostly done at setup/tear down). It's also wasteful, because msi-irqs
> >> are hogged, and cannot be used for anything else.
> >>
> >> Contemporary storage/networking drivers usually use queues per core
> >> (or a sub-set of cores). The current scheme wastes a lot of msi-irqs.
> >> If we instead used a scheme where "msi-irq == lnx-irq", instead of
> >> "lnx-irq = {hart 0;msi-irq x, ..., hart N;msi-irq x}", there would be
> >> a lot of MSIs left for other users. 1-1 vs 1-N. E.g., if a storage
> >> device would like to use 5 queues (5 cores) on a 128-core system, the
> >> current scheme would consume 5 * 128 MSIs, instead of just 5.
> >>
> >> On the plus side:
> >> * Changing interrupt affinity will never fail, because the interrupts
> >>   on each hart are pre-allocated.
> >>
> >> On the negative side:
> >> * Wasteful interrupt usage, and a system can potentially "run out" of
> >>   interrupts. Especially for many-core systems.
> >> * Interrupt masking needs to propagate to harts via IPIs (there's no
> >>   broadcast CSR in IMSIC), and a more complex locking scheme in the
> >>   IMSIC driver.
> >>
> >> Summary:
> >> The current series caps the number of global interrupts to a maximum
> >> of 2047 MSIs for all cores (the whole system). A better scheme, IMO,
> >> would be to expose 2047 * #harts unique MSIs.
> >>
> >> I think this could simplify/remove(?) the locking as well.
> >
> > Exposing 2047 * #harts unique MSIs has multiple issues:
> > 1) The irq_set_affinity() does not work for MSIs because each
> >    IRQ is not tied to a particular HART. This means we can't
> >    balance the IRQ processing load among HARTs.
>
> Yes, you can balance.
> In your code, each *active* MSI is still
> bound/active to a specific hart together with the affinity mask. In a
> 1-1 model you would still need to track the affinity mask, but the
> irq_set_affinity() would be different. It would try to allocate a new
> MSI from the target CPU, and then switch to having that MSI active.
>
> That's what x86 does AFAIU, which is also constrained by the # of
> available MSIs.
>
> The downside, as I pointed out, is that the set affinity action can
> fail for a certain target CPU.

Yes, irq_set_affinity() can fail with the suggested approach. In
addition, for RISC-V AIA, one HART does not have access to another
HART's MSI enable/disable bits, so the suggested approach would also
involve IPIs.

>
> > 2) All wired IRQs for APLIC MSI-mode will also target a
> >    fixed HART hence irq_set_affinity() won't work for wired
> >    IRQs as well.
>
> I'm not following here. Why would APLIC put a constraint here? I had a
> look at the specs, and I didn't see anything supporting the current
> scheme explicitly.

Let's say the number of APLIC wired interrupts is greater than the
number of per-CPU IMSIC IDs. In this case, if all wired interrupts are
moved to a particular CPU, then irq_set_affinity() will fail for some
of the wired interrupts.

>
> > 3) Contemporary storage/networking drivers which use per-core
> >    queues use irq_set_affinity() on queue IRQs to balance
> >    across cores but this will fail.
>
> Or via the managed interrupts. But this is a non-issue, as pointed
> out in my reply to 1.
>
> > 4) HART hotplug breaks because the kernel irq-subsystem can't
> >    migrate the IRQs (both MSIs and wired) targeting HART X
> >    to another HART Y when HART X goes down.
>
> Yes, we might end up in scenarios where we can't move to a certain
> target cpu, but I wouldn't expect that to be a common scenario.
>
> > The idea of treating per-HART MSIs as separate IRQs has
> > been discussed in the past.
>
> Aha! I tried to look for it in lore, but didn't find any. Could you
> point me to those discussions?

This was done 2 years back in the AIA TG meeting when we were doing
the PoC for the AIA spec.

>
> > Also, the current approach is very similar to the ARM GICv3 driver
> > where ITS LPIs across CPUs are treated as a single IRQ.
>
> I'm not familiar with the GIC. Is the GICv3 design similar to IMSIC? I
> had the impression that the GIC had a more advanced interrupt routing
> mechanism than what IMSIC exposes. I think x86 APIC takes the 1-1
> approach (the folks on the To: list definitely know! ;-)).

GIC has a per-CPU redistributor which handles LPIs. The MSIs are taken
by the GIC ITS and forwarded as LPIs to the redistributor of a CPU. The
GIC driver treats the LPI numbering space as global and not per-CPU.
Also, the limit on the maximum number of LPIs is quite high because an
LPI INTID can be 32-bit wide.

> My concern is that interrupts become a scarce resource with this
> implementation, but maybe my view is incorrect. I've seen bare-metal
> x86 systems (no VMs) with ~200 cores, and ~2000 interrupts, but maybe
> that is considered "a lot of interrupts".
>
> As long as we don't get into scenarios where we're running out of
> interrupts, due to the software design.
>

The current approach is simpler and ensures irq_set_affinity() always
works. The limit of 2047 IDs is sufficient for many systems (if not
all).
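To make the irq_set_affinity() point concrete, below is a rough,
standalone sketch of why affinity changes cannot fail in the current
1-N slice model. This is plain C for illustration only, not the actual
driver code, and all of the helper names are made up. Since the same ID
is already reserved on every hart, moving an interrupt is just
disable-here/enable-there plus re-targeting the MSI address; the remote
enable/disable is the part that needs an IPI, because a hart cannot
write another hart's IMSIC interrupt-enable (eie) bits.

#include <stdio.h>

struct imsic_slice {
	unsigned int id;         /* same interrupt ID reserved on all harts */
	unsigned int active_cpu; /* the one hart where the ID is enabled */
};

/* Made-up helpers: locally these would poke the IMSIC interrupt-enable
 * (eie) registers, remotely they would have to go through an IPI. */
static void imsic_id_disable_on_cpu(unsigned int id, unsigned int cpu)
{
	printf("cpu%u: clear enable bit for ID %u (IPI if remote)\n", cpu, id);
}

static void imsic_id_enable_on_cpu(unsigned int id, unsigned int cpu)
{
	printf("cpu%u: set enable bit for ID %u (IPI if remote)\n", cpu, id);
}

static void imsic_msi_retarget(unsigned int id, unsigned int cpu)
{
	printf("reprogram MSI address/data: ID %u -> cpu%u IMSIC\n", id, cpu);
}

static void slice_set_affinity(struct imsic_slice *s, unsigned int new_cpu)
{
	if (s->active_cpu == new_cpu)
		return;

	/* The ID is pre-allocated on new_cpu, so nothing is allocated
	 * here and nothing can fail. */
	imsic_id_disable_on_cpu(s->id, s->active_cpu);
	imsic_msi_retarget(s->id, new_cpu);
	imsic_id_enable_on_cpu(s->id, new_cpu);
	s->active_cpu = new_cpu;
}

int main(void)
{
	struct imsic_slice q = { .id = 5, .active_cpu = 0 };

	slice_set_affinity(&q, 3); /* always succeeds */
	return 0;
}

The real driver obviously also has to deal with locking and the actual
CSR/IPI plumbing; the sketch only shows that the affinity change itself
never needs a fresh allocation.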
When we encounter a system requiring a large number of MSIs, we can
either:
1) Extend the AIA spec to support greater than 2047 IDs
2) Re-think the approach in the IMSIC driver

The choice between #1 and #2 above depends on the guarantees we want
for irq_set_affinity().

Regards,
Anup