Re: [PATCH v12 00/25] Linux RISC-V AIA Support

Anup Patel <anup@xxxxxxxxxxxxxx> · Tue, 6 Feb 2024 23:09:44 +0530

On Tue, Feb 6, 2024 at 9:09 PM Björn Töpel <bjorn@xxxxxxxxxx> wrote:
>
> Hi Anup,
>
> Anup Patel <apatel@xxxxxxxxxxxxxxxx> writes:
>
> > The RISC-V AIA specification is ratified as-per the RISC-V international
> > process. The latest ratified AIA specifcation can be found at:
> > https://github.com/riscv/riscv-aia/releases/download/1.0/riscv-interrupts-1.0.pdf
> >
> > At a high-level, the AIA specification adds three things:
> > 1) AIA CSRs
> >    - Improved local interrupt support
> > 2) Incoming Message Signaled Interrupt Controller (IMSIC)
> >    - Per-HART MSI controller
> >    - Support MSI virtualization
> >    - Support IPI along with virtualization
> > 3) Advanced Platform-Level Interrupt Controller (APLIC)
> >    - Wired interrupt controller
> >    - In MSI-mode, converts wired interrupt into MSIs (i.e. MSI generator)
> >    - In Direct-mode, injects external interrupts directly into HARTs
> >
> > For an overview of the AIA specification, refer the AIA virtualization
> > talk at KVM Forum 2022:
> > https://static.sched.com/hosted_files/kvmforum2022/a1/AIA_Virtualization_in_KVM_RISCV_final.pdf
> > https://www.youtube.com/watch?v=r071dL8Z0yo
>
> Thank you for continuing to work on this series! I like this
> direction of the series!
>
> TL;DR: I think we can get rid of most of the id/householding data
> structures, except for the irq matrix.
>
> Most of my comments are more of a design/overview nature, so I'll
> comment here in the cover letter.
>
> I took the series for a spin with and it with Alex' ftrace fix it,
> passes all my tests nicely!
>
> Now some thoughts/comments (I'm coming from the x86 side of things!):
>
> id/enable-tracking: There are a lot of different id/enabled tracking
> with corresponding locks, where there's IMO overlap with what the
> matrix provides.

The matrix allocator does not track the enabled/disabled state of
the per-CPU IDs. This is why we have a separate per-CPU
ids_enabled_bitmap which is also used for remote synchronization
across CPUs.

>
> Let's start with struct imsic_priv:
>
>    | /* Dummy HW interrupt numbers */
>    | unsigned int nr_hwirqs;
>    | raw_spinlock_t hwirqs_lock;
>    | unsigned long *hwirqs_used_bitmap;

The matrix allocator manages actual IDs for each CPU whereas
the Linux irq_data expects a fixed hwirq which does not change.

Due to this, we have a dummy hwirq space which is always
fixed. The only thing that is changed under-the-hood by the
IMSIC driver is the dummy hwirq to actual HW vector (cpu, id)
mapping.

>
> These are used to for the domain routing (hwirq -> desc/virq), and not
> needed. Just use the same id as virq (at allocation time), and get rid
> of these data structures/corresponding functions. The lookup in the
> interrupt handler via imsic_local_priv.vectors doesn't care about
> hwirq. This is what x86 does... The imsic_vector roughly corresponds
> to apic_chip_data (nit: imsic_vector could have the chip_data suffix
> as well, at least it would have helped me!)

Yes, imsic_vector corresponds to apic_chip_data in the x86 world.

>
> Moving/affinity changes. The moving of a vector to another CPU
> currently involves:
>
> 1. Allocate a new vector from the matrix
> 2. Disable/enable the corresponding per-cpu ids_enabled_bitmap (nested
>    spinlocks)
> 3. Trigger two IPIs to apply the bitmap
> 4. On each CPU target (imsic_local_sync()) loop the bitmap and flip
>    all bits, and potentially rearm
>
> This seems a bit heavy-weight: Why are you explicitly setting/clearing
> all the bits in a loop at the local sync?

This can be certainly optimized by introducing another
ids_dirty_bitmap. I will add this in the next revision.

>
> x86 does it a bit differently (more lazily): The chip_data has
> prev_{cpu,vector}/move_in_progress fields, and keep both vectors
> enabled until there's an interrupt on the new vector, and then the old
> one is cleaned (irq_complete_move()).
>
> Further; When it's time to remove the old vector, x86 doesn't trigger
> an IPI on the disabling side, but queues a cleanup job on a per-cpu
> list and triggers a timeout. So, the per-cpu chip_data (per-cpu
> "vectors" in your series) can reside in two places during the transit.

We can't avoid IPIs when moving vectors from one CPU to another
CPU because IMSIC id enable/disable is only possible through
CSRs. Also, keep in-mind that irq affinity change might be initiated
on CPU X for some interrupt targeting CPU Y which is then changed
to target CPU Z.

In the case of x86, they have memory mapped registers which
allows one CPU to enable/disable the ID of another CPU.

>
> I wonder if this clean up is less intrusive, and you just need to
> perform what's in the per-list instead of dealing with the
> ids_enabled_bitmap? Maybe we can even remove that bitmap as well. The
> chip_data/desc has that information. This would mean that
> imsic_local_priv() would only have the local vectors (chip_data), and
> a cleanup list/timer.
>
> My general comment is that instead of having these global id-tracking
> structures, use the matrix together with some desc/chip_data local
> data, which should be sufficient.

The "ids_enabled_bitmap", "dummy hwirqs" and private imsic_vectors
are required since the matrix allocator only manages allocation of
per-CPU IDs.

>
> Random thought: Do we need to explicitly disable (csr) the vector,
> when we're changing the affinity? What if we just leave it enabled,
> and only when mask/unmask is performed it's actually explicitly masked
> (writes to the csr)?

We should not leave it enabled because some rough/buggy device
can inject spurious interrupts using MSI writes to unused enabled
interrupts.

>
> Missing features (which can be added later):
> * Reservation mode/activate support (allocate many MSI, but only
>   request/activate a subset)

I did not see any PCIe or platform device requiring this kind of
reservation. Any examples ?

> * Handle managed interrupts

Any examples of managed interrupts in the RISC-V world ?

> * There might be some irqd flags are missing, which mostly cpuhp care
>   about (e.g. irqd_*_single_target())...

Okay, let me check and update.

>
> Finally; Given that the APLIC requires a lot more patches, depending
> on how the review process moves on -- maybe the IMSIC side could go as
> a separate series?
>

The most popular implementation choice across RISC-V platforms
will be IMSIC + APLIC so both drivers should go together. In fact,
we need both drivers for the QEMU virt machine as well because
UART interrupt (and other wired interrupts) on the QEMU virt
machine goes through APLIC.

Regards,
Anup