On Wed, Dec 4, 2024 at 4:29 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Tue, Dec 03 2024 at 21:55, Thomas Gleixner wrote:
> > On Tue, Dec 03 2024 at 22:07, Anup Patel wrote:
> >> On Tue, Dec 3, 2024 at 7:23 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> >>> Sorry, I missed that when reviewing the original IMSIC MSI support.
> >>>
> >>> The whole IMSIC MSI support can be moved over to MSI LIB which makes
> >>> all of this indirection go away and your intermediate domain will
> >>> just fit in.
> >>>
> >>> Uncompiled patch below. If that works, it needs to be split up
> >>> properly.
> >>>
> >>> Note, this removes the setup of the irq_retrigger callback, but
> >>> that's fine because on hierarchical domains
> >>> irq_chip_retrigger_hierarchy() is invoked anyway. See try_retrigger().
> >>
> >> The IMSIC driver was merged one kernel release before the common
> >> MSI LIB was merged.
> >
> > Ah indeed.
> >
> >> We should definitely update the IMSIC driver to use MSI LIB. I will
> >> try your suggested changes (below) and post a separate series.
> >
> > Pick up the delta patch I gave Andrew...
>
> As I was looking at something else MSI related, I had a look at
> imsic_irq_set_affinity() again.
>
> It's actually required to have the message write in that function and
> not afterwards, as you invoke imsic_vector_move() from that function.
>
> That's obviously not true for the remap case, as that will not change
> the message address/data pair because the remap table entry is
> immutable - at least I assume so for my mental sanity's sake :)
>
> But that brings me to a related question. How is this supposed to work
> with non-atomic message updates? PCI/MSI does not necessarily provide
> masking, and the write of the address/data pair is done in bits and
> pieces. So you can end up with an intermediate state seen by the
> device which ends up somewhere in interrupt nirvana space.
>
> See the dance in msi_set_affinity() and commit 6f1a4891a592
> ("x86/apic/msi: Plug non-maskable MSI affinity race") for further
> explanation.
>
> The way the IMSIC driver works seems to be pretty much the same as the
> x86 APIC mess:
>
>   @address is the physical address of the per CPU MSI target address
>   and @data is the vector ID on that CPU.
>
> So the non-atomic update in case of non-maskable MSI suffers from the
> same problem. It works most of the time, but if it doesn't you might
> stare at the occasionally lost interrupt and the stale device in
> disbelief for quite a while :)

Yes, we have the same challenges as the x86 APIC when changing MSI
affinity.

>
> I might be missing something which magically prevents that though :)
>

Your understanding is correct. In fact, the IMSIC msi_set_affinity()
handling is inspired by the x86 APIC approach due to the similarity of
the overall MSI controller design.

The high-level idea of imsic_irq_set_affinity() is as follows (rough
code sketches after the list):

1) Allocate a new_vector (new CPU IMSIC address + new ID on that CPU).

2) Update the MSI address and data programmed in the device based on
   the new_vector (see imsic_msi_update_msg()).

3) At this point the device points to the new_vector, but the
   old_vector (old CPU IMSIC address + old ID on that CPU) is still
   enabled and we might have received an MSI on the old_vector while
   we were busy setting up the new_vector for the device. To address
   this, we call imsic_vector_move().

4) imsic_vector_move() marks the old_vector as being moved and
   schedules a lazy timer on the old CPU.

5) The lazy timer expires on the old CPU and results in
   __imsic_local_sync() being called on the old CPU.

6) If there was a pending MSI on the old_vector then
   __imsic_local_sync() injects an MSI to the new_vector using an
   MMIO write.
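To make this concrete, here is a rough, uncompiled sketch of the shape
of steps 1-3. This is not the actual driver code: error handling,
locking, and the in-flight-move checks are omitted, and
imsic_vector_alloc() plus the struct imsic_vector fields are
illustrative names, not verbatim.

static int imsic_irq_set_affinity(struct irq_data *d,
                                  const struct cpumask *mask_val,
                                  bool force)
{
        struct imsic_vector *old_vec = irq_data_get_irq_chip_data(d);
        struct imsic_vector *new_vec;

        /* 1) Allocate a vector (target CPU + ID) on one of the new CPUs. */
        new_vec = imsic_vector_alloc(old_vec->hwirq, mask_val);
        if (!new_vec)
                return -ENOSPC;

        /*
         * 2) Re-program the device with the new address/data pair. For
         *    non-maskable PCI/MSI this is the non-atomic update window
         *    discussed above.
         */
        imsic_msi_update_msg(d, new_vec);

        /*
         * 3) The device now targets new_vec, but an MSI may already
         *    have landed on old_vec. Hand over to the lazy cleanup
         *    machinery (steps 4-6).
         */
        imsic_vector_move(old_vec, new_vec);

        irq_data_update_effective_affinity(d, cpumask_of(new_vec->cpu));
        return IRQ_SET_MASK_OK_DONE;
}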
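And the tail end (steps 5-6), which runs on the old CPU once the lazy
timer fires. Writing an interrupt identity into an IMSIC interrupt
file's memory-mapped setipnum register sets that identity pending per
the AIA spec, which is how the pending MSI is forwarded. Again a
simplified sketch; the helper names are illustrative:

static void imsic_local_sync_move(struct imsic_vector *old_vec,
                                  struct imsic_vector *new_vec)
{
        /*
         * 5) On the old CPU: atomically read-and-clear the pending bit
         *    of the old ID in the local interrupt file.
         */
        if (!__imsic_id_read_clear_pending(old_vec->local_id))
                return;

        /*
         * 6) An MSI arrived on the old_vector during re-programming:
         *    forward it by writing the new ID into the new CPU's IMSIC
         *    setipnum register via MMIO.
         */
        writel(new_vec->local_id, imsic_msi_base(new_vec->cpu));
}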
It is very unlikely that an MSI from the device will be dropped
(unless I am missing something), but the unsolved issue is that the
handling of an in-flight MSI received on the old_vector during the MSI
re-programming is delayed, which may have side effects on the device
driver side.

I believe a future RISC-V AIA v2.0 (whenever that happens) will
address such gaps in AIA v1.0 (like this one).

Regards,
Anup