On 11/14/24 16:35, Robin Murphy wrote:
> On 13/11/2024 9:11 pm, Alex Williamson wrote:
>> On Tue, 12 Nov 2024 21:34:30 -0400
>> Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>>
>>> On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote:
>>>> On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote:
>>>>> On 2024-11-09 5:48 am, Nicolin Chen wrote:
>>>>>> To solve this problem the VMM should capture the MSI IOVA
>>>>>> allocated by the guest kernel and relay it to the GIC driver in
>>>>>> the host kernel, to program the correct MSI IOVA. And this
>>>>>> requires a new ioctl via VFIO.
>>>>>
>>>>> Once VFIO has that information from userspace, though, do we
>>>>> really need the whole complicated dance to push it right down
>>>>> into the irqchip layer just so it can be passed back up again?
>>>>> AFAICS vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS
>>>>> already explicitly rewrites MSI-X vectors, so it seems like it
>>>>> should be pretty straightforward to override the message address
>>>>> in general at that level, without the lower layers having to be
>>>>> aware at all, no?
>>>>
>>>> Didn't see that clearly!! It works with the following simple
>>>> override:
>>>> --------------------------------------------------------------------
>>>> @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev,
>>>>  		struct msi_msg msg;
>>>>
>>>>  		get_cached_msi_msg(irq, &msg);
>>>> +		if (vdev->msi_iovas) {
>>>> +			msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]);
>>>> +			msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]);
>>>> +		}
>>>>  		pci_write_msi_msg(irq, &msg);
>>>>  	}
>>>> --------------------------------------------------------------------
>>>>
>>>> With that, I think we only need one VFIO change for this part :)
>>>
>>> Wow, is that really OK from a layering perspective? The comment is
>>> pretty clear on the intention that this is to resync the irq layer
>>> view of the device with the physical HW.
>>>
>>> Editing the msi_msg while doing that resync smells bad.
>>>
>>> Also, this is only doing MSI-X, we should include normal MSI as
>>> well. (it probably should have a resync too?)
>>
>> This was added for a specific IBM HBA that clears the vector table
>> during a built-in self test, so it's possible the MSI table being in
>> config space never had the same issue, or we just haven't
>> encountered it. I don't expect anything else actually requires this.
>
> Yeah, I wasn't really suggesting to literally hook into this exact
> case; it was more just a general observation that if VFIO already has
> one justification for tinkering with pci_write_msi_msg() directly
> without going through the msi_domain layer, then adding another
> (wherever it fits best) can't be *entirely* unreasonable.
>
> At the end of the day, the semantic here is that VFIO does know more
> than the IRQ layer, and does need to program the endpoint differently
> from what the irqchip assumes, so I don't see much benefit in
> dressing that up more than functionally necessary.
>
>>> I'd want Thomas/Marc/Alex to agree.. (please read the cover letter
>>> for context)
>>
>> It seems suspect to me too. In a sense it is still just
>> synchronizing the MSI address, but to a different address space.
>>
>> Is it possible to do this with the existing write_msi_msg callback
>> on the msi descriptor? For instance we could simply translate the
>> msg address and call pci_write_msi_msg() (while avoiding an infinite
>> recursion). Or maybe there should be an xlate_msi_msg callback we
>> can register. Or I suppose there might be a way to insert an irqchip
>> that does the translation on write. Thanks,
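To make the callback idea above concrete, here is a rough sketch of
what such a translating override could look like. Illustrative only:
the write_msi_msg/write_msi_msg_data hooks on struct msi_desc are the
ones Alex refers to, but vdev->msi_iovas, the msi_xlate_busy guard
bit, and the point where VFIO would register the callback are all
assumptions, not code from this series:

--------------------------------------------------------------------
static void vfio_pci_msi_xlate(struct msi_desc *desc, void *data)
{
	struct vfio_pci_core_device *vdev = data;
	struct msi_msg msg;

	/*
	 * pci_write_msi_msg() below lands back in this callback once
	 * the message is written; bail out on the nested call so the
	 * translation happens exactly once.
	 */
	if (test_and_set_bit(0, &vdev->msi_xlate_busy))	/* hypothetical */
		return;

	get_cached_msi_msg(desc->irq, &msg);
	msg.address_lo = lower_32_bits(vdev->msi_iovas[desc->msi_index]);
	msg.address_hi = upper_32_bits(vdev->msi_iovas[desc->msi_index]);
	pci_write_msi_msg(desc->irq, &msg);

	clear_bit(0, &vdev->msi_xlate_busy);
}
--------------------------------------------------------------------

Assuming the callback fires from inside the pci_write_msi_msg() path,
which is what Alex's recursion caveat implies, the guard bit is what
keeps the nested invocation from looping.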
> I'm far from keen on the idea, but if there really is an appetite
> for more indirection, then I guess the least-worst option would be
> yet another type of iommu_dma_cookie to work via the existing
> iommu_dma_compose_msi_msg() flow, with some interface for VFIO to
> update per-device addresses directly. But then it's still going to
> need some kind of "layering violation" for VFIO to poke the IRQ
> layer into re-composing and re-writing a message whenever userspace
> feels like changing an address, because we're fundamentally stepping
> outside the established lifecycle of a kernel-managed IRQ around
> which said layering was designed...

For the record, the first integration was based on such a distinct
iommu_dma_cookie: [PATCH v15 00/12] SMMUv3 Nested Stage Setup (IOMMU
part)
<https://lore.kernel.org/all/20210411111228.14386-1-eric.auger@xxxxxxxxxx/#r>,
patches 8 - 11.

Thanks

Eric

> Thanks,
> Robin.
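For anyone digging into that older approach, a rough sketch of how a
distinct, user-managed cookie could slot into the existing
iommu_dma_compose_msi_msg() flow Robin mentions. Everything below is
illustrative, not code from the v15 series:
iommu_dma_get_user_msi_iova() is a hypothetical helper standing in
for wherever VFIO would record the per-vector stage-1 IOVA, and a
page-sized MSI granule is assumed:

--------------------------------------------------------------------
static void iommu_dma_compose_user_msi_msg(struct msi_desc *desc,
					   struct msi_msg *msg)
{
	struct device *dev = msi_desc_to_dev(desc);
	dma_addr_t iova;

	/* Hypothetical lookup of the address userspace relayed via VFIO */
	iova = iommu_dma_get_user_msi_iova(dev, desc->msi_index);

	/*
	 * Keep the doorbell offset within the page, substitute the
	 * guest-visible IOVA for the physical doorbell address
	 * (page-sized granule assumed for simplicity).
	 */
	msg->address_hi = upper_32_bits(iova);
	msg->address_lo &= ~PAGE_MASK;
	msg->address_lo |= lower_32_bits(iova) & PAGE_MASK;
}
--------------------------------------------------------------------

The appeal of this direction is that the device write still goes
through the normal compose-and-write machinery; the remaining
"layering violation" is only whatever interface lets VFIO update the
address and poke the IRQ layer into re-composing.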