On Tue, 12 Nov 2024 21:34:30 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Tue, Nov 12, 2024 at 01:54:58PM -0800, Nicolin Chen wrote: > > On Mon, Nov 11, 2024 at 01:09:20PM +0000, Robin Murphy wrote: > > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > > > guest kernel and relay it to the GIC driver in the host kernel, to program > > > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > > > > > Once VFIO has that information from userspace, though, do we really need > > > the whole complicated dance to push it right down into the irqchip layer > > > just so it can be passed back up again? AFAICS > > > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already explicitly > > > rewrites MSI-X vectors, so it seems like it should be pretty > > > straightforward to override the message address in general at that > > > level, without the lower layers having to be aware at all, no? > > > > Didn't see that clearly!! It works with a simple following override: > > -------------------------------------------------------------------- > > @@ -497,6 +497,10 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, > > struct msi_msg msg; > > > > get_cached_msi_msg(irq, &msg); > > + if (vdev->msi_iovas) { > > + msg.address_lo = lower_32_bits(vdev->msi_iovas[vector]); > > + msg.address_hi = upper_32_bits(vdev->msi_iovas[vector]); > > + } > > pci_write_msi_msg(irq, &msg); > > } > > > > -------------------------------------------------------------------- > > > > With that, I think we only need one VFIO change for this part :) > > Wow, is that really OK from a layering perspective? The comment is > pretty clear on the intention that this is to resync the irq layer > view of the device with the physical HW. > > Editing the msi_msg while doing that resync smells bad. > > Also, this is only doing MSI-X, we should include normal MSI as > well. (it probably should have a resync too?) This was added for a specific IBM HBA that clears the vector table during a built-in self test, so it's possible the MSI table being in config space never had the same issue, or we just haven't encountered it. I don't expect anything else actually requires this. > I'd want Thomas/Marc/Alex to agree.. (please read the cover letter for > context) It seems suspect to me too. In a sense it is still just synchronizing the MSI address, but to a different address space. Is it possible to do this with the existing write_msi_msg callback on the msi descriptor? For instance we could simply translate the msg address and call pci_write_msi_msg() (while avoiding an infinite recursion). Or maybe there should be an xlate_msi_msg callback we can register. Or I suppose there might be a way to insert an irqchip that does the translation on write. Thanks, Alex