On Mon, 11 Nov 2024 13:09:20 +0000, Robin Murphy <robin.murphy@xxxxxxx> wrote: > > On 2024-11-09 5:48 am, Nicolin Chen wrote: > > On ARM GIC systems and others, the target address of the MSI is translated > > by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the > > IOMMU is disabled, the MSI address is programmed to the physical location > > of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS > > page is behind the IOMMU, so the MSI address is programmed to an allocated > > IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to > > the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000). > > When a 2-stage translation is enabled, IOVA will be still used to program > > the MSI address, though the mappings will be in two stages: > > IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> 0x20200000 > > (IPA stands for Intermediate Physical Address). > > > > If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the > > IOVA is dynamically allocated from the top of the IOVA space. If attached > > to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is > > fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI, > > which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs. > > > > So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge > > of the IOMMU translation (1-stage translation), since the IOVA for the ITS > > page is fixed and known by kernel. However, with virtual machine enabling > > a nested IOMMU translation (2-stage), a guest kernel directly controls the > > stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an > > IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host > > kernel can't know that guest-level IOVA to program the MSI address. > > > > To solve this problem the VMM should capture the MSI IOVA allocated by the > > guest kernel and relay it to the GIC driver in the host kernel, to program > > the correct MSI IOVA. And this requires a new ioctl via VFIO. > > Once VFIO has that information from userspace, though, do we really > need the whole complicated dance to push it right down into the > irqchip layer just so it can be passed back up again? AFAICS > vfio_msi_set_vector_signal() via VFIO_DEVICE_SET_IRQS already > explicitly rewrites MSI-X vectors, so it seems like it should be > pretty straightforward to override the message address in general at > that level, without the lower layers having to be aware at all, no? +1. I would like to avoid polluting each and every interrupt controller with usage-specific knowledge (they usually are brain-damaged enough). We already have an indirection into the IOMMU subsystem and it shouldn't be a big deal to intercept the message for all implementations at this level. I also wonder how to handle the case of braindead^Wwonderful platforms where ITS transactions are not translated by the SMMU. Somehow, VFIO should be made aware of this situation. Thanks, M. -- Without deviation from the norm, progress is not possible.