> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Friday, August 9, 2024 7:00 AM
>
> On Thu, Aug 08, 2024 at 01:38:44PM +0100, Robin Murphy wrote:
> > On 06/08/2024 9:25 am, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > > Sent: Saturday, August 3, 2024 8:32 AM
> > > >
> > > > From: Robin Murphy <robin.murphy@xxxxxxx>
> > > >
> > > > Currently, iommu-dma is the only place outside of IOMMUFD and drivers
> > > > which might need to be aware of the stage 2 domain encapsulated within
> > > > a nested domain. This would be in the legacy-VFIO-style case where we're
> > >
> > > why is it legacy-VFIO-style? We only support nesting in IOMMUFD.
> >
> > Because with proper nesting we ideally shouldn't need the host-managed
> > MSI mess at all, which all stems from the old VFIO paradigm of
> > completely abstracting interrupts from userspace. I'm still hoping
> > IOMMUFD can grow its own interface for efficient MSI passthrough, where
> > the VMM can simply map the physical MSI doorbell into whatever IPA (GPA)
> > it wants it to appear at in the S2 domain; then, whatever the guest does
> > with S1, it can program the MSI address into the endpoint accordingly,
> > without us having to fiddle with it.
>
> Hmm, until now I wasn't so convinced myself that it could work, as I
> was worried about the data. But on second thought, since the host
> configures the MSI, it can still set the correct data. All we need to
> do is change the MSI address from an RMR'd IPA/gIOVA to the real
> gIOVA of the vITS page.
>
> I did a quick hack to test that loop. MSI in the guest still works
> fine without the RMR node in its IORT. Sweet!
>
> To go further on this path, we will need the following changes:
> - MSI configuration in the host (via a VFIO_IRQ_SET_ACTION_TRIGGER
>   hypercall) should set the gIOVA instead of fetching it from the
>   msi_cookie. That hypercall doesn't forward an address currently,
>   since the host kernel pre-sets the msi_cookie. So we need a way to
>   forward the gIOVA to the kernel and pack it into the msi_msg
>   structure. I haven't read the VFIO PCI code thoroughly, yet I
>   wonder if we could just let the guest program the gIOVA into the
>   PCI register and have it fall through to the hardware, so the host
>   kernel handling that hypercall can just read it back from the
>   register?
> - IOMMUFD should provide the VMM a way to tell it the gPA (or
>   directly + GITS_TRANSLATER?). Then the kernel should do the
>   stage-2 mapping. I talked to Jason about this a while ago, and we
>   have a few thoughts on how to implement it. But eventually, I
>   think we still can't avoid a middle man like the msi_cookie to
>   associate the gPA in IOMMUFD with the PA in the irqchip?

Probably a new IOMMU_DMA_MSI_COOKIE_USER type which uses the GPA
(passed in via ALLOC_HWPT for a nested_parent type) as the IOVA in
iommu_dma_get_msi_page()? A rough sketch of that idea is at the
bottom of this mail.

>
> One more concern is the MSI window size. The VMM sets up an MSI
> region that must fit the hardware window size. Most ITS versions
> have a window of only one page, but one of them can have multiple
> pages? What if the vITS is one page in size while the underlying
> pITS has multiple?
>
> My understanding is that the current kernel-defined 1MB size is also
> a hard-coded window meant to fit all cases, since the IOMMU code in
> the kernel can just eyeball what's going on in the irqchip subsystem
> and adjust accordingly if someday it needs to. But the VMM can't?
>
> Thanks
> Nicolin
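
To make that a bit more concrete, here is a rough, untested sketch of
how such a cookie type could slot in next to the existing
IOMMU_DMA_MSI_COOKIE in drivers/iommu/dma-iommu.c. The helper name
iommu_get_msi_cookie_user() and the ALLOC_HWPT plumbing that would
call it are made up for illustration; it leans on the existing
struct iommu_dma_cookie, cookie_alloc() and the msi_iova linear base
that iommu_dma_get_msi_page() already consumes:

/*
 * Sketch only, not a real patch: a userspace-provided MSI cookie as a
 * sibling of IOMMU_DMA_MSI_COOKIE. The GPA chosen by the VMM becomes
 * the linear IOVA base that iommu_dma_get_msi_page() uses when mapping
 * the doorbell page, so the guest-visible doorbell address matches
 * what the VMM placed in the S2.
 */
enum iommu_dma_cookie_type {
	IOMMU_DMA_IOVA_COOKIE,
	IOMMU_DMA_MSI_COOKIE,
	IOMMU_DMA_MSI_COOKIE_USER,	/* MSI base chosen by userspace (GPA) */
};

/* Hypothetical helper, modelled on the existing iommu_get_msi_cookie() */
int iommu_get_msi_cookie_user(struct iommu_domain *domain, dma_addr_t gpa)
{
	struct iommu_dma_cookie *cookie;

	/* Only for an S2 nesting parent that has no cookie yet */
	if (domain->iova_cookie)
		return -EEXIST;

	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE_USER);
	if (!cookie)
		return -ENOMEM;

	/*
	 * Reuse the trivial linear allocator: doorbell pages get mapped
	 * at gpa, gpa + PAGE_SIZE, ... instead of at a kernel-chosen
	 * msi_cookie base.
	 */
	cookie->msi_iova = gpa;
	domain->iova_cookie = cookie;
	return 0;
}

The part that is entirely hypothetical is how the GPA travels from the
ALLOC_HWPT ioctl down to this helper; the window-size question above
would still need an answer if the pITS spans more than one page.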