> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Friday, August 9, 2024 7:00 AM
>
> On Thu, Aug 08, 2024 at 01:38:44PM +0100, Robin Murphy wrote:
> > On 06/08/2024 9:25 am, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > > > Sent: Saturday, August 3, 2024 8:32 AM
> > > >
> > > > From: Robin Murphy <robin.murphy@xxxxxxx>
> > > >
> > > > Currently, iommu-dma is the only place outside of IOMMUFD and drivers
> > > > which might need to be aware of the stage 2 domain encapsulated within
> > > > a nested domain. This would be in the legacy-VFIO-style case where we're
> > >
> > > why is it legacy-VFIO-style? We only support nesting in IOMMUFD.
> >
> > Because with proper nesting we ideally shouldn't need the host-managed
> > MSI mess at all, which all stems from the old VFIO paradigm of
> > completely abstracting interrupts from userspace. I'm still hoping
> > IOMMUFD can grow its own interface for efficient MSI passthrough, where
> > the VMM can simply map the physical MSI doorbell into whatever IPA (GPA)
> > it wants it to appear at in the S2 domain; then, whatever the guest does
> > with S1, it can program the MSI address into the endpoint accordingly,
> > without us having to fiddle with it.
>
> Hmm, until now I wasn't so convinced myself that it could work, as I
> was worried about the data. But on second thought, since the host
> configures the MSI, it can still set the correct data. All we need to
> do is change the MSI address from an RMR'd IPA/gIOVA to the real
> gIOVA of the vITS page.
>
> I did a quick hack to test that loop. MSI in the guest still works
> fine without the RMR node in its IORT. Sweet!
>
> To go further on this path, we will need the following changes:
> - MSI configuration in the host (via a VFIO_IRQ_SET_ACTION_TRIGGER
>   hypercall) should set the gIOVA instead of fetching it from the
>   msi_cookie. That hypercall doesn't forward an address currently,
>   since the host kernel pre-sets the msi_cookie. So we need a way to
>   forward the gIOVA to the kernel and pack it into the msi_msg
>   structure. I haven't read the VFIO PCI code thoroughly, yet I
>   wonder if we could just let the guest program the gIOVA into the
>   PCI register and have it fall through to the hardware, so the host
>   kernel handling that hypercall can just read it back from the
>   register?
> - IOMMUFD should provide the VMM a way to tell it the gPA (or
>   directly + GITS_TRANSLATER?). Then the kernel should do the
>   stage-2 mapping. I talked to Jason about this a while ago, and we
>   have a few thoughts on how to implement it. But eventually, I
>   think we still can't avoid a middle man like the msi_cookie to
>   associate the gPA in IOMMUFD with the PA in the irqchip?

Probably a new IOMMU_DMA_MSI_COOKIE_USER type which uses the GPA
(passed in via ALLOC_HWPT for a nested_parent type) as the IOVA in
iommu_dma_get_msi_page()? A rough sketch of that idea is at the
bottom of this mail.

>
> One more concern is the MSI window size. The VMM sets up an MSI
> region that must fit the hardware window size. Most ITS versions
> have a window of only one page, but one of them can have multiple
> pages? What if the vITS is one page in size while the underlying
> pITS has multiple?
>
> My understanding is that the current kernel-defined 1MB size is also
> a hard-coded window meant to fit all cases, since the IOMMU code in
> the kernel can just eyeball what's going on in the irqchip subsystem
> and adjust accordingly if someday it needs to. But the VMM can't?
>
> Thanks
> Nicolin
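
To make that a bit more concrete, here is a rough, untested sketch of
how such a cookie type could slot in next to the existing
IOMMU_DMA_MSI_COOKIE in drivers/iommu/dma-iommu.c. The helper name
iommu_get_msi_cookie_user() and the ALLOC_HWPT plumbing that would
call it are made up for illustration; it leans on the existing
struct iommu_dma_cookie, cookie_alloc() and the msi_iova linear base
that iommu_dma_get_msi_page() already consumes:

/*
 * Sketch only, not a real patch: a userspace-provided MSI cookie as a
 * sibling of IOMMU_DMA_MSI_COOKIE. The GPA chosen by the VMM becomes
 * the linear IOVA base that iommu_dma_get_msi_page() uses when mapping
 * the doorbell page, so the guest-visible doorbell address matches
 * what the VMM placed in the S2.
 */
enum iommu_dma_cookie_type {
	IOMMU_DMA_IOVA_COOKIE,
	IOMMU_DMA_MSI_COOKIE,
	IOMMU_DMA_MSI_COOKIE_USER,	/* MSI base chosen by userspace (GPA) */
};

/* Hypothetical helper, modelled on the existing iommu_get_msi_cookie() */
int iommu_get_msi_cookie_user(struct iommu_domain *domain, dma_addr_t gpa)
{
	struct iommu_dma_cookie *cookie;

	/* Only for an S2 nesting parent that has no cookie yet */
	if (domain->iova_cookie)
		return -EEXIST;

	cookie = cookie_alloc(IOMMU_DMA_MSI_COOKIE_USER);
	if (!cookie)
		return -ENOMEM;

	/*
	 * Reuse the trivial linear allocator: doorbell pages get mapped
	 * at gpa, gpa + PAGE_SIZE, ... instead of at a kernel-chosen
	 * msi_cookie base.
	 */
	cookie->msi_iova = gpa;
	domain->iova_cookie = cookie;
	return 0;
}

The part that is entirely hypothetical is how the GPA travels from the
ALLOC_HWPT ioctl down to this helper; the window-size question above
would still need an answer if the pITS spans more than one page.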