RE: VFIO mdev with vIOMMU

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Thu, 28 Jul 2016 23:47:58 +0000

> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Thursday, July 28, 2016 11:42 PM
> 
> On Thu, 28 Jul 2016 10:15:24 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > Hi, Alex,
> >
> > Along with the recent enhancements to virtual IOMMU (vIOMMU) in QEMU, I'm
> > wondering whether there is any issue for mdev in coping with vIOMMU. I
> > know that today VFIO devices only work with the PowerPC IOMMU (note someone
> > is enabling VFIO devices with virtual VT-d, but it does not look complete
> > yet), but it's always good to have the architecture discussion early. :-)
> >
> > The VFIO mdev framework maintains a GPA->HPA mapping, which is queried
> > by the vendor-specific mdev device model for emulation purposes. For example,
> > guest GPU PTEs may need to be translated into shadow GPU PTEs, where
> > GPA->HPA conversion is required.
> >
> > When a virtual IOMMU is exposed to the guest, IOVA may be used as the DMA
> > address by the guest, which means a guest PTE now contains an IOVA instead
> > of a GPA, and the device model then needs to know the IOVA->HPA mapping.
> > After checking the current vIOMMU logic within QEMU, it looks like this is
> > not a problem. vIOMMU is expected to notify VFIO of any IOVA change, and the
> > kernel VFIO driver does receive map requests for IOVA regions. Thus the
> > mapping structure that VFIO maintains is indeed the IOVA->HPA mapping
> > required by the device model.
> >
> > In this manner it looks like no further change is required in the proposed
> > mdev framework to support vIOMMU. The only thing I'm unsure about is how
> > QEMU guarantees that IOVA and GPA are mapped exclusively. I checked that
> > vfio_listener_region_add initiates map requests for normal memory
> > regions (which are GPA), and then vfio_iommu_map_notify sends
> > map requests for IOVA regions, which are notified through the IOMMU notifier.
> > I don't think VFIO can cope with both GPA and IOVA map requests
> > simultaneously, since VFIO doesn't maintain multiple address spaces on one
> > device. It's not an mdev-specific question, but I have definitely missed
> > some key points here, since it's assumed to be working for PowerPC already...
> 
> I prefer not to distinguish GPA vs IOVA; the device always operates in
> the IOVA space.  Without a vIOMMU, it just happens to be an identity map
> into the GPA space.  Think about how this works on real hardware: when
> VT-d is not enabled, there's no translation, so IOVA = GPA.  The device
> interacts directly with system memory, same as the default case in
> QEMU now.  When VT-d is enabled, the device is placed into an IOMMU
> domain and the IOVA space is now restricted to the translations defined
> within that domain.  The same is expected to happen with QEMU: all of
> the GPA mapped IOVA space is removed via vfio_listener_region_del() and
> a new IOMMU region is added, enabling the vfio_iommu_map_notify

Ah, that is the info I was looking for. Can you point me to where the above
logic is implemented? I only saw the latter part, about adding a new IOMMU
region...

And I suppose we also have logic for the reverse: when the guest disables
the IOMMU, all IOVA mappings are deleted and the GPA mapped IOVA
space is then replayed?
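
To make the question concrete, here is a toy model of the switch in both
directions as I understand it. It is only a sketch under my own assumptions:
the structs and every toy_* name below are made up for illustration and are
not QEMU or kernel VFIO code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: one IOVA->HPA table per container. */
#define MAX_MAPS 16

struct map_entry { uint64_t iova, hpa, size; bool used; };
struct toy_container { struct map_entry maps[MAX_MAPS]; };

/* Guest RAM layout as seen without a vIOMMU: GPA -> HPA. */
struct ram_section { uint64_t gpa, hpa, size; };

/* Stand-in for a DMA map request. Returns false if the table is full. */
static bool toy_dma_map(struct toy_container *c,
                        uint64_t iova, uint64_t hpa, uint64_t size)
{
    for (size_t i = 0; i < MAX_MAPS; i++) {
        if (!c->maps[i].used) {
            c->maps[i] = (struct map_entry){ iova, hpa, size, true };
            return true;
        }
    }
    return false;
}

/* Stand-in for tearing down every mapping in the container. */
static void toy_dma_unmap_all(struct toy_container *c)
{
    for (size_t i = 0; i < MAX_MAPS; i++)
        c->maps[i].used = false;
}

/* vIOMMU off: IOVA == GPA, so each RAM section is mapped at its GPA. */
static void toy_replay_ram(struct toy_container *c,
                           const struct ram_section *ram, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_dma_map(c, ram[i].gpa, ram[i].hpa, ram[i].size);
}

/* Guest enables the vIOMMU: the GPA mapped IOVA space goes away; from
 * here on only notifier-driven toy_dma_map() calls fill the table. */
static void toy_viommu_enable(struct toy_container *c)
{
    toy_dma_unmap_all(c);
}

/* Guest disables the vIOMMU: drop IOVA mappings, replay the RAM sections. */
static void toy_viommu_disable(struct toy_container *c,
                               const struct ram_section *ram, size_t n)
{
    toy_dma_unmap_all(c);
    toy_replay_ram(c, ram, n);
}

/* Single lookup path, regardless of who populated the table. */
static bool toy_translate(const struct toy_container *c,
                          uint64_t iova, uint64_t *hpa)
{
    for (size_t i = 0; i < MAX_MAPS; i++) {
        const struct map_entry *e = &c->maps[i];
        if (e->used && iova >= e->iova && iova - e->iova < e->size) {
            *hpa = e->hpa + (iova - e->iova);
            return true;
        }
    }
    return false;
}
```

The point of the model is that the table is never populated from both
sources at once, so the same IOVA can never have a GPA-based and a
vIOMMU-based translation at the same time.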

> callbacks.  The fact that we can't have both system memory and an IOMMU
> active via vfio_listener_region_add() is a property of the VT-d
> emulation.  Anyway, I think it's handled correctly, but until VT-d
> emulation actually starts interacting correctly with the iommu map
> notifier, we won't know if there might be some lingering bugs.  Thanks,
> 
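
If that holds, the mdev device model only ever needs one lookup path when
shadowing guest PTEs, whether the address in the PTE is a GPA or an IOVA.
A rough sketch of what I have in mind follows; the PTE layout (flags in the
low 12 bits, page address above them) and all names are assumptions for
illustration only, not any real GPU format or mdev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PTE layout: low 12 bits are flags, the rest is the page address. */
#define PAGE_SHIFT     12
#define PTE_FLAGS_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)

/* One entry of the IOVA->HPA table maintained on behalf of the device. */
struct dma_map { uint64_t iova, hpa, size; };

/* Find the HPA backing an IOVA; returns false if nothing is mapped there. */
static bool lookup_hpa(const struct dma_map *maps, size_t n,
                       uint64_t iova, uint64_t *hpa)
{
    for (size_t i = 0; i < n; i++) {
        if (iova >= maps[i].iova && iova - maps[i].iova < maps[i].size) {
            *hpa = maps[i].hpa + (iova - maps[i].iova);
            return true;
        }
    }
    return false;
}

/* Build a shadow PTE: keep the guest's flags, swap the address for the HPA.
 * Assumes page-aligned mappings, so the HPA has no bits in the flag range. */
static bool shadow_pte(const struct dma_map *maps, size_t n,
                       uint64_t guest_pte, uint64_t *shadow)
{
    uint64_t addr  = guest_pte & ~PTE_FLAGS_MASK;
    uint64_t flags = guest_pte & PTE_FLAGS_MASK;
    uint64_t hpa;

    if (!lookup_hpa(maps, n, addr, &hpa))
        return false;   /* guest programmed an unmapped address */
    *shadow = hpa | flags;
    return true;
}
```

The device model does not need to know whether the guest has a vIOMMU
enabled; it just treats the address in the PTE as an IOVA and fails the
shadowing if no mapping exists.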

Thanks
Kevin