On Thu, 28 Jul 2016 10:15:24 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:

> Hi, Alex,
>
> Along with the recent enhancements to the virtual IOMMU (vIOMMU) in QEMU,
> I'm wondering whether there is any issue for mdev in coping with a vIOMMU.
> I know that today a VFIO device only works with the PowerPC IOMMU (note
> that someone is enabling VFIO devices with virtual VT-d, but that work
> looks incomplete), but it's always good to have the architecture
> discussion early. :-)
>
> The VFIO mdev framework maintains a GPA->HPA mapping, which is queried by
> the vendor-specific mdev device model for emulation purposes. For example,
> guest GPU PTEs may need to be translated into shadow GPU PTEs, where a
> GPA->HPA conversion is required.
>
> When a virtual IOMMU is exposed to the guest, an IOVA may be used as the
> DMA address by the guest, which means a guest PTE now contains an IOVA
> instead of a GPA, so the device model needs to know the IOVA->HPA mapping.
> After checking the current vIOMMU logic within QEMU, this does not look
> like a problem. The vIOMMU is expected to notify VFIO of any IOVA change,
> and the kernel VFIO driver does receive map requests for IOVA regions.
> Thus the mapping structure that VFIO maintains is indeed the IOVA->HPA
> mapping required by the device model.
>
> In this manner, no further change appears to be required in the proposed
> mdev framework to support a vIOMMU. The only thing I'm unsure about is how
> QEMU guarantees that IOVA vs. GPA are mapped exclusively. I checked that
> vfio_listener_region_add() initiates map requests for normal memory
> regions (which are GPA), and then vfio_iommu_map_notify() sends map
> requests for IOVA regions, notified through the IOMMU notifier. I don't
> think VFIO can cope with both GPA and IOVA map requests simultaneously,
> since VFIO doesn't maintain multiple address spaces for one device. It's
> not an mdev-specific question, but I have definitely missed some key point
> here, since this is assumed to be working for PowerPC already...

I prefer not to distinguish GPA vs. IOVA; the device always operates in the
IOVA space. Without a vIOMMU, it just happens to be an identity map into
the GPA space.

Think about how this works on real hardware: when VT-d is not enabled,
there's no translation, IOVA == GPA. The device interacts directly with
system memory, the same as the default case in QEMU now. When VT-d is
enabled, the device is placed into an IOMMU domain and the IOVA space is
restricted to the translations defined within that domain.

The same is expected to happen with QEMU: all of the GPA-mapped IOVA space
is removed via vfio_listener_region_del() and a new IOMMU region is added,
enabling the vfio_iommu_map_notify() callbacks. The fact that we can't have
both system memory and an IOMMU active via vfio_listener_region_add() at
the same time is a property of the VT-d emulation.

Anyway, I think it's handled correctly, but until the VT-d emulation
actually starts interacting correctly with the iommu map notifier, we won't
know whether there are lingering bugs. Thanks,

Alex
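
P.S. As a minimal sketch of the point above (not QEMU's actual code: the
map_one() helper and the container fd plumbing are hypothetical, though the
ioctl and struct are the real VFIO type1 UAPI from <linux/vfio.h>): both
the no-vIOMMU path and the vIOMMU path reduce to the same map call, keyed
by IOVA, with IOVA == GPA being just the degenerate identity case.

    /*
     * Sketch only. Both callers issue the same VFIO_IOMMU_MAP_DMA
     * ioctl; the kernel pins the process virtual address range and
     * records iova -> hpa. It never sees a "GPA" as such.
     */
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    /* Map one IOVA range to the HVA backing it. */
    static int map_one(int container_fd, uint64_t iova, uint64_t hva,
                       uint64_t size)
    {
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = hva,   /* process virtual address of guest RAM */
            .iova  = iova,  /* address the device will use for DMA */
            .size  = size,
        };
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
    }

    /* No vIOMMU: identity map, IOVA == GPA (the region_add path). */
    static int map_identity(int fd, uint64_t gpa, uint64_t hva,
                            uint64_t size)
    {
        return map_one(fd, /* iova = */ gpa, hva, size);
    }

    /* vIOMMU enabled: the IOVA comes from the guest's IOMMU
     * programming (the iommu map-notifier path); no GPA appears. */
    static int map_translated(int fd, uint64_t iova, uint64_t hva,
                              uint64_t size)
    {
        return map_one(fd, iova, hva, size);
    }

Either way, the IOVA->HPA table the kernel ends up with is exactly the
mapping the mdev device model queries, which is why no mdev framework
change should be needed.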