On Fri, 2015-11-20 at 07:09 +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > Sent: Friday, November 20, 2015 4:03 AM
> > > >
> > > > The proposal is therefore that GPU vendors can expose vGPUs to
> > > > userspace, and thus to QEMU, using the VFIO API. For instance,
> > > > vfio supports modular bus drivers and IOMMU drivers. An
> > > > intel-vfio-gvt-d module (or extension of i915) can register as a
> > > > vfio bus driver, create a struct device per vGPU, create an IOMMU
> > > > group for that device, and register that device with the
> > > > vfio-core. Since we don't rely on the system IOMMU for GVT-d vGPU
> > > > assignment, another vGPU vendor driver (or extension of the same
> > > > module) can register a "type1" compliant IOMMU driver into
> > > > vfio-core. From the perspective of QEMU then, all of the existing
> > > > vfio-pci code is re-used, QEMU remains largely unaware of any
> > > > specifics of the vGPU being assigned, and the only necessary
> > > > change so far is how QEMU traverses sysfs to find the device and
> > > > thus the IOMMU group leading to the vfio group.
> > >
> > > GVT-g requires pinning guest memory and querying GPA->HPA
> > > information, upon which shadow GTTs will be updated accordingly
> > > from (GMA->GPA) to (GMA->HPA). So yes, here a dummy or simple
> > > "type1" compliant IOMMU can be introduced just for this
> > > requirement.
> > >
> > > However, there's one tricky point where I'm not sure whether the
> > > overall VFIO concept will be violated. GVT-g doesn't require the
> > > system IOMMU to function, however the host system may enable the
> > > system IOMMU just for hardening purposes. This means two levels of
> > > translation exist (GMA->IOVA->HPA), so the dummy IOMMU driver has
> > > to request the system IOMMU driver to allocate IOVAs for VMs and
> > > then set up the IOVA->HPA mappings in the IOMMU page table. In this
> > > case, multiple VMs' translations are multiplexed in one IOMMU page
> > > table.
> > >
> > > We might need to create some group/sub-group or parent/child
> > > concepts among those IOMMUs for thorough permission control.
> >
> > My thought here is that this is all abstracted through the vGPU
> > IOMMU and device vfio backends. It's the GPU driver itself, or some
> > vfio extension of that driver, mediating access to the device and
> > deciding when to configure GPU MMU mappings. That driver has access
> > to the GPA to HVA translations thanks to the type1 compliant IOMMU
> > it implements and can pin pages as needed to create GPA to HPA
> > mappings. That should give it all the pieces it needs to fully set
> > up mappings for the vGPU. Whether or not there's a system IOMMU is
> > simply an exercise for that driver. It needs to do a DMA mapping
> > operation through the system IOMMU the same for a vGPU as if it was
> > doing it for itself, because they are in fact one and the same. The
> > GMA to IOVA mapping seems like an internal detail. I assume the
> > IOVA is some sort of GPA, and the GMA is managed through mediation
> > of the device.
>
> Sorry, I'm not familiar with VFIO internals. My original worry is that
> the system IOMMU for the GPU may already be claimed by another vfio
> driver (e.g. the host kernel wants to harden the gfx driver from the
> rest of the sub-systems, regardless of whether a vGPU is created or
> not). In that case the vGPU IOMMU driver shouldn't manage the system
> IOMMU directly.

There are different APIs for the IOMMU depending on how it's being
used. If the IOMMU is being used for inter-device isolation in the
host, then the DMA API (ex. dma_map_page) transparently makes use of
the IOMMU. When we're doing device assignment, we make use of the
IOMMU API, which allows more explicit control (ex. iommu_domain_alloc,
iommu_attach_device, iommu_map, etc).

A vGPU is not an SR-IOV VF; it doesn't have a unique requester ID that
allows the IOMMU to differentiate one vGPU from another, or a vGPU from
the GPU. All mappings for vGPUs need to occur for the GPU.
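[Editor's note: the contrast between the two usage models above can be
sketched as in-kernel C against the roughly 2015-era APIs. This is an
illustrative fragment, not buildable on its own; `dev`, `page`, `iova`,
and `hpa` are placeholders for state the driver would already hold.]

```c
#include <linux/dma-mapping.h>
#include <linux/iommu.h>

/* Host-side hardening: the DMA API, IOMMU use is transparent. */
dma_addr_t dma = dma_map_page(dev, page, 0, PAGE_SIZE,
			      DMA_BIDIRECTIONAL);
if (dma_mapping_error(dev, dma))
	return -ENOMEM;
/* ... device DMAs to/from 'dma' ... */
dma_unmap_page(dev, dma, PAGE_SIZE, DMA_BIDIRECTIONAL);

/* Device assignment: the IOMMU API, explicit domain and mappings. */
struct iommu_domain *domain = iommu_domain_alloc(&pci_bus_type);
ret = iommu_attach_device(domain, dev);
ret = iommu_map(domain, iova, hpa, PAGE_SIZE,
		IOMMU_READ | IOMMU_WRITE);
/* ... guest-controlled DMA through 'iova' ... */
iommu_unmap(domain, iova, PAGE_SIZE);
iommu_detach_device(domain, dev);
iommu_domain_free(domain);
```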
It's therefore the responsibility of the GPU driver, or this vfio
extension of that driver, to perform the IOMMU mapping for the vGPU.
My expectation is therefore that once the GMA to IOVA mapping is
configured in the GPU MMU, the IOVA to HPA mapping needs to be
programmed, as if the GPU driver were performing the setup itself,
which it is. Before the device mediation that triggered the mapping
setup is complete, the GPU MMU and the system IOMMU (if present)
should be configured to enable that DMA. The GPU MMU provides the
isolation of the vGPU; the system IOMMU enables the DMA to occur.

> btw, curious how VFIO today coordinates with the system IOMMU driver
> regarding whether an IOMMU is used to control device assignment, or
> used for kernel hardening. Somehow the two are conflicting since
> different address spaces are concerned (GPA vs. IOVA)...

When devices unbind from native host drivers, any previous IOMMU
mappings and domains are removed. These are typically created via the
DMA API above. The initialization operations of the VFIO API (creating
containers, attaching groups to containers, and setting the IOMMU
model for a container) work through the IOMMU API to create a new
domain and isolate devices within it. The type1 VFIO IOMMU interface
is then effectively a passthrough to the iommu_map() and iommu_unmap()
interfaces of the IOMMU API, modulo page pinning, accounting, and
tracking. When a VFIO instance is destroyed, the devices are detached
from the IOMMU domain, the devices are unbound from vfio and re-bound
to host drivers, and the DMA API can reclaim the devices for host
isolation.

Thanks,
Alex
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
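[Editor's note: the userspace side of the VFIO initialization sequence
described above (container, group, IOMMU model, then map) looks roughly
like the following, per the type1 uapi. This is a hedged sketch with
error handling elided; the group number 26 is hypothetical and would be
found via the device's sysfs iommu_group link, and `buffer` is assumed
to be a page-aligned allocation (e.g. from mmap).]

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

int container = open("/dev/vfio/vfio", O_RDWR);
int group = open("/dev/vfio/26", O_RDWR);   /* hypothetical group */

/* Attach the group to the container */
ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);

/* Select the type1 model; a new IOMMU domain is created and the
 * group's devices are attached to it */
ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

/* Map 1MB of process memory at IOVA 0; type1 pins the pages,
 * accounts them, and calls iommu_map() underneath */
struct vfio_iommu_type1_dma_map dma_map = {
	.argsz = sizeof(dma_map),
	.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
	.vaddr = (uint64_t)(uintptr_t)buffer,
	.iova  = 0,
	.size  = 1024 * 1024,
};
ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
```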