RE: [RFC PATCH v3 3/3] VFIO Type1 IOMMU change: to support with iommu and without iommu

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Fri, 13 May 2016 07:45:14 +0000

> From: Neo Jia [mailto:cjia@xxxxxxxxxx]
> Sent: Friday, May 13, 2016 3:42 PM
> 
> On Fri, May 13, 2016 at 03:30:27PM +0800, Jike Song wrote:
> > On 05/13/2016 02:43 PM, Neo Jia wrote:
> > > On Fri, May 13, 2016 at 02:22:37PM +0800, Jike Song wrote:
> > >> On 05/13/2016 10:41 AM, Tian, Kevin wrote:
> > >>>> From: Neo Jia [mailto:cjia@xxxxxxxxxx] Sent: Friday, May 13,
> > >>>> 2016 3:49 AM
> > >>>>
> > >>>>>
> > >>>>>> Perhaps one possibility would be to allow the vgpu driver
> > >>>>>> to register map and unmap callbacks.  The unmap callback
> > >>>>>> might provide the invalidation interface that we're so far
> > >>>>>> missing.  The combination of map and unmap callbacks might
> > >>>>>> simplify the Intel approach of pinning the entire VM memory
> > >>>>>> space, ie. for each map callback do a translation (pin) and
> > >>>>>> dma_map_page, for each unmap do a dma_unmap_page and
> > >>>>>> release the translation.
> > >>>>>
> > >>>>> Yes, adding map/unmap ops in the pGPU driver (I assume you
> > >>>>> are referring to gpu_device_ops as implemented in Kirti's
> > >>>>> patch) sounds like a good idea, satisfying both: 1) keeping
> > >>>>> vGPU purely virtual; 2) dealing with the Linux DMA API to
> > >>>>> achieve hardware IOMMU compatibility.
> > >>>>>
> > >>>>> PS, this has very little to do with pinning wholly or
> > >>>>> partially. Intel KVMGT once had the whole guest memory
> > >>>>> pinned, only because we used a spinlock, under which we
> > >>>>> can't sleep at runtime.  We have removed that spinlock in
> > >>>>> another upstreaming effort of ours, not here but for the
> > >>>>> i915 driver, so probably no biggie.
> > >>>>>
> > >>>>
> > >>>> OK, then you guys don't need to pin everything. The next
> > >>>> question will be if you can send the pinning request from your
> > >>>> mediated driver backend to request memory pinning like we have
> > >>>> demonstrated in the v3 patch, function vfio_pin_pages and
> > >>>> vfio_unpin_pages?
> > >>>>
> > >>>
> > >>> Jike, can you confirm this statement? My feeling is that we
> > >>> don't have such logic in our device model to figure out which
> > >>> pages need to be pinned on demand. So currently pin-everything
> > >>> is the same requirement on both the KVM and Xen sides...
> > >>
> > >> [Correct me in case of any neglect:)]
> > >>
> > >> IMO the ultimate reason to pin a page, is for DMA. Accessing RAM
> > >> from a GPU is certainly a DMA operation. The DMA facility of most
> > >> platforms, IGD and NVIDIA GPU included, is not capable of
> > >> faulting-handling-retrying.
> > >>
> > >> For vGPU solutions like those Nvidia and Intel provide,
> > >> whenever the Guest sets up mappings for the memory region it
> > >> uses for GPU access, that write is intercepted by the Host, so
> > >> it's safe to pin a page only just before it gets used by the
> > >> Guest. This probably doesn't need the device model to change :)
> > >
> > > Hi Jike
> > >
> > > Just out of curiosity, how does the host intercept this before it
> > > goes on the bus?
> > >
> >
> > Hi Neo,
> >
> > [Apologies if I mis-expressed myself, bad English ..]
> >
> > I was talking about intercepting the setting-up of GPU page tables,
> > not the DMA itself.  For current Intel GPUs, the page tables, called
> > GTT (Graphics Translation Table), are MMIO registers or simply RAM
> > pages. A write to a GTT entry from the Guest is always intercepted
> > by the Host.
> 
> Hi Jike,
> 
> Thanks for the details. One more question: if the page tables are
> guest RAM, how do you intercept writes to them from the host? I can
> see them getting intercepted when they are in the MMIO range.
> 

We use the page tracking framework, which was recently added to KVM,
to mark those RAM pages as read-only, so that write accesses to them
are intercepted and forwarded to the device model.
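
To illustrate the idea only, here is a rough user-space sketch. It is a
toy model, not the KVM code and not our device model: every name in it
(toy_vm, toy_track_page, toy_guest_write, toy_gtt_write, ...) is made up
for this mail, and a per-page flag stands in for real write protection.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NPAGES    8u

/* Hypothetical callback type: called for each write to a tracked page. */
typedef void (*toy_write_notifier)(uint64_t gpa, const void *data,
                                   size_t len, void *opaque);

struct toy_vm {
    uint8_t ram[TOY_NPAGES][TOY_PAGE_SIZE]; /* guest RAM */
    bool write_tracked[TOY_NPAGES];         /* stand-in for "read-only" */
    toy_write_notifier notifier;            /* device model hook */
    void *opaque;                           /* passed back to the hook */
};

/* Mark one guest page as write-tracked. */
static void toy_track_page(struct toy_vm *vm, uint64_t gfn)
{
    assert(gfn < TOY_NPAGES);
    vm->write_tracked[gfn] = true;
}

/*
 * Emulate a guest store. A write to a tracked page is reported to the
 * notifier first (the "interception"), then the store is completed.
 */
static void toy_guest_write(struct toy_vm *vm, uint64_t gpa,
                            const void *data, size_t len)
{
    uint64_t gfn = gpa / TOY_PAGE_SIZE;
    uint64_t off = gpa % TOY_PAGE_SIZE;

    assert(gfn < TOY_NPAGES && off + len <= TOY_PAGE_SIZE);
    if (vm->write_tracked[gfn] && vm->notifier)
        vm->notifier(gpa, data, len, vm->opaque);
    memcpy(&vm->ram[gfn][off], data, len);
}

/* Hypothetical device-model state: remembers the last entry written. */
struct toy_device_model {
    unsigned int intercepted;
    uint64_t last_gpa;
    uint64_t last_entry;
};

/* Notifier: this is where a device model could pin the target page. */
static void toy_gtt_write(uint64_t gpa, const void *data, size_t len,
                          void *opaque)
{
    struct toy_device_model *dm = opaque;
    size_t n = len < sizeof(dm->last_entry) ? len : sizeof(dm->last_entry);

    dm->intercepted++;
    dm->last_gpa = gpa;
    dm->last_entry = 0;
    memcpy(&dm->last_entry, data, n);
}
```

In this sketch, only the pages registered with toy_track_page() reach
toy_gtt_write(); writes to any other guest page complete without the
device model seeing them.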

Thanks
Kevin