On Wed, 2016-02-03 at 08:04 +0000, Tian, Kevin wrote:
> > From: Zhiyuan Lv
> > Sent: Tuesday, February 02, 2016 3:35 PM
> >
> > Hi Gerd/Alex,
> >
> > On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > > >   Hi,
> > > >
> > > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > > >
> > > > > Well, let's work through the problem.  How is the GFN related to the
> > > > > device?  Is this some sort of page table for device mappings with a base
> > > > > register in the vgpu hardware?
> > > >
> > > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > > verification and works like this:
> > > >
> > > >   (1) guest submits execbuffer.
> > > >   (2) host makes execbuffer readonly for the guest
> > > >   (3) verify the buffer (make sure it only accesses resources owned by
> > > >       the vm).
> > > >   (4) pass on execbuffer to the hardware.
> > > >   (5) when the gpu is done with it make the execbuffer writable again.
> > >
> > > Ok, so are there opportunities to do those page protections outside of
> > > KVM?  We should be able to get the vma for the buffer, can we do
> > > something with that to make it read-only.  Alternatively can the vgpu
> > > driver copy it to a private buffer and hardware can execute from that?
> > > I'm not a virtual memory expert, but it doesn't seem like an
> > > insurmountable problem.  Thanks,
> >
> > Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> > described. Now the latest implementation has removed wp to do buffer copy
> > instead, since the privilege command buffers are usually small. So that part
> > is fine.
> >
> > But we need write-protection for graphics page table shadowing as well.
> > Once guest driver modifies gpu page table, we need to know that and manipulate
> > shadow page table accordingly. buffer copy cannot help here. Thanks!
> >
>
> After walking through the whole thread again, let me do a summary here
> so everyone can be on the same page.
>
> First, Jike told me before his vacation, that we cannot do any change to
> KVM module according to community comments. Now I think it's not true.
> We can do necessary changes, as long as it is done in a structural/layered
> approach, w/o hard assumption on KVMGT as the only user. That's the
> guideline we need to obey. :-)

We certainly need to separate the functionality that you're trying to
enable from the more pure concept of vfio.  vfio is a userspace driver
interface, not a userspace driver interface for KVM-based virtual
machines.  Maybe it's more of a gimmick that we can assign PCI devices
to QEMU tcg VMs, but that's really just the proof of concept for more
useful capabilities, like supporting DPDK applications.  So, I
begrudgingly agree that structured/layered interactions are acceptable,
but consider what use cases may be excluded by doing so.

> Mostly we care about two aspects regarding to a vgpu driver:
>   - services/callbacks which vgpu driver provides to external framework
>     (e.g. vgpu core driver and VFIO);
>   - services/callbacks which vgpu driver relies on for proper emulation
>     (e.g. from VFIO and/or hypervisor);
>
> The former is being discussed in another thread. So here let's focus
> on the latter.
>
> In general Intel GVT-g requires below services for emulation:
>
> 1) Selectively pass-through a region to a VM
> --
> This can be supported by today's VFIO framework, by setting
> VFIO_REGION_INFO_FLAG_MMAP for concerned regions. Then Qemu
> will mmap that region which will finally be added to the EPT table of
> the target VM
>
> 2) Trap-and-emulate a region
> --
> Similarly, this can be easily achieved by clearing MMAP flag for concerned
> regions.
> Then every access from VM will go through Qemu and then VFIO
> and finally reach vgpu driver. The only concern is in the performance
> part. We need some general mechanism to allow delivering I/O emulation
> request directly from KVM in kernel. For example, Alex mentioned some
> flavor based on file descriptor + offset. Likely let's move forward with
> the default Qemu forwarding, while brainstorming exit-less delivery in parallel.
>
> 3) Inject a virtual interrupt
> --
> We can leverage existing VFIO IRQ injection interface, including configuration
> and irqfd interface.
>
> 4) Map/unmap guest memory
> --
> It's there for KVM.

Map and unmap for who?  For the vGPU or for the VM?  It seems like we
know how to map guest memory for the vGPU without KVM, but that's
covered in 7), so I'm not entirely sure what this is specifying.

> 5) Pin/unpin guest memory
> --
> IGD or any PCI passthru should have same requirement. So we should be
> able to leverage existing code in VFIO. The only tricky thing (Jike may
> elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> which requires some further change in KVM side. But I'm not sure whether
> it still holds true after some design changes made in this thread. So I'll
> leave to Jike to further comment.

PCI assignment requires pinning all of guest memory; I would think that
IGD would only need to pin selective memory.  So is this simply stating
that both have the need to pin memory, not that they'll do it to the
same extent?

> 6) Write-protect a guest memory page
> --
> The primary purpose is for GPU page table shadowing. We need to track
> modifications on guest GPU page table, so shadow part can be synchronized
> accordingly. Just think about CPU page table shadowing. And old example
> as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes
> not necessary now.
>
> So we need KVM to provide an interface so some agents can request such
> write-protection action (not just for KVMGT. could be for other tracking
> usages). Guangrong has been working on a general page tracking mechanism,
> upon which write-protection can be easily built on. The review is still in
> progress.

I have a hard time believing we don't have the mechanics to do this
outside of KVM.  We should be able to write-protect user pages from the
kernel; this is how copy-on-write generally works.  So it seems like we
should be able to apply those same mechanics to our userspace process,
which just happens to be a KVM VM.  I'm hoping that Paolo might have
some ideas how to make this work or maybe Intel has some virtual memory
experts that can point us in the right direction.

> 7) GPA->IOVA/HVA translation
> --
> It's required in various places, e.g.:
> - read a guest structure according to GPA
> - replace GPA with IOVA in various shadow structures
>
> We can maintain both translations in vfio-iommu-type1 driver, since
> necessary information is ready at map interface. And we should use
> MemoryListener to update the database. It's already there for physical
> device passthru (Qemu uses MemoryListener and then rely to vfio).
>
> vfio-vgpu will expose query interface, thru vgpu core driver, so that
> vgpu driver can use above database for whatever purpose.
>
>
> ----
> Well, then I realize pretty much opens have been covered with a solution
> when ending this write-up. Then we should move forward to come up a
> prototype upon which we can then identify anything missing or overlooked
> (definitely there would be), and also discuss several remaining opens atop
> (such as exit-less emulation, pin/unpin, etc.). Another thing we need
> to think is whether this new design is still compatible to Xen side.
>
> Thanks a lot all for the great discussion (especially Alex with many good
> inputs)! I believe it becomes much clearer now than 2 weeks ago, about
> how to integrate KVMGT with VFIO. :-)

Thanks for your summary, Kevin.
It does seem like there are only a few outstanding issues, which should
be manageable, and hopefully the overall approach is cleaner for QEMU
and management tools and provides a more consistent user interface as
well.  If we can translate the solution to Xen, that's even better.
Thanks,

Alex
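
For illustration only, the translation database described in 7) could
look roughly like the sketch below.  It is a minimal userspace example
in C, not code from vfio-iommu-type1 or from any posted patch: the names
gpa_db, gpa_db_map, gpa_db_unmap, gpa_to_hva and gpa_to_iova, the
fixed-size array, and the linear search are all assumptions made for the
example.  It only shows the idea that each map request records a
(GPA, size, HVA, IOVA) range, and that a later query translates a GPA by
finding the range that contains it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; nothing here is an existing vfio API. */
#define MAX_MAPPINGS 16

struct gpa_mapping {
    uint64_t gpa;   /* guest physical base address */
    uint64_t size;  /* length of the range in bytes */
    uint64_t hva;   /* host virtual base address backing the range */
    uint64_t iova;  /* address the device would use for the range */
};

struct gpa_db {
    struct gpa_mapping map[MAX_MAPPINGS];
    size_t count;
};

/* Record a range, as a map request would. Returns 0, or -1 if full. */
int gpa_db_map(struct gpa_db *db, uint64_t gpa, uint64_t size,
               uint64_t hva, uint64_t iova)
{
    assert(size > 0);
    if (db->count == MAX_MAPPINGS)
        return -1;
    db->map[db->count++] = (struct gpa_mapping){ gpa, size, hva, iova };
    return 0;
}

/* Forget the range starting at gpa. Returns 0, or -1 if not found. */
int gpa_db_unmap(struct gpa_db *db, uint64_t gpa)
{
    for (size_t i = 0; i < db->count; i++) {
        if (db->map[i].gpa == gpa) {
            /* Order does not matter, so move the last entry into the hole. */
            db->map[i] = db->map[--db->count];
            return 0;
        }
    }
    return -1;
}

/* Find the range containing gpa, or NULL if it is not mapped. */
const struct gpa_mapping *gpa_db_find(const struct gpa_db *db, uint64_t gpa)
{
    for (size_t i = 0; i < db->count; i++) {
        const struct gpa_mapping *m = &db->map[i];
        if (gpa >= m->gpa && gpa - m->gpa < m->size)
            return m;
    }
    return NULL;
}

/* Translate gpa to a host virtual address. Returns 0, or -1 if unmapped. */
int gpa_to_hva(const struct gpa_db *db, uint64_t gpa, uint64_t *hva)
{
    const struct gpa_mapping *m = gpa_db_find(db, gpa);
    if (!m)
        return -1;
    *hva = m->hva + (gpa - m->gpa);
    return 0;
}

/* Translate gpa to an IOVA. Returns 0, or -1 if unmapped. */
int gpa_to_iova(const struct gpa_db *db, uint64_t gpa, uint64_t *iova)
{
    const struct gpa_mapping *m = gpa_db_find(db, gpa);
    if (!m)
        return -1;
    *iova = m->iova + (gpa - m->gpa);
    return 0;
}
```

A caller would zero-initialize a struct gpa_db, call gpa_db_map() for
each range it is told about, and then use gpa_to_hva() to read a guest
structure or gpa_to_iova() when filling a shadow structure.  A real
implementation would presumably need locking and a faster lookup
structure; both are left out here.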