> From: Song, Jike
> Sent: Tuesday, February 23, 2016 11:02 AM
>
> +Kevin
>
> On 02/22/2016 06:05 PM, Xiao Guangrong wrote:
> >
> > On 02/19/2016 08:00 PM, Paolo Bonzini wrote:
> >>
> >> I still have a doubt: how are you going to handle invalidation of GPU
> >> shadow page tables if a device (emulated in QEMU or even vhost) does DMA
> >> to the PPGTT?
> >
> > I think Jike is the better one to answer this question. Jike, could you
> > please clarify it? :)
>
> Sure :)
>
> Actually, in the guest the PPGTT is manipulated by the CPU rather than the
> GPU. The PPGTT page tables themselves are plain memory, composed & modified
> by the GPU driver, i.e. by the CPU in non-root mode.
>
> Given that, we write-protect the guest PPGTT; when the VM writes to the
> PPGTT, an EPT violation rather than a DMA fault happens.

'DMA to PPGTT' is NOT SUPPORTED by our vGPU device model. Today the Intel gfx
driver doesn't use this method, and we explicitly list that as a guest driver
requirement for supporting a vGPU. If a malicious driver does program DMA to
modify the PPGTT, it can only modify the guest PPGTT, not the shadow PPGTT
(which is invisible to the guest). So there is no security issue either.

> >> Generally, this was the reason to keep stuff out of KVM
> >> and instead hook into the kernel mm subsystem (as with userfaultfd).
> >
> > We considered it carefully, but this way cannot satisfy KVMGT's requirements.
> > The reasons I explained in the old thread (https://lkml.org/lkml/2015/12/1/516)
> > are:
> >
> > "For performance, the shadow GPU page table is performance critical and
> > needs to be switched frequently; it is not good to handle it in userspace.
> > And a Windows guest has many GPU page tables and updates them frequently,
> > which means we need to write-protect a huge number of individual pages.
> > I am afraid userfaultfd cannot handle this case efficiently.

Yes, performance is the main concern. Paolo, we explained the reason for
in-kernel emulation to you earlier, and you acknowledged it:

----
> > It's definitely a fast path, e.g. command submission, shadow GPU page
> > table, etc., which are all in the performance-critical path. Another
> > reason is the I/O access frequency, which could be up to 100k/s for some
> > gfx workloads. It's important to shorten the emulation path, which helps
> > performance a lot. That's the major reason why we keep the vGPU device
> > model in the kernel (it will be merged into the i915 driver).
>
> Ok, thanks---writing numbers down always helps. MMIO to userspace costs
> 5000 clock cycles on the latest QEMU and processor (and does not need
> the "big QEMU lock" anymore), but still 100k/s is a ~500000 clock cycle
> difference and approximately 15% host CPU usage.
----

(I believe ~500000 should be ~500M clock cycles above.)

> > For functionality, userfaultfd cannot fill the needs of shadow paging
> > because:
> >
> > - the page is kept read-only; userfaultfd cannot fix the fault and let
> >   the vcpu make progress (a write access causes a writeable gup).
> >
> > - the access needs to be emulated; however, userfaultfd/the kernel does
> >   not have the ability to emulate the access, as the access is triggered
> >   by the guest and the instruction info is stored in the VMCS, so only
> >   KVM can emulate it.
> >
> > - the shadow page code needs to be notified after the emulation is
> >   finished, as it must know the new data written to the page in order to
> >   update its page hierarchy (some hardware lacks the 'retry' ability, so
> >   the shadow page table needs to reflect the guest table at all times)."
> >
> > Any idea?

Thanks Guangrong for investigating the possibility.
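To make that flow concrete, below is a minimal, self-contained sketch. All
names and the single-level table layout are made up for this mail and do not
match the real i915/KVM code; it only illustrates Guangrong's last bullet:
after KVM has emulated the trapped write, the device model receives the
written bytes and re-derives the affected shadow PPGTT entries.

#include <stdint.h>
#include <string.h>

#define PPGTT_ENTRIES 512		/* one 4KB page of 64-bit entries */

/* toy guest-visible and shadow PPGTT pages, for illustration only */
static uint64_t guest_ppgtt[PPGTT_ENTRIES];
static uint64_t shadow_ppgtt[PPGTT_ENTRIES];

/* stand-in for the GPA->DMA-address translation the real code performs */
static uint64_t shadow_entry_for(uint64_t guest_entry)
{
	return guest_entry;		/* identity, just for the sketch */
}

/*
 * Hypothetical callback invoked by the write-protection service after KVM
 * has emulated a guest CPU write to a tracked PPGTT page.  'offset' is the
 * byte offset into the page, 'new' points to the bytes just written.
 */
static void vgpu_ppgtt_track_write(size_t offset, const void *new, size_t bytes)
{
	size_t first, last, i;

	if (bytes == 0 || offset + bytes > sizeof(guest_ppgtt))
		return;			/* ignore bogus writes */

	/* keep the guest view current, then re-shadow the touched entries */
	memcpy((uint8_t *)guest_ppgtt + offset, new, bytes);

	first = offset / sizeof(uint64_t);
	last  = (offset + bytes - 1) / sizeof(uint64_t);
	for (i = first; i <= last; i++)
		shadow_ppgtt[i] = shadow_entry_for(guest_ppgtt[i]);
}

In the real device model the shadow_entry_for() step is where address
translation and permission checks happen; the point of the sketch is only
that the notification has to carry the written data, which is the part
userfaultfd alone does not provide.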
Based on the earlier explanation, we hope the KVM community can re-think the
necessity of supporting in-kernel emulation for KVMGT. The same framework
might be extended to other types of I/O device using a similar mediated
pass-through concept in the future, where the device model is tightly
integrated with the native device driver for efficiency and simplicity.

This is actually a related open from the KVMGT/VFIO integration discussion.
There are 7 services in total required to support in-kernel emulation, which
can be categorized into two groups:

a) services to connect a vGPU with a VM, which are essentially what a device
driver does (so VFIO can fit here), including:

  1) Selectively pass through a region to a VM
  2) Trap-and-emulate a region
  3) Inject a virtual interrupt
  4) Pin/unpin guest memory
  5) GPA->IOVA/HVA translation (as a side-effect)

b) services to support device emulation, which will be hypervisor specific,
including:

  6) Map/unmap guest memory
  7) Write-protect a guest memory page

We're working with the VFIO community to add support for category a), but
there is still a gap in category b). This patch series addresses requirement
7), and 6) is straightforward for KVM. We may introduce a new file in KVM to
wrap them together for in-kernel emulation (a rough sketch of what that could
look like to the device model is appended below), but we need an agreement
from the community on this direction first. :-)

Thanks
Kevin
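For reference, here is an illustrative set of prototypes for the two
category b) services as a device model might consume them. All names are
assumptions made up for this mail, not the interfaces of this patch series:

/* illustrative declarations only -- not the actual interfaces */
#include <linux/kvm_types.h>	/* gfn_t, gpa_t */
#include <linux/types.h>	/* u8 */

struct kvm;

/* 6) Map/unmap guest memory so the device model can access it directly */
void *vgpu_gfn_map(struct kvm *kvm, gfn_t gfn);
void  vgpu_gfn_unmap(struct kvm *kvm, gfn_t gfn, void *hva);

/*
 * 7) Write-protect a guest page; 'handler' is called after KVM emulates
 * each guest write, with the written bytes, so the device model can keep
 * its shadow structures (e.g. the shadow PPGTT) in sync.
 */
typedef void (*vgpu_wp_handler_t)(struct kvm *kvm, gpa_t gpa,
				  const u8 *new, int bytes, void *opaque);

int  vgpu_wp_add_page(struct kvm *kvm, gfn_t gfn,
		      vgpu_wp_handler_t handler, void *opaque);
void vgpu_wp_remove_page(struct kvm *kvm, gfn_t gfn);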