> From: Liu, Yi L
> Sent: Wednesday, September 14, 2016 7:35 PM
>
> Hi,
>
> I'm sending this email for the enabling design of supporting SVM in the
> pass-through scenario. Comments are welcome. Please let me know if
> anything is unclear, and any suggestions regarding the format are
> welcome as well. CC Qemu mailing list...

Yi, I think you need a better clarification of who does what. vIOMMU
emulation resides in Qemu, which is the place to enable SVM
virtualization. Then VFIO needs an extension to allow propagating the
guest context entry from Qemu to the shadow context entry in the
underlying IOMMU driver (configured in nested mode). Ideally KVM doesn't
need any change (it just reuses the existing interfaces to forward I/O
emulation requests and to inject virtual interrupts). However, your
description looks a bit confusing, especially the overuse of KVM in some
places.

Also, Peter is now enhancing the IOMMUNotifier framework. You may take a
look to see how the SVM virtualization requirements can fit there.

btw, this design doc looks too high level. It might be clearer if you
directly send out an RFC patch set with the description below scattered
in the right places.

> Content
> ===================
> 1. Feature description
> 2. Why use it?
> 3. How to enable it
> 4. How to test
>
> Details
> ===================
> 1. Feature description
> This feature lets an application program running within an L1 guest
> share its virtual address space with an assigned physical device (e.g.
> graphics processors or accelerators).
> For SVM (shared virtual memory) details, you may refer to section
> 2.5.1.1 of the Intel VT-d spec and also section 5.6 of the OpenCL spec.
> For details about the SVM address translation structure, please refer
> to section 3 of the Intel VT-d spec. Questions are also welcome
> directly in this thread.
>
> http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
> https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf
>
>
> 2. Why use it?
> It is common to pass through devices to a guest and expect performance
> similar to that on the host. With this feature enabled, SVM in the
> guest machine also lets application programs pass data structures to
> their assigned devices without unnecessary overhead.
>
>
> 3. How to enable it
> The work is actually to virtualize a DMAR hardware unit which is
> capable of translating a guest virtual address to a host physical
> address when the assigned device makes use of the SVM feature. The key
> capability for virtualizing a remapping hardware unit is the cache
> mode. When the CM field is reported as Set, any software update to any
> remapping structure (including updates to not-present entries or
> present entries whose programming resulted in translation faults)
> requires explicit invalidation of the caches. The enabling work would
> include the following items.

Virtualization of the 2nd-level translation (GPA->HPA) of VT-d is
already there. What you require is virtualization of the 1st-level
translation (GVA->GPA), and then a way to propagate the guest context
entry (or specifically the GPA of the PASID table) through VFIO to the
intel-iommu driver.

> a) IOMMU Register Access Emulation
> The register set for each remapping hardware unit in the platform is
> placed at a 4KB-aligned memory-mapped location. For virtual remapping
> hardware, the guest would allocate such a page. KVM could intercept the
> accesses to such a page and emulate the accesses to the different
> registers accordingly.

Not KVM's business. It's emulated by the vIOMMU in Qemu.

> b) QI Handling Emulation
> Queued invalidation is for software to send invalidation requests to
> the IOMMU and to devices (with device-IOTLB). The invalidation
> descriptors are written to a ring buffer which is allocated by the OS.
> The guest OS would allocate a ring buffer for its own DMAR. As
> designed, software needs to set the Invalidation Queue Tail Register
> after writing a new descriptor to the ring buffer.
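[As an aside, the trap-and-emulate protocol described in b) can be
sketched roughly as below. This is a minimal illustrative model, not
code from any existing driver: all names, the ring size, and the
descriptor type encodings are simplified stand-ins for the real VT-d
definitions.]

```c
#include <assert.h>
#include <stdint.h>

#define IQ_ENTRIES 256

/* A VT-d QI descriptor is 128 bits; bits 3:0 of the low half encode
 * the descriptor type (layout simplified here). */
struct qi_desc {
    uint64_t lo;
    uint64_t hi;
};

#define QI_TYPE(d)     ((d)->lo & 0xf)
#define QI_CC_TYPE     0x1 /* context-cache invalidation */
#define QI_IOTLB_TYPE  0x2 /* IOTLB invalidation */

/* Toy per-vIOMMU invalidation queue state. */
struct virt_iq {
    struct qi_desc ring[IQ_ENTRIES];
    unsigned int head; /* next descriptor the emulator will consume */
};

/* Emulate one guest descriptor: re-shadow or forward as needed. */
static int emulate_one_desc(const struct qi_desc *d)
{
    switch (QI_TYPE(d)) {
    case QI_CC_TYPE:     /* re-shadow the affected context entries */
    case QI_IOTLB_TYPE:  /* forward the flush to the host queue */
        return 0;
    default:
        return -1;       /* unknown type: raise a fault to the guest */
    }
}

/* Runs when the trapped write to the virtual Invalidation Queue Tail
 * Register reaches the emulator: consume every descriptor the guest
 * queued between the old head and the new tail. */
static int viq_tail_write(struct virt_iq *iq, unsigned int new_tail)
{
    while (iq->head != new_tail) {
        if (emulate_one_desc(&iq->ring[iq->head]))
            return -1;
        iq->head = (iq->head + 1) % IQ_ENTRIES;
    }
    return 0;
}
```

[The point of the sketch is only the control flow: the guest's tail
write is the single interception point from which all queued
descriptors are parsed and replayed against the host queue.]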
> As item a) mentioned, KVM would intercept the access to the
> Invalidation Queue Tail Register and then parse the QI descriptors
> from the guest. Eventually, the guest QI descriptors will be put into
> the ring buffer of the host, so that the physical remapping hardware
> will process them.
>
> c) Recoverable Fault Handling Emulation
> In the case of a passed-through device, the page request is sent to
> the host first. If the page request carries a PASID, it is injected
> into the corresponding guest for further processing. The guest
> processes the request and sends the response through the guest QI
> interface. The guest QI would be intercepted by KVM as item b)
> mentioned. Finally, the response gets to the host QI and then to the
> device. For requests without a PASID, the host should be able to
> handle them. The page requests with a PASID would be injected into the
> guest through a vMSI.
>
> d) Non-Recoverable Fault Handling Emulation
> The non-recoverable faults would be injected into the guest by vMSI.

Again, KVM doesn't need to know the details of emulating the above
faults. They are emulated by Qemu, which then triggers a virtual MSI
via KVM. Once the guest receives the virtual MSI interrupt, the
corresponding fault handler will access the necessary registers or
in-memory structures, which are emulated or provided by Qemu.

> e) VT-d Page Table Virtualization
> For requests with a PASID from an assigned device, this design would
> use the nested mode of the VT-d page table. For the SVM-capable
> devices which are assigned to a guest, the extended-context-entries
> that would be used to translate DMA addresses from such devices should
> have the NESTE bit set. For the requests without a PASID from such
> devices, the address would still be translated by walking the
> second-level page table.
>
> Another important thing is shadowing the VT-d page table. The cache
> mode needs to be reported as

You don't need to shadow the VT-d page table. You only need to shadow
the context entry.
> Set for the virtual hardware, so that the guest software will
> explicitly issue invalidation operations on the virtual hardware for
> any/all updates to the guest remapping structures. KVM may trap these
> guest invalidation operations to keep the shadow translation
> structures consistent with the guest's translation structure
> modifications. In this design, any change to an extended-context-entry
> would be followed by an invalidation (QI). As item b) described, KVM
> would intercept and parse it. For an extended-context-entry
> modification, KVM would determine whether it is necessary to shadow
> the change into the extended-context-entry used by the physical
> remapping hardware. In nested mode, the physical remapping hardware
> treats the PASID table pointer in the extended-context-entry as a GPA.
> So for the shadowing, KVM would just copy the PASID table pointer from
> the guest extended-context-entry to the host extended-context-entry.

Please check Peter's work to see how this can fit there.

> f) QEMU support for this feature
> The initial plan is to support devices assigned through the VFIO
> mechanism on the q35 machine type. A new option would be added to
> QEMU. It would be "svm=on|off" and its default value would be off. A
> new IOCTL command would be added for the fds returned by
> KVM_CREATE_DEVICE. It would be used to create an IOMMU with the SVM
> capability. The assigned devices will be registered with KVM during
> guest boot, so that KVM will be able to map a guest BDF to the real
> BDF. With this map, KVM will be able to distribute guest QI
> descriptors to the Invalidation Queues of the different DMAR units.
> The assigned SVM-capable devices would be attached to DMAR0, which is
> also called the virtual remapping hardware in this design. This
> requires some modification to QEMU.

Please specify the exact modifications required in QEMU. Most of the
above description is existing stuff.
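[The context-entry shadowing described in item e) reduces to a single
pointer copy, which can be sketched as below. The field layout and bit
positions here are illustrative only, not the exact VT-d encoding, and
the function name is hypothetical.]

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 128-bit extended-context-entry; real VT-d packs many more
 * fields, and the bit positions below are placeholders. */
struct ext_ctx_entry {
    uint64_t lo; /* present bit, translation type, NESTE, ... */
    uint64_t hi; /* PASID table pointer (4KB-aligned) and size */
};

#define ECE_PRESENT        (1ULL << 0)
#define ECE_NESTE          (1ULL << 2)  /* illustrative position */
#define ECE_PASID_PTR_MASK (~0xfffULL)  /* 4KB-aligned pointer bits */

/* Shadow a guest extended-context-entry into the host entry: in nested
 * mode the hardware treats the PASID table pointer as a GPA (it is
 * translated through the second-level GPA->HPA table), so copying that
 * pointer verbatim is sufficient; no page-table shadowing is needed. */
static void shadow_context_entry(const struct ext_ctx_entry *guest,
                                 struct ext_ctx_entry *host)
{
    host->hi = (host->hi & ~ECE_PASID_PTR_MASK) |
               (guest->hi & ECE_PASID_PTR_MASK);
    host->lo |= ECE_PRESENT | ECE_NESTE;
}
```

[This is exactly why shadowing only the context entry is enough: the
guest-owned first-level tables are walked by hardware with their
pointers treated as GPAs, so they never need to be copied.]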
Thanks,
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html