[RFC Design Doc] Enable Shared Virtual Memory feature in pass-through scenarios

Hi,

I'm sending this email to describe the enabling design for supporting SVM in pass-through scenarios. Comments are welcome. Please let me know if anything is unclear, and any suggestions regarding the format are welcome as well.

Content
===================
1. Feature description
2. Why use it?
3. How to enable it
4. How to test

Details
===================
1. Feature description
This feature lets an application program running within an L1 guest share its virtual address space with an assigned physical device (e.g. a graphics processor or accelerator).
For details on SVM (Shared Virtual Memory), refer to section 2.5.1.1 of the Intel VT-d specification and section 5.6 of the OpenCL specification. For details about the SVM address translation structures, refer to section 3 of the Intel VT-d specification. Questions are also welcome directly in this thread.

http://www.intel.com/content/dam/www/public/us/en/documents/product-specifications/vt-directed-io-spec.pdf
https://www.khronos.org/registry/cl/specs/opencl-2.0.pdf


2. Why use it?
It is common to pass devices through to a guest and expect performance similar to what they deliver on the host. With this feature enabled, application programs in the guest can likewise pass data structures to their assigned devices without unnecessary copying overhead.


3. How to enable it
The work is essentially to virtualize a DMAR hardware unit that is capable of translating guest virtual addresses to host physical addresses when the assigned device makes use of the SVM feature. The key to virtualizing remapping hardware is Caching Mode (CM). When the CM field is reported as Set, any software update to the remapping structures (including updates to not-present entries, or to present entries whose programming resulted in translation faults) requires explicit invalidation of the caches. The enabling work includes the following items.

a) IOMMU Register Access Emulation
The register set for each remapping hardware unit in the platform is placed at a 4KB-aligned memory-mapped location. For the virtual remapping hardware, the guest would allocate such a page. KVM could intercept accesses to this page and emulate the accesses to the individual registers accordingly.

b) QI Handling Emulation
Queued Invalidation (QI) is the mechanism by which software sends invalidation requests to the IOMMU and to devices (those with a Device-IOTLB). Invalidation descriptors are written to a ring buffer allocated by the OS, so the guest OS would allocate a ring buffer for its own DMAR. By design, software must update the Invalidation Queue Tail Register after writing new descriptors to the ring buffer. As item a) mentioned, KVM would intercept accesses to the Invalidation Queue Tail Register and then parse the QI descriptors from the guest. Eventually, the guest QI descriptors are placed into the host's ring buffer so that the physical remapping hardware processes them.

c) Recoverable Fault Handling Emulation
For a passed-through device, the page request is delivered to the host first. If the page request carries a PASID, it is injected into the corresponding guest for further processing; this injection is done through vMSI. The guest processes the request and sends the response through its QI interface, which KVM intercepts as item b) described. Finally, the response reaches the host QI and then the device. Requests without a PASID are handled by the host itself.

d) Non-Recoverable Fault Handling Emulation
Non-recoverable faults would be injected into the guest via vMSI.

e) VT-d Page Table Virtualization
For requests-with-PASID from an assigned device, this design uses the nested mode of the VT-d page tables. For the SVM-capable devices assigned to a guest, the extended-context-entries used to translate DMA addresses from those devices would have the NESTE bit set. Requests-without-PASID from such devices would still be translated by walking the second-level page tables only.

Another important piece is shadowing the VT-d translation structures. Caching Mode must be reported as Set for the virtual hardware, so that guest software explicitly issues invalidation operations on the virtual hardware for any and all updates to the guest remapping structures. KVM can trap these guest invalidation operations to keep the shadow translation structures consistent with the guest's modifications. In this design, any change to an extended-context-entry is followed by an invalidation (QI). As item b) described, KVM would intercept and parse it. For an extended-context-entry modification, KVM would determine whether the change needs to be shadowed into the extended-context-entry used by the physical remapping hardware. In nested mode, the physical remapping hardware treats the PASID table pointer in the extended-context-entry as a GPA, so the shadowing reduces to copying the PASID table pointer from the guest extended-context-entry to the host extended-context-entry.

f) QEMU support for this feature
The initial plan is to support devices assigned through the VFIO mechanism on the q35 machine type. A new QEMU option would be added, "svm=on|off", defaulting to off. A new IOCTL command would be added for the fds returned by KVM_CREATE_DEVICE; it would be used to create an IOMMU with SVM capability. Each assigned device would be registered with KVM during guest boot, so that KVM can map the guest BDF to the real BDF. With this map, KVM can distribute guest QI descriptors to the Invalidation Queues of the respective physical DMAR units. The assigned SVM-capable devices would be attached to DMAR0, which is also called the virtual remapping hardware in this design. This requires some modification to QEMU.


4. How to test
Testing would be done with a GPU that has SVM capability; the Intel i915 GPU is chosen for verification. Intel provides three tools for SVM verification:
i) 	intel-gpu-tools/tests/gem_svm_sanity
ii)	intel-gpu-tools/tests/gem_svm_fault
iii)	intel-gpu-tools/tests/gem_svm_storedw_loop_render

The following scenarios would have to be covered:
a) Test case 1 - SVM usage on the host - with this feature enabled, SVM usage on the host should not be affected
i)	Requires a physical machine which has at least one SVM supported device. 
ii)	Run Test Tools on host. Should work well.

b) Test case 2 - SVM usage in a guest - with this feature enabled and a device assigned, the guest should be able to use SVM with its assigned device
i)	Requires a physical machine which has at least one SVM supported device. 
ii)	Create a guest, and assign a SVM supported device to it. 
iii)	Run Test Tools on the guest. Should work well.

c) Test case 3 - SVM usage in a multi-guest scenario - multiple guests should be able to use SVM with their assigned devices without affecting each other
i)	Requires a physical machine which has at least two SVM supported devices. 
ii)	Create two guests, and assign a SVM supported device to each of them. 
iii)	Run Test Tools on both of the two guests. Both should work well.

d) Test case 4 - SVM usage in a host/guest scenario - the host and the guest should not affect each other
i)	Requires a physical machine which has at least two SVM supported devices. 
ii)	Create a guest, and assign a SVM supported device to the guest. 
iii)	Run Test Tools on both of the host and the guest. Both should work well.


Best Wishes,
Yi Liu


