[RFC PATCH 00/20] Qemu: Extend intel_iommu emulator to support Shared Virtual Memory

Hi,

This patchset proposes extending the current Intel IOMMU emulator
in QEMU to support Shared Virtual Memory (SVM) usage in the guest.
SVM virtualization for intel_iommu is split into two series: one
with the QEMU changes and one with the VFIO/IOMMU changes. This
patchset contains the QEMU changes. The VFIO/IOMMU changes are in
a separate patchset:

"[RFC PATCH 0/8] Shared Virtual Memory virtualization for VT-d"

[Terms]:
SVM: Shared Virtual Memory
vSVM: virtual SVM, i.e. SVM used in the guest
IOVA: I/O Virtual Address
gIOVA: I/O Virtual Address in guest
GVA: virtual memory address in guest
GPA: physical address in guest
HPA: physical address in host
PRQ: Page Request
vIOMMU: Virtual IOMMU emulated by QEMU
pIOMMU: physical IOMMU on HW
QI: Queued Invalidation, a mechanism used to invalidate cache in VT-d
PASID: Process Address Space ID
IGD: Intel Graphics Device
PT: Passthru Mode
ECS: Extended Context Support
Ex-Root Table: root table used in ECS mode
Ex-Context Table: context table used in ECS mode

[About Shared Virtual Memory]
Shared Virtual Memory (SVM) is a VT-d feature that allows sharing
application address space with the I/O device. The feature works
with the PCI-SIG Process Address Space ID (PASID). SVM has the
following benefits:

* The programmer gets a consistent view of memory across the host
  application and the device.
* Efficient access to data, avoiding pinning or copying overheads.
* Memory over-commit via demand paging for both CPU and device access
  to memory.

IGD is an SVM-capable device, and applications such as OpenCL want SVM
support to achieve the benefits above. This patchset was tested with IGD
and the SVM tools provided by the IGD driver developers.


[vSVM]
SVM usage in the guest is referred to as vSVM in this patchset. vSVM
enables sharing a guest application's address space with assigned devices.

The following diagram illustrates the relationship of the Ex-Root Table,
Ex-Context Table, PASID Table, First-Level Page Table and Second-Level
Page Table on VT-d.

                                              ------+
                                            ------+ |
                                         +------+ | |
                              PASID      |      | | |
                              Table      +------+ | |
                             +------+    |      | | |
                Ex-Context   |      |    +------+ | |
                   Table     +------+    |      | |
                 +------+    | pasid| -->+------+
      Ex-Root    |      |    +------+    First-Level
      Table      +------+    |      |    Page Table
     +------+    |devfn | -->+------+
     |      |    +------+ \
     +------+    |      |  \                ------+
     | bus  | -->+------+   \             ------+ |
     +------+                \         +------+ | |
     |      |                 \        |      | | |
     +------+                  \       +------+ | |
    /                           \      |      | | |
RTA                              \     +------+ | |
                                  \    |      | |
                                   --> +------+
                                       Second-Level
                                       Page Table
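
The lookup order in the diagram can be sketched in C as below. This is a
toy model only: every toy_* name, the flat 256-entry tables and the tiny
PASID table size are assumptions made for illustration. They are not the
VT-d entry layouts and not code from this patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PASID_MAX 16 /* hypothetical; real PASID tables are far larger */

/* One PASID table entry: points at a first-level page table. */
struct toy_pasid_entry {
    const void *first_level_pgtbl;
};

/* One extended-context entry: PASID table plus second-level page table. */
struct toy_ex_context_entry {
    struct toy_pasid_entry *pasid_table;  /* NULL if not programmed */
    const void *second_level_pgtbl;
};

/* One extended-root entry: context table indexed by devfn (256 entries). */
struct toy_ex_root_entry {
    struct toy_ex_context_entry *ctx_table; /* NULL if not present */
};

struct toy_lookup {
    const void *first_level;
    const void *second_level;
};

/*
 * Walk bus -> devfn -> pasid, as in the diagram.
 * root_table must have 256 entries. Returns 0 on success and -1 if any
 * level is missing.
 */
int toy_walk(const struct toy_ex_root_entry *root_table,
             uint8_t bus, uint8_t devfn, uint32_t pasid,
             struct toy_lookup *out)
{
    assert(root_table != NULL && out != NULL);

    const struct toy_ex_context_entry *ctx_table = root_table[bus].ctx_table;
    if (ctx_table == NULL)
        return -1;

    const struct toy_ex_context_entry *ce = &ctx_table[devfn];
    if (ce->pasid_table == NULL || pasid >= TOY_PASID_MAX)
        return -1;

    out->first_level = ce->pasid_table[pasid].first_level_pgtbl;
    out->second_level = ce->second_level_pgtbl;
    return out->first_level != NULL ? 0 : -1;
}
```

The second-level page table hangs off the context entry, so it is selected
per device, while the first-level page table is selected per PASID.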

To support vSVM, the physical VT-d needs a GVA->HPA mapping. VT-d
provides a nested mode that achieves this. With nested mode enabled for
a device, any request-with-PASID from that device is translated through
both the first-level and the second-level page tables: the first-level
page table translates GVA->GPA, and the second-level page table then
translates GPA->HPA.
                                       
The translation above can be achieved by linking the whole guest PASID
table to the host. With the guest PASID table linked, the remapping
hardware in VT-d uses the guest first-level page table for the GVA->GPA
translation and the host second-level page table for the GPA->HPA
translation.
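
The two-stage translation can be sketched as below. It is a simplified
model under stated assumptions: each page table is a flat array of frame
numbers with 4 KiB pages, and all toy_* names are hypothetical. Real
first-level and second-level tables are multi-level, and the walk of the
first-level table itself also goes through the second level, which this
sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)
#define TOY_INVALID    UINT64_MAX

/* Flat table: index = input frame number, value = output frame number. */
struct toy_pgtbl {
    const uint64_t *frames;
    size_t nr_frames;
};

/* Single-stage lookup; keeps the page offset, swaps the frame number. */
static uint64_t toy_translate(const struct toy_pgtbl *t, uint64_t addr)
{
    uint64_t pfn = addr >> TOY_PAGE_SHIFT;

    if (pfn >= t->nr_frames || t->frames[pfn] == TOY_INVALID)
        return TOY_INVALID;
    return (t->frames[pfn] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
}

/*
 * Nested translation: the guest first-level table gives GVA->GPA, then
 * the host second-level table gives GPA->HPA. Returns TOY_INVALID if
 * either stage has no mapping.
 */
uint64_t toy_nested_translate(const struct toy_pgtbl *first_level,
                              const struct toy_pgtbl *second_level,
                              uint64_t gva)
{
    assert(first_level != NULL && second_level != NULL);

    uint64_t gpa = toy_translate(first_level, gva);
    if (gpa == TOY_INVALID)
        return TOY_INVALID;
    return toy_translate(second_level, gpa);
}
```

A miss in either stage fails the whole translation, which is why both the
guest-owned and the host-owned tables must be reachable by the hardware.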

Besides nested mode and linking the guest PASID table to the host,
caching mode is another key capability. Reporting Caching Mode as Set
for the virtual hardware requires the guest software to explicitly issue
invalidation operations on the virtual hardware for any/all updates to the
guest remapping structures. The virtualizing software may trap these guest
invalidation operations to keep the shadow translation structures consistent
with guest translation structure modifications. With Caching Mode reported
to the guest, the intel_iommu emulator can trap the guest's programming of
a context entry, and can then link the guest PASID table to the host and
set nested mode.

[vSVM Implementation]
To enable SVM usage in the guest, the work includes the following items.

Initialization Phase:
(1) Report SVM required capabilities in intel_iommu emulator
(2) Trap the guest context cache invalidation, link the whole guest PASID
    table to the host extended-context entry
(3) Set nested mode in the host extended-context entry

Run-time:
(4) Forward guest cache invalidation requests for first-level translation
    to the pIOMMU
(5) Fault reporting: report faults that happen on the host to the
    intel_iommu emulator, and then to the guest
(6) Page Request and response

The fault reporting framework is under discussion in another thread
driven by Lan Tianyu, so the vSVM enabling plan divides the work into two
phases. This patchset is for Phase 1.

Phase 1: includes items (1), (2) and (3).
Phase 2: includes items (4), (5) and (6).


[Overview of patch]
This patchset requires Passthru-Mode support in intel_iommu. Peter Xu
has sent a patch for it:
https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg443627.html

* 1 ~ 2 enable Extended-Context Support in the intel_iommu emulator.
* 3 exposes SVM-related capabilities to the guest with an option.
* 4 changes the VFIO notifier parameter for the newly added notifier.
* 5 ~ 6 add a new VFIO notifier for PASID table bind requests.
* 7 ~ 8 add a notifier flag check in memory_replay and region_del.
* 9 ~ 11 introduce a mechanism between VFIO and the intel_iommu emulator
  to record assigned device info, e.g. the host SID of the assigned
  device.
* 12 adds a fire function for the PASID table bind notifier.
* 13 adds a generic definition for PASID table info in iommu.h.
* 14 ~ 15 link the guest PASID table to the host for intel_iommu.
* 16 adds a VFIO notifier for propagating guest IOMMU TLB invalidations
  to the host.
* 17 adds a fire function for the IOMMU TLB invalidate notifier.
* 18 ~ 20 propagate first-level page table related cache invalidations
  to the host.
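
The idea behind the notifier flag checks and fire functions in the list
above can be sketched as below. It is a stand-alone toy: the toy_* names
and flag values are hypothetical and are not the QEMU IOMMUNotifier
definitions or the flags added by these patches.

```c
#include <assert.h>
#include <stddef.h>

/* Event types a notifier can register for (hypothetical values). */
enum toy_notifier_flag {
    TOY_NOTIFIER_UNMAP      = 1 << 0,
    TOY_NOTIFIER_MAP        = 1 << 1,
    TOY_NOTIFIER_PASID_BIND = 1 << 2, /* PASID table bind request */
    TOY_NOTIFIER_TLB_INV    = 1 << 3, /* guest IOMMU TLB invalidate */
};

struct toy_notifier {
    unsigned flags;                                     /* registered events */
    void (*notify)(struct toy_notifier *n, void *data); /* may be NULL */
    unsigned calls;                                     /* times fired */
};

/*
 * Fire one event at every notifier registered for it and return how many
 * were fired. The flag check keeps map/unmap replay away from notifiers
 * that only handle PASID bind or TLB invalidate events, and vice versa.
 */
size_t toy_notify(struct toy_notifier *list, size_t n,
                  unsigned event, void *data)
{
    size_t fired = 0;

    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!(list[i].flags & event))
            continue; /* not registered for this event type */
        list[i].calls++;
        if (list[i].notify != NULL)
            list[i].notify(&list[i], data);
        fired++;
    }
    return fired;
}
```

Once several notifier types share one list, a replay that walks the list
has to apply a check like this, otherwise a bind-only notifier would be
handed map entries it cannot interpret.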

[Test Done]
The patchset was tested with IGD. With IGD assigned to the guest, the
IGD could write data to the guest application's address space.

An SVM-capable i915 driver can be found at:
https://cgit.freedesktop.org/~miku/drm-intel/?h=svm

i915 svm test tool:
https://cgit.freedesktop.org/~miku/intel-gpu-tools/log/?h=svm


[Co-work with gIOVA enablement]
Peter Xu is currently working on enabling gIOVA usage for the Intel
IOMMU emulator. This patchset is based on Peter's work (V7).
https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v7

[Limitation]
* Due to a VT-d HW limitation, an assigned device cannot use gIOVA
and vSVM at the same time. As a short-term solution, the Intel VT-d
spec will introduce a new capability bit indicating this limitation,
which the guest IOMMU driver can check to prevent enabling both IOVA
and SVM. In the long term it will be fixed in HW.

[Open]
* This patchset proposes passing raw data from guest to host when
propagating the guest IOMMU TLB invalidation.

In fact, we have two choices here.

a) as proposed in this patchset, pass raw data to the host. The host
   pIOMMU driver submits the invalidation request after replacing
   specific fields, and rejects it if the IOMMU model does not match.
   * Pros: no parsing and re-assembling needed, better performance
   * Cons: unable to support scenarios that emulate an Intel IOMMU
           on an ARM platform.
b) parse the invalidation info into specific data, e.g. granularity,
   addr, size, invalidation type etc., then fill the data into a generic
   structure. In the host, the pIOMMU driver re-assembles the
   invalidation request and submits it to the pIOMMU.
   * Pros: may be able to support the scenario above. But this is still
           in question, since different vendors may have vendor-specific
           invalidation info. This would make it difficult to have a
           vendor-agnostic invalidation propagation API.
   * Cons: needs additional complexity for parsing and re-assembling.
           The generic structure would be a superset of all possible
           invalidation info, which may be hard to maintain in the future.

Given the pros and cons above, I propose a) as the initial version. But
it is still an open question, and I would be glad to hear your thoughts.

FYI, the following definition is a draft from a previous discussion with
Jean. It has both a generic part and a vendor-specific part.

struct tlb_invalidate_info
{
        __u32   model;  /* Vendor number */
        __u8 granularity;
#define DEVICE_SELECTIVE_INV    (1 << 0)
#define PAGE_SELECTIVE_INV      (1 << 1)
#define PASID_SELECTIVE_INV     (1 << 2)
        __u32 pasid;
        __u64 addr;
        __u64 size;

        /* Since IOMMU format has already been validated for this table,
           the IOMMU driver knows that the following structure is in a
           format it knows */
        __u8 opaque[];
};

struct tlb_invalidate_info_intel
{
        __u32 inv_type;
        ...
        __u64 flags;
        ...
        __u8 mip;
        __u16 pfsid;
};
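
How option a) could use the model field and the opaque tail is sketched
below. This is not the patchset's code: the toy_* names, TOY_MODEL_INTEL
and the opaque_len field are assumptions added so the example is
self-contained (the draft above has no length field), and <stdint.h>
types stand in for the kernel __u32/__u8 types.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MODEL_INTEL 1u /* hypothetical vendor number */

struct toy_tlb_invalidate_info {
    uint32_t model;      /* vendor number, checked by the host */
    uint32_t opaque_len; /* assumption: size of the raw payload */
    uint8_t  opaque[];   /* raw, vendor-specific invalidation data */
};

/* Guest side: wrap a raw descriptor without parsing it. Caller frees. */
struct toy_tlb_invalidate_info *toy_inv_pack(uint32_t model,
                                             const void *raw, uint32_t len)
{
    struct toy_tlb_invalidate_info *info = malloc(sizeof(*info) + len);

    if (info == NULL)
        return NULL;
    info->model = model;
    info->opaque_len = len;
    if (len != 0)
        memcpy(info->opaque, raw, len);
    return info;
}

/*
 * Host side: accept the raw payload only if it was produced for the same
 * IOMMU model as the host pIOMMU. Returns 0 if accepted, -1 if rejected.
 */
int toy_inv_submit(uint32_t host_model,
                   const struct toy_tlb_invalidate_info *info)
{
    assert(info != NULL);

    if (info->model != host_model)
        return -1; /* e.g. an emulated Intel IOMMU on a non-Intel host */

    /* A real driver would now replace host-specific fields and queue
     * the descriptor to the pIOMMU. */
    return 0;
}
```

The rejection branch is the cost named in the cons of option a): a
payload built for one IOMMU model cannot be submitted to a host of a
different model.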

Additionally, Jean is proposing a para-vIOMMU solution. The proposed
invalidate request VIRTIO_IOMMU_T_INVALIDATE contains opaque data, so it
may be preferable to have an opaque part when propagating IOMMU TLB
invalidations in SVM virtualization.

http://www.spinics.net/lists/kvm/msg147993.html

Best Wishes,
Yi L


Liu, Yi L (20):
  intel_iommu: add "ecs" option
  intel_iommu: exposed extended-context mode to guest
  intel_iommu: add "svm" option
  Memory: modify parameter in IOMMUNotifier func
  VFIO: add new IOCTL for svm bind tasks
  VFIO: add new notifier for binding PASID table
  VFIO: check notifier flag in region_del()
  Memory: add notifier flag check in memory_replay()
  Memory: introduce iommu_ops->record_device
  VFIO: notify vIOMMU emulator when device is assigned
  intel_iommu: provide iommu_ops->record_device
  Memory: Add func to fire pasidt_bind notifier
  IOMMU: add pasid_table_info for guest pasid table
  intel_iommu: add FOR_EACH_ASSIGN_DEVICE macro
  intel_iommu: link whole guest pasid table to host
  VFIO: Add notifier for propagating IOMMU TLB invalidate
  Memory: Add func to fire TLB invalidate notifier
  intel_iommu: propagate Extended-IOTLB invalidate to host
  intel_iommu: propagate PASID-Cache invalidate to host
  intel_iommu: propagate Ext-Device-TLB invalidate to host

 hw/i386/intel_iommu.c          | 543 +++++++++++++++++++++++++++++++++++++----
 hw/i386/intel_iommu_internal.h |  87 +++++++
 hw/vfio/common.c               |  45 +++-
 hw/vfio/pci.c                  |  94 ++++++-
 hw/virtio/vhost.c              |   3 +-
 include/exec/memory.h          |  45 +++-
 include/hw/i386/intel_iommu.h  |   5 +-
 include/hw/vfio/vfio-common.h  |   5 +
 linux-headers/linux/iommu.h    |  35 +++
 linux-headers/linux/vfio.h     |  26 ++
 memory.c                       |  59 +++++
 11 files changed, 882 insertions(+), 65 deletions(-)
 create mode 100644 linux-headers/linux/iommu.h

-- 
1.9.1



