Re: [PATCH v6 0/6] iommufd: Add nesting infrastructure (part 2/2)


On 12/9/2023 8:47 AM, Jason Gunthorpe wrote:
On Fri, Nov 17, 2023 at 05:07:11AM -0800, Yi Liu wrote:

Taking Intel VT-d as an example, the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.

I've been looking at what the three HWs need for invalidation; it is
a bit messy. Here is my thinking. Please let me know if I got it right.

What is the starting point of the guest memory walks:
  Intel: Single Scalable Mode PASID table entry indexed by a RID & PASID
  AMD: GCR3 table (a table of PASIDs) indexed by RID

The GCR3 table is indexed by PASID.
The Device Table (DTE) is indexed by DeviceID (RID).

...
Will ATC be forwarded or synthesized:
  Intel: The (vDomain-ID,PASID) is a unique nesting domain so
         the hypervisor knows exactly which RIDs this nesting domain is
         linked to and can generate an ATC invalidation. Plan is to
         suppress/discard the ATC invalidations from the VM and generate
         them in the hypervisor.
  AMD: (vDomain-ID,PASID) is ambiguous, it can refer to multiple GCR3
       tables. We know which maximal set of RIDs it represents, but not
       the actual set. I expect AMD will forward the ATC invalidation
       to avoid over invalidation.

Not sure I understand your description here.

For the AMD IOMMU INVALIDATE_IOMMU_PAGES command (i.e. invalidate the IOMMU TLB), the hypervisor needs to map gDomainId->hDomainId and issue the command on behalf of the VM, along with the PASID and GVA (or GVA range) provided by the guest.

For the AMD IOMMU INVALIDATE_IOTLB_PAGES command (i.e. invalidate the ATC on the device), the hypervisor needs to map gDeviceId->hDeviceId and issue the command on behalf of the VM, along with the PASID and GVA (or GVA range) provided by the guest.
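
To make the two translations above concrete, here is a minimal sketch in plain C. It is not taken from the AMD driver or from iommufd: struct inv_cmd, struct id_map, struct vm_maps and translate_inv_cmd() are hypothetical names, the IDs are assumed to be 16 bits wide, and the lookup is a linear scan purely for illustration. It only shows that the domain or device ID is rewritten while the PASID and GVA range from the guest pass through unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a guest invalidation command trapped by the hypervisor. */
struct inv_cmd {
    bool     is_atc; /* false: IOMMU TLB invalidation, true: device ATC invalidation */
    uint16_t id;     /* gDomainId or gDeviceId on input, host ID on output */
    uint32_t pasid;  /* provided by the guest, passed through */
    uint64_t gva;    /* provided by the guest, passed through */
    uint64_t size;   /* provided by the guest, passed through */
};

/* One guest ID -> host ID pair (illustrative layout only). */
struct id_map {
    uint16_t guest;
    uint16_t host;
};

/* Per-VM translation tables maintained by the hypervisor (assumed). */
struct vm_maps {
    const struct id_map *domains;
    size_t nr_domains;
    const struct id_map *devices;
    size_t nr_devices;
};

static bool lookup(const struct id_map *map, size_t n, uint16_t guest,
                   uint16_t *host)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].guest == guest) {
            *host = map[i].host;
            return true;
        }
    }
    return false;
}

/*
 * Build the host command for a guest command. Returns false when the guest
 * ID is unknown, in which case the command must not be issued to hardware.
 */
static bool translate_inv_cmd(const struct vm_maps *vm,
                              const struct inv_cmd *in, struct inv_cmd *out)
{
    uint16_t host_id;
    bool ok = in->is_atc
        ? lookup(vm->devices, vm->nr_devices, in->id, &host_id)
        : lookup(vm->domains, vm->nr_domains, in->id, &host_id);

    if (!ok)
        return false;

    *out = *in;        /* PASID and GVA range are kept as the guest gave them */
    out->id = host_id; /* only the domain or device ID is rewritten */
    return true;
}
```

Rejecting unknown guest IDs, rather than passing them through, is a choice made for this sketch so that a guest cannot name a host ID it does not own.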

  ARM: ASID is ambiguous. We have no idea which Nesting Domain/CD table
       the ASID is contained in. ARM must forward the ATC invalidation
       from the guest.

What iommufd object should receive the IOTLB invalidation command list:
  Intel: The Nesting domain. The command list has to be broken up per
         (vDomain-ID,PASID) and that batch delivered to the single
         nesting domain. Kernel ignores vDomain-ID/PASID and just
         invalidates whatever the nesting domain is actually attached to.
  AMD: Any Nesting Domain in the vDomain-ID group. The command list has
       to be broken up per (vDomain-ID). Kernel replaces
       vDomain-ID with pDomain-ID from the nesting domain and executes
       the invalidation.
  ARM: The Nesting Parent domain. Kernel forces the VMID from the
       Nesting Parent and executes the invalidation.
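
The "broken up per (vDomain-ID,PASID)" step described above could look roughly like the following on the VMM side. This is only a sketch under assumptions: struct guest_inv, struct inv_batch and split_batches() are made-up names, and it assumes the commands in one list may be reordered freely, since it sorts them to make each group contiguous. For the AMD case the PASID comparison would be dropped so that grouping is per vDomain-ID only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decoded guest IOTLB invalidation command. */
struct guest_inv {
    uint16_t vdomain_id;
    uint32_t pasid;
    uint64_t addr;
};

/* A contiguous run of commands that share one (vDomain-ID, PASID) key. */
struct inv_batch {
    size_t first; /* index of the first command in the sorted array */
    size_t count; /* number of commands in this batch */
};

/* Order by vDomain-ID, then by PASID. */
static int cmp_key(const void *a, const void *b)
{
    const struct guest_inv *x = a, *y = b;

    if (x->vdomain_id != y->vdomain_id)
        return x->vdomain_id < y->vdomain_id ? -1 : 1;
    if (x->pasid != y->pasid)
        return x->pasid < y->pasid ? -1 : 1;
    return 0;
}

/*
 * Sort cmds in place and record one batch per distinct key in out[].
 * out[] must have room for n entries. Returns the number of batches.
 */
static size_t split_batches(struct guest_inv *cmds, size_t n,
                            struct inv_batch *out)
{
    size_t start = 0, nr = 0;

    qsort(cmds, n, sizeof(*cmds), cmp_key);
    for (size_t i = 1; i <= n; i++) {
        /* Close the current batch at the end or when the key changes. */
        if (i == n || cmp_key(&cmds[i], &cmds[start]) != 0) {
            out[nr].first = start;
            out[nr].count = i - start;
            nr++;
            start = i;
        }
    }
    return nr;
}
```

Each resulting batch would then be delivered to the one nesting domain that the VMM associates with that key.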

In all cases the VM issues an ATC invalidation with (vRID, PASID) as
the tag. The VMM must translate vRID -> dev_id -> pRID

For a pure SW flow the vRID can be mapped to the dev_id and the ATC
invalidation delivered to the device object (e.g. IOMMUFD_DEV_INVALIDATE).

Finally, we have the HW driven invalidation DMA queues that can be
directly assigned to the guest. AMD and SMMUv3+vCMDQ support this. In
this case the HW is directly processing invalidation commands without
a hypervisor trap.

To make this work the iommu needs to be programmed with:
  AMD: A vDomain-ID -> pDomain-ID table
       A vRID -> pRID table
       This is all bound to some "virtual function"

By "virtual function", I assume you are referring to the AMD vIOMMU instance in the guest?

  ARM: A vRID -> pRID table
       The vCMDQ is bound to a VM_ID, so to the Nesting Parent

For AMD, as above, I suggest the vDomain-ID be passed when creating
the nesting domain
Sure, we can do this part.

The AMD "virtual function".. It is probably best to create a new iommufd
object for this and it can be passed in to a few places

Something like IOMMUFD_OBJ_VIOMMU? Then the operations would include something like:
  * Init
  * Destroy
  * ...

The vRID->pRID table should be some mostly common
IOMMUFD_DEV_ASSIGN_VIRTUAL_ID. AMD will need to pass in the virtual
function ID and ARM will need to pass in the Nesting Parent ID.

Ok.
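
For discussion purposes, the object and the virtual-ID assignment talked about above might be modelled as below. This is a userspace toy and not the iommufd uAPI: struct viommu, viommu_init(), viommu_destroy(), viommu_assign_virtual_id() and viommu_lookup() are all hypothetical, the fixed-size table is an arbitrary simplification, and parent_id stands in for the "virtual function" ID on AMD or the Nesting Parent ID on ARM.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define VIOMMU_MAX_DEVS 16 /* arbitrary limit for the sketch */

/* One vRID -> pRID entry. */
struct viommu_dev {
    uint32_t virt_id;
    uint32_t phys_rid;
};

/* Hypothetical per-VM vIOMMU object. */
struct viommu {
    uint32_t parent_id; /* AMD: virtual function ID, ARM: Nesting Parent ID */
    size_t nr_devs;
    struct viommu_dev devs[VIOMMU_MAX_DEVS];
};

/* Init: allocate an empty object bound to parent_id. NULL on failure. */
static struct viommu *viommu_init(uint32_t parent_id)
{
    struct viommu *v = calloc(1, sizeof(*v));

    if (v)
        v->parent_id = parent_id;
    return v;
}

/* Destroy: release the object and its table. */
static void viommu_destroy(struct viommu *v)
{
    free(v);
}

/* Record a vRID -> pRID mapping. Duplicate virtual IDs are refused. */
static int viommu_assign_virtual_id(struct viommu *v, uint32_t virt_id,
                                    uint32_t phys_rid)
{
    for (size_t i = 0; i < v->nr_devs; i++) {
        if (v->devs[i].virt_id == virt_id)
            return -EEXIST;
    }
    if (v->nr_devs == VIOMMU_MAX_DEVS)
        return -ENOSPC;

    v->devs[v->nr_devs].virt_id = virt_id;
    v->devs[v->nr_devs].phys_rid = phys_rid;
    v->nr_devs++;
    return 0;
}

/* Resolve a vRID to its pRID. Returns -ENOENT when it was never assigned. */
static int viommu_lookup(const struct viommu *v, uint32_t virt_id,
                         uint32_t *phys_rid)
{
    for (size_t i = 0; i < v->nr_devs; i++) {
        if (v->devs[i].virt_id == virt_id) {
            *phys_rid = v->devs[i].phys_rid;
            return 0;
        }
    }
    return -ENOENT;
}
```

In the HW-accelerated case the same table would have to be programmed into the IOMMU rather than walked in software; the sketch covers only the bookkeeping.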

...
Thus next steps:
  - Respin this and let's focus on Intel only (this will be tough for
    the holidays, but if it is available I will try)
  - Get an ARM patch that just does IOTLB invalidation and add it to my
    part 3
  - Start working on IOMMUFD_DEV_INVALIDATE along with an ARM
    implementation of it
  - Reorganize the AMD RFC broadly along these lines and let's see it
    freshened up in the next months as well. I would like to see the
    AMD support structured to implement the SW paths in first steps and
    later add in the "virtual function" acceleration stuff. The latter
    is going to be complex.

I am working on refining part 1 to add HW info reporting and nested translation (minus the invalidation stuff), and should be sending it out soon.

Suravee



