> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, May 14, 2024 11:56 PM
>
> On Sun, May 12, 2024 at 08:34:02PM -0700, Nicolin Chen wrote:
> > On Sun, May 12, 2024 at 11:03:53AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Apr 12, 2024 at 08:47:01PM -0700, Nicolin Chen wrote:
> > > > Add a new iommufd_viommu core structure to represent a vIOMMU
> > > > instance in the user space, typically backed by a HW-accelerated
> > > > feature of an IOMMU, e.g. NVIDIA CMDQ-Virtualization (an ARM
> > > > SMMUv3 extension) and AMD Hardware Accelerated Virtualized
> > > > IOMMU (vIOMMU).
> > >
> > > I expect this will also be the only way to pass in an associated KVM,
> > > userspace would supply the kvm when creating the viommu.
> > >
> > > The tricky bit of this flow is how to manage the S2. It is necessary
> > > that the S2 be linked to the viommu:
> > >
> > >  1) ARM BTM requires the VMID to be shared with KVM
> > >  2) AMD and others need the S2 translation because some of the HW
> > >     acceleration is done inside the guest address space
> > >
> > > I haven't looked closely at AMD but presumably the VIOMMU create will
> > > have to install the S2 into a DID or something?
> > >
> > > So we need the S2 to exist before the VIOMMU is created, but the
> > > drivers are going to need some more fixing before that will fully
> > > work.

Can you elaborate on this point? The vIOMMU is a dummy container when
it's created, and its association to the S2 becomes relevant only once a
VQUEUE is created inside it and linked to a device, so there should be a
window in between allowing userspace to configure the S2?

Not arguing against setting up the S2 before vIOMMU creation. Just want
to better understand the rationale here.

> > >
> > > Does the nesting domain create need the viommu as well (in place of
> > > the S2 hwpt)? That feels sort of natural.
> >
> > Yes, I had a similar thought initially: each viommu is backed by
> > a nested IOMMU HW, and a special HW accelerator like VCMDQ could
> > be treated as an extension on top of that. It might not be very
> > straightforward like the current design having vintf<->viommu and
> > vcmdq <-> vqueue though...
>
> vqueue should be considered a sub object of the viommu and hold a
> refcount on the viommu object for its lifetime.
>
> > In that case, we can then support viommu_cache_invalidate, which
> > is quite natural for SMMUv3. Yet, I recall Kevin said that VT-d
> > doesn't want or need that.
>
> Right, Intel currently doesn't need it, but I feel like everyone will
> need this eventually as the fast invalidation path is quite important.
>

Yes, there is no need today, but I don't see any harm in preparing for
such an extension on VT-d. Logically it's clearer, e.g. if we decide to
move device TLB invalidation to a separate uAPI then vIOMMU is certainly
a clearer object to carry it. And the hardware extensions really look
like optimizations on top of the software implementation.

And we do need to make a decision now, given that making vIOMMU a
generic object for all vendors may have an impact on the user page
fault support which Baolu is working on. The so-called fault object
would be contained in the vIOMMU, which is software-managed on
VT-d/SMMU but passed through on AMD. And probably we don't need another
handle mechanism in the attach path, assuming the vIOMMU object already
contains the necessary information to find the iommufd_object for a
reported fault.

Baolu, your thoughts?
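
As a footnote, to make the ownership/lifetime relationship being
discussed concrete, a rough and purely illustrative sketch; the struct
layouts and field names below are hypothetical, not the actual iommufd
definitions:

struct iommufd_viommu {
	struct iommufd_object obj;
	/* S2 supplied by userspace at viommu creation time */
	struct iommufd_hwpt_paging *s2_hwpt;
	/* optional, e.g. for sharing the VMID with KVM (ARM BTM) */
	struct kvm *kvm;
	/* a fault object could similarly be contained in the viommu */
};

/* sub object of the viommu; holds a refcount on it for its lifetime */
struct iommufd_vqueue {
	struct iommufd_object obj;
	struct iommufd_viommu *viommu;
};

With this shape, the S2 (and the optional KVM) must exist before the
viommu is created, and each vqueue keeps its viommu, and therefore the
S2, alive until the vqueue itself is destroyed.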