On Fri, Oct 25, 2024 at 08:34:05AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> > Sent: Tuesday, October 22, 2024 8:19 AM
> >
> > This series introduces a new vIOMMU infrastructure and related ioctls.
> >
> > IOMMUFD has been using the HWPT infrastructure for all cases, including
> > nested IO page table support. Yet, there are limitations for an
> > HWPT-based structure to support some advanced HW-accelerated features,
> > such as CMDQV on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even
> > in a multi-IOMMU environment, it is not straightforward for nested
> > HWPTs to share the same parent HWPT (stage-2 IO pagetable) with the
> > HWPT infrastructure alone: a parent HWPT typically holds one stage-2
> > IO pagetable and tags it with only one ID in the cache entries. When
> > sharing one large stage-2 IO pagetable across physical IOMMU
> > instances, that one ID may not always be available across all the
> > IOMMU instances. In other words, it's ideal for SW to have a different
> > container for the stage-2 IO pagetable so it can hold another ID that
> > is available.
>
> Just holding multiple IDs doesn't require a different container. This
> is just a side effect when vIOMMU will be required for the other
> reasons stated.
>
> If we have to put more words here, I'd prefer adding a bit more about
> CMDQV, which is more compelling. Not a big deal though. 😊

Ack.

> > For this "different container", add vIOMMU, an additional layer to
> > hold extra virtualization information:
> >
> >  ______________________________________________________________________
> > |                        iommufd (with vIOMMU)                         |
> > |                                                                      |
> > |                            [5]                                       |
> > |                       _____________                                  |
> > |                      |             |                                 |
> > |      |---------------|   vIOMMU    |                                 |
> > |      |               |             |                                 |
> > |      |               |             |                                 |
> > |      |    [1]        |             |         [4]            [2]      |
> > |      |   ______      |             |    _____________     ________   |
> > |      |  |      |     |     [3]     |   |             |   |        |  |
> > |      |  | IOAS |<----|(HWPT_PAGING)|<--| HWPT_NESTED |<--| DEVICE |  |
> > |      |  |______|     |_____________|   |_____________|   |________|  |
> > |      |        |             |                 |              |       |
> > |______|________|_____________|_________________|______________|_______|
> >        |        |             |                 |              |
> >   _____v______  |         ____v_______      ____v_______     __v___
> >  |   struct   | |        |  (paging)  |    |  (nested)  |   |struct|
> >  |iommu_device| |------->|iommu_domain|<---|iommu_domain|<--|device|
> >  |____________| storage  |____________|    |____________|   |______|
>
> nit - [1] ... [5] can be removed.

They are copied from the Documentation where the numbers are needed. I
will take all the numbers out in the cover letters.

> > The vIOMMU object should be seen as a slice of a physical IOMMU
> > instance that is passed to or shared with a VM. That can be some HW/SW
> > resources:
> >  - Security namespace for guest-owned IDs, e.g. guest-controlled
> >    cache tags
> >  - Access to a sharable nesting parent pagetable across physical
> >    IOMMUs
> >  - Virtualization of various platform IDs, e.g. RIDs and others
> >  - Delivery of paravirtualized invalidation
> >  - Direct assigned invalidation queues
> >  - Direct assigned interrupts
> >  - Non-affiliated event reporting
>
> sorry no idea about 'non-affiliated event'. Can you elaborate?

I'll put an "e.g.".

> > On a multi-IOMMU system, the vIOMMU object must be instanced to the
> > number of the physical IOMMUs that are passed to (via devices) a
> > guest VM, while
>
> 'to the number of the physical IOMMUs that have a slice passed to ...'

Ack.

Thanks
Nicolin
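
---
For reference, a minimal userspace sketch of the allocation flow
described above: one vIOMMU object per physical IOMMU instance, on top
of a shared nesting parent HWPT. This assumes the IOMMU_VIOMMU_ALLOC
ioctl and struct iommu_viommu_alloc as posted in this series; field
names and layout may change between revisions.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iommufd.h>

/*
 * Allocate a vIOMMU on top of the shared nesting parent HWPT (hwpt_id).
 * dev_id can be any device behind the target physical IOMMU; it tells
 * the kernel which IOMMU instance this vIOMMU is a slice of. Returns
 * the new vIOMMU object ID, or 0 on failure.
 */
static uint32_t viommu_alloc(int iommufd, uint32_t dev_id, uint32_t hwpt_id)
{
	struct iommu_viommu_alloc cmd = {
		.size = sizeof(cmd),
		.type = IOMMU_VIOMMU_TYPE_DEFAULT,
		.dev_id = dev_id,
		.hwpt_id = hwpt_id,
	};

	if (ioctl(iommufd, IOMMU_VIOMMU_ALLOC, &cmd))
		return 0;
	return cmd.out_viommu_id;
}

A nested HWPT for a device behind that physical IOMMU would then be
allocated with the returned vIOMMU ID, rather than the paging HWPT ID,
as the pt_id in IOMMU_HWPT_ALLOC, so that guest-owned IDs resolve
within the correct vIOMMU slice.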