On Tue, May 21, 2024 at 03:24:48PM -0300, Jason Gunthorpe wrote: > On Tue, May 14, 2024 at 06:59:07PM -0700, Nicolin Chen wrote: > > So, you want a proxy S1 domain for a device to attach, in case > > of a stage-2 only setup, because an S2 domain will no longer has > > a VMID, since it's shared among viommus. In the SMMU driver case, > > an arm_smmu_domain won't have an smmu pointer, so a device can't > > attach to an S2 domain but always an nested S1 domain, right? > > That seems like a simple solution to the VMID lifetime, but it means > the kernel has to decode more types of vSTE. Yea. For vSTE=abort, likely we need a nested block domain too? > > > Functionally we could use that global nesting domain > > > to deliver the DEV_INVALIDATE too. > > > > If my narrative above is correct, the device is actually still > > attached to S2 domain via a proxy nested S1 domain. What cache > > do we need to invalidate except S2 mappings in this case? > > qemu needs a reliable place to send the invalidation commands to (ie > what ID to provide). > > If we add IOMMU_VIOMMU_INVALIDATE then the ID is the viommu id. > > If we enhance IOMMU_HWPT_INVALIDATE then the ID is the identity > nesting domain. > > Either case leads to the viommu object in the kernel. Ooohh! I didn't connect the dots this far. Yes. This turns the IOMMU_HWPT_INVALIDATE back to the very first version supporting device cache flush. Though using IOMMU_HWPT_INVALIDATE feels a bit rule breaking now since it assumes the nested HWPT keeps a vdev_id lookup table somewhere in its associates... > I don't know if there is merit one way or the other. A more specific > API surface is nice, but the two APIs are completely duplicating. > > So maybe: > > #define IOMMU_VIOMMU_INVALIDATE IOMMU_HWPT_INVALIDATE > > As documentation and have the kernel just detect based on the type of > the passed ID? Yea, the only difference is viommu_id v.s. hwpt_id that we can document. Then in this case, we have two mostly identical uAPIs for the SMMU driver to use. Should we implement both? > > > > So again, yes, it makes sense to me that we move viommu and the > > > > set_dev_id to the nested series, and then drop DEV_INVALIDATE. > > > > > > I would like to do this bit by bit. viommu is a big series on its own. > > > > > > DEV_INVALIDATE is fine, it just can't do ATS invalidation. > > > > I am not very sure about AMD. > > AMD will need the same vRID -> pRID stuff and we want that to run on > the VIOMMU > > > Same question: any other case can we use the DEV_INVALIDATE for? > > DEV_INVALIDATE was interesting before the viommu idea because > userspace could process each invalidation command and when it reaches > ATS it would invoke the correct DEV_INVALIDATE. Agreed. That helped a lot in VMM dispatching the invalidation requests. > With viommu we expect ATS supporting drivers to support viommu and > then to do vRID -> pRID in the other invalidation paths. In this case > I don't see a reason to do DEV_INVALIDATE right now. Yea. I guessed so. > > > We can add ATS invalidation after either as an enhancement as part of > > > adding the VIOMMU either as DEV_INVALIDATE or VIOMMU_INVALIDATE (or > > > both) > > > > Yea, maybe step by step like this: > > > > Part-1 VIOMMU_ALLOC and VIOMMU_ATTACH > > Part-2 VIOMMU_SET/UNSET_VDEV_ID > > Part-3 VIOMMU_INVALIDATE > > Part-4 VQUEUE_ALLOC > > ... > > So we have this stuff still open: > - Identity STE with PASID (part 2b) > - IOMMU_GET_HW_INFO (part 3) > - IOMMU_HWPT_ALLOC_NEST_PARENT (part 3) > - IOMMU_HWPT_DATA_ARM_SMMUV3 (part 3) > - IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 > - VIOMMU_ALLOC, VIOMMU_ATTACH > - VIOMMU_INVALIDATE > - VIOMMU_SET/UNSET_VDEV_ID > - VQUEUE_ALLOC / vCMDQ > > I feel like IOMMU_HWPT_INVALIDATE_DATA_ARM_SMMUV3 is a reasonable fit > to part 3. Then part 4 would be VIOMMU_ALLOC -> VIOMMU_SET/UNSET_VDEV_ID > which brings ATS support the API. There is some conflict at passing in viommu_id/viommu v.s. parent hwpt_id/domain for a nested domain allocation. Do you think that should be addressed later in VIOMMU series v.s. part3? More specifically, I have two drafts in my viommu series: 87a659e65229 WAR: iommufd: Allow pt_it to carry viommu_id 7c5fd8f50bc9 WAR pass in viommu pointer to domain_alloc_user op I know that these two only make sense with VIOMMU_ALOC. Yet, will there be a problem, if we establish nested domain allocation with parent domain/hwpt by part3, in the uAPI, and then change later? Will we end up with supporting two for backward compatibility? > vCMDQ hypervisor support would sit on top of that with just VQUEUE? Yea. Thanks Nicolin