On Mon, May 27, 2024 at 01:08:43AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, May 24, 2024 9:19 PM
> >
> > On Fri, May 24, 2024 at 07:13:23AM +0000, Tian, Kevin wrote:
> > > I'm curious to learn the real reason of that design. Is it because you
> > > want to do certain load-balance between viommu's or due to other
> > > reasons in the kernel smmuv3 driver which e.g. cannot support a
> > > viommu spanning multiple pSMMU?
> >
> > Yeah, there is no concept of support for a SMMUv3 instance where it's
> > command Q's can only work on a subset of devices.
> >
> > My expectation was that VIOMMU would be 1:1 with physical iommu
> > instances, I think AMD needs this too??
> >
>
> Yes this part is clear now regarding to VCMDQ.
>
> But Nicoline said:
>
> "
> One step back, even without VCMDQ feature, a multi-pSMMU setup
> will have multiple viommus (with our latest design) being added
> to a viommu list of a single vSMMU's. Yet, vSMMU in this case
> always traps regular SMMU CMDQ, so it can do viommu selection
> or even broadcast (if it has to).
> "
>
> I don't think there is an arch limitation mandating that?

What I mean is for a regular nested SMMU case. Without VCMDQ, a
regular vSMMU on a multi-pSMMU setup will look like this (e.g. three
devices behind different SMMUs):

|<---- guest ---->|<--------- VMM -------->|<- kernel ->|
       |-- dev0 --|-- viommu0 (s2_hwpt0) --|-- pSMMU0 --|
vSMMU--|-- dev1 --|-- viommu1 (s2_hwpt0) --|-- pSMMU1 --|
       |-- dev2 --|-- viommu2 (s2_hwpt0) --|-- pSMMU2 --|

When trapping a device cache invalidation, it is straightforward to
pick the viommu that the device is attached to by decoding the
virtual device ID. So, only one pSMMU would receive the user
invalidation request.

When doing an iotlb invalidation, a command may or may not contain an
ASID (a domain ID, which identifies a nested domain in this case):
a) If a command doesn't have an ASID, the VMM needs to broadcast the
   command to all viommus (i.e. pSMMUs).
b) If a command has an ASID, the VMM first needs to maintain an S1
   HWPT list, recording the ASID (decoded from the vSTE and its
   linked CD) whenever an HWPT entry is added to the list. It then
   needs to walk the S1 list with the ASID from the command to find
   all corresponding HWPTs and issue/broadcast the command to them.

Thanks
Nicolin
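
For illustration only, the VMM-side routing described above could be
modeled roughly as below. This is a toy sketch, not QEMU or iommufd
code: the Viommu and VSmmu classes, their method names, and the command
strings are all hypothetical placeholders, and the ASID is assumed to
have been decoded from the vSTE/CD already.

```python
class Viommu:
    """Hypothetical VMM-side viommu object, one per pSMMU."""

    def __init__(self, name):
        self.name = name
        self.issued = []  # commands forwarded to the backing pSMMU

    def issue(self, cmd):
        self.issued.append(cmd)


class VSmmu:
    """Hypothetical vSMMU model that traps the regular guest CMDQ."""

    def __init__(self, viommus):
        self.viommus = list(viommus)  # viommu list of this single vSMMU
        self.dev_to_viommu = {}       # virtual device ID -> viommu
        self.s1_hwpts = []            # S1 HWPT list as (asid, viommu) entries

    def attach_device(self, vdev_id, viommu):
        self.dev_to_viommu[vdev_id] = viommu

    def add_s1_hwpt(self, asid, viommu):
        # Link the ASID to the entry at the time the HWPT is added.
        self.s1_hwpts.append((asid, viommu))

    def invalidate_device_cache(self, vdev_id, cmd):
        # The virtual device ID selects exactly one viommu (one pSMMU).
        target = self.dev_to_viommu[vdev_id]
        target.issue(cmd)
        return [target]

    def invalidate_iotlb(self, cmd, asid=None):
        if asid is None:
            # Case a): no ASID, so broadcast to every viommu.
            targets = list(self.viommus)
        else:
            # Case b): walk the S1 list and pick entries with a matching ASID.
            targets = [v for entry_asid, v in self.s1_hwpts
                       if entry_asid == asid]
        for viommu in targets:
            viommu.issue(cmd)
        return targets
```

With three viommus as in the diagram, a device cache invalidation for
dev1 reaches only viommu1, an ASID-less iotlb invalidation reaches all
three, and an ASID-tagged one reaches only the viommus whose S1 HWPT
entries carry that ASID.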