On Fri, Jun 07, 2024 at 11:49:17AM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 07, 2024 at 12:36:46AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Friday, June 7, 2024 8:27 AM
> > >
> > > On Thu, Jun 06, 2024 at 11:44:58AM -0700, Nicolin Chen wrote:
> > > > On Thu, Jun 06, 2024 at 03:24:23PM -0300, Jason Gunthorpe wrote:
> > > > > On Sun, Jun 02, 2024 at 08:25:34PM -0700, Nicolin Chen wrote:
> > > > >
> > > > > > > I understand the appeal of doing this has been to minimize qemu
> > > > > > > changes in its ACPI parts if we tackle that instead maybe we should
> > > > > > > just not implement viommu to multiple piommu. It is somewhat
> > > > > > > complicated.
> > > > > >
> > > > > > Would you please clarify that suggestion "not implement viommu
> > > > > > to multiple piommu"?
> > > > > >
> > > > > > For regular nesting (SMMU), we are still doing one vSMMU in the
> > > > > > VMM, though VCMDQ case would be an exception....
> > > > >
> > > > > This is what I mean, always do multiple vSMMU if there are multiple
> > > > > physical pSMMUs. Don't replicate any virtual commands across pSMMUs.
> > > >
> > > > Thanks for clarifying. That also means you'd prefer putting the
> > > > command dispatcher in VMM, which is what we have at this moment.
> > >
> > > Unless someone knows a reason why we should strive hard to have only a
> > > single vSMMU and accept some invalidation inefficiency?
> > >
> >
> > migration? a single vSMMU provides better compatibility between
> > src/dest...
>
> Maybe, though I think we can safely split a single pSMMU into multiple
> vSMMUs using the IOMMUFD vIOMMU interface. So if your machine model
> has two vSMMUs and your physical HW has less we can still make that
> work.
>
> IOTLB efficiency will suffer though when splitting 1p -> 2v while
> invalidation performance will suffer when joining 2p -> 1v.

I think the invalidation efficiency is actually solvable.
So, basically, viommu_invalidate would receive a whole batch of cmds
and dispatch them to the different pSMMUs (nested_domains/devices).
We already have a vdev_id table for devices, so we would just need a
new vasid table for nested_domains. Right?

The immediate benefit is that VMMs won't need to duplicate each
other's dispatcher pieces, and it likely helps migration, as Kevin
pointed out.

That said, it would make the kernel design a bit more complicated.
And the VMM would still have to separate the commands for passthrough
devices (HW iotlb) from the commands for emulated devices (emulated
iotlb), unless we further split the topology at the VM level to have
a dedicated vSMMU for all passthrough devices. Then the VMM could
likely just forward its entire cmdq to the kernel without deciphering
every command.

Thanks
Nicolin
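
For illustration only, the batch dispatch described above could be
modeled in user space roughly as follows. This is a toy Python sketch,
not kernel code and not the IOMMUFD API: the Cmd type, the table
contents, the pSMMU names, and the dispatch() helper are all
hypothetical, the opcode strings are plain labels rather than real
command encodings, and it assumes each vdev_id or vasid resolves to
exactly one pSMMU.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cmd:
    opcode: str   # label only, not a real SMMU command encoding
    virt_id: int  # vdev_id for device-scoped cmds, vasid for domain-scoped ones


# Hypothetical tables: virtual ID -> physical SMMU that must see the command.
VDEV_TABLE = {0x10: "psmmu0", 0x11: "psmmu1"}  # vdev_id -> pSMMU
VASID_TABLE = {1: "psmmu0", 2: "psmmu1"}       # vasid   -> pSMMU

# Assumed split between device-scoped and domain-scoped commands.
DEVICE_SCOPED = frozenset({"ATC_INV", "CFGI_STE"})


def dispatch(batch):
    """Group one guest batch into per-pSMMU lists, keeping guest order."""
    per_psmmu = defaultdict(list)
    for cmd in batch:
        # Pick the lookup table that matches the command's scope.
        table = VDEV_TABLE if cmd.opcode in DEVICE_SCOPED else VASID_TABLE
        if cmd.virt_id not in table:
            raise KeyError(f"no mapping for {cmd.opcode} id {cmd.virt_id:#x}")
        per_psmmu[table[cmd.virt_id]].append(cmd)
    return dict(per_psmmu)
```

Each pSMMU then receives only the commands that target it, so nothing
is replicated across pSMMUs. If one vasid could be attached behind
several pSMMUs, the vasid table would have to map to a set instead,
which this sketch does not cover.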