On Mon, Jun 10, 2024 at 05:44:16PM -0700, Nicolin Chen wrote:
> On Mon, Jun 10, 2024 at 09:28:39PM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 10, 2024 at 04:04:30PM -0700, Nicolin Chen wrote:
> > > > > Actually, even now as we put a dispatcher in VMM, VMM still does
> > > > > decode the CD table to link ASID to s1_hwpt. Otherwise, it could
> > > > > only broadcast a TLBI cmd to all pSMMUs.
> > > >
> > > > No, there should be no CD table decoding and no linking ASID to
> > > > anything by the VMM.
> > > >
> > > > The ARM architecture is clean, the ASID can remain private to the VM,
> > > > there is no reason for the VMM to understand it.
> > >
> > > But a guest-level TLBI command usually has only ASID available to
> > > know which pSMMU to dispatch the command. Without an ASID lookup
> > > table, how could VMM then dispatch a command to the corresponding
> > > pSMMU?
> >
> > It can broadcast. The ARM architecture does not expect a N:1 mapping
> > of SMMUs. This is why I think it is not such a good idea..
>
> Hmm, I thought we had an agreed idea that we shouldn't broadcast
> a TLBI (except global NH_ALL/VAA) for invalidation performance?

I wouldn't say we agreed; there are just lots of different trade-offs
to be made here. You can reduce broadcast by parsing the CD table from
the VMM. You can reduce broadcast with multiple vSMMUs.

The VMM needs to pick a design. I favour multiple vSMMUs.

> CD table walkthrough would be always done only by VMM, while the
> lookup table could be created/maintained by the kernel. I feel a
> vasid table could make sense since we maintain the vdev_id table
> in the kernel space too.

I'm not convinced we should put such a micro-optimization in the
kernel. If the VMM cares about performance then split the vSMMU,
otherwise let's just put all the mess in the VMM and give it the tools
to manage the invalidation distribution.
In the long run things like vCMDQ are going to force the choice of
multiple SMMUs, which is why I don't see much value in investing in
optimizing the single-vSMMU case. The optimization can be done later
if someone does have a use case.

Jason
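For readers following the thread, the two ways a VMM could route a guest TLBI-by-ASID command when one vSMMU is backed by several pSMMUs can be sketched in C. This is a minimal, hypothetical model, not iommufd or QEMU code: `struct psmmu`, `struct vasid_table`, `dispatch_broadcast()` and `dispatch_lookup()` are invented names, and the 256-entry table and 8-bit pSMMU mask are simplifications (SMMUv3 ASIDs can be up to 16 bits wide).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one vSMMU backed by up to 8 physical SMMUs. */
#define MAX_PSMMUS 8
#define MAX_VASIDS 256 /* simplification; real ASIDs may be 16-bit */

struct psmmu {
    int id;
    unsigned int tlbi_count; /* invalidations this pSMMU received */
};

/* Hypothetical guest command carrying only an ASID (TLBI by ASID). */
struct tlbi_cmd {
    uint16_t asid;
};

/*
 * Option 1: broadcast. Needs no knowledge of the guest's ASIDs;
 * every pSMMU behind the vSMMU receives the command.
 * Returns the number of pSMMUs the command was sent to.
 */
static unsigned int dispatch_broadcast(struct psmmu *smmus, size_t n,
                                       const struct tlbi_cmd *cmd)
{
    (void)cmd; /* the ASID is not consulted at all */
    assert(n <= MAX_PSMMUS);
    for (size_t i = 0; i < n; i++)
        smmus[i].tlbi_count++;
    return (unsigned int)n;
}

/*
 * Option 2: lookup. Someone (VMM or kernel) must keep a
 * vASID -> pSMMU bitmap current, e.g. by walking guest CD tables.
 */
struct vasid_table {
    uint8_t psmmu_mask[MAX_VASIDS]; /* bit i set: ASID in use on pSMMU i */
};

static unsigned int dispatch_lookup(struct psmmu *smmus, size_t n,
                                    const struct vasid_table *tbl,
                                    const struct tlbi_cmd *cmd)
{
    unsigned int sent = 0;

    assert(n <= MAX_PSMMUS);
    assert(cmd->asid < MAX_VASIDS);
    for (size_t i = 0; i < n; i++) {
        /* Forward only to pSMMUs where this ASID is known to live. */
        if (tbl->psmmu_mask[cmd->asid] & (1u << i)) {
            smmus[i].tlbi_count++;
            sent++;
        }
    }
    return sent;
}
```

Broadcasting keeps the ASID private to the VM but costs one invalidation per pSMMU; the lookup variant sends fewer commands only if the vASID table is kept in sync with the guest. With one vSMMU per pSMMU, as favoured in the reply, each vSMMU has exactly one target and neither mechanism is needed.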