Re: [PATCH RFCv1 08/14] iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl

Nicolin Chen <nicolinc@xxxxxxxxxx> · Mon, 10 Jun 2024 16:04:30 -0700

On Mon, Jun 10, 2024 at 07:01:10PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 10, 2024 at 01:01:32PM -0700, Nicolin Chen wrote:
> > On Mon, Jun 10, 2024 at 09:04:46AM -0300, Jason Gunthorpe wrote:
> > > On Fri, Jun 07, 2024 at 02:19:21PM -0700, Nicolin Chen wrote:
> > > 
> > > > > IOTLB efficiency will suffer though when splitting 1p -> 2v while
> > > > > invalidation performance will suffer when joining 2p -> 1v.
> > > > 
> > > > I think the invalidation efficiency is actually solvable. So,
> > > > basically viommu_invalidate would receive a whole batch of cmds
> > > > and dispatch them to different pSMMUs (nested_domains/devices).
> > > > We already have a vdev_id table for devices, yet we just need a
> > > > new vasid table for nested_domains. Right?
> > > 
> > > You can't know the ASID usage of the hypervisor from the VM, unless
> > > you also inspect the CD table memory in the guest. That seems like
> > > something we should try hard to avoid.
> > 
> > Actually, even now as we put a dispatcher in VMM, VMM still does
> > decode the CD table to link ASID to s1_hwpt. Otherwise, it could
> > only broadcast a TLBI cmd to all pSMMUs.
> 
> No, there should be no CD table decoding and no linking ASID to
> anything by the VMM.
> 
> The ARM architecture is clean, the ASID can remain private to the VM,
> there is no reason for the VMM to understand it.

But a guest-level TLBI command usually has only ASID available to
know which pSMMU to dispatch the command. Without an ASID lookup
table, how could VMM then dispatch a command to the corresponding
pSMMU?

> The s1_hwpt is derived only from the vSTE and nothing more. It would
> be fine for all the devices to have their own s1_hwpts with their own
> vSTE's inside it.
> 
> > > > With that being said, it would make the kernel design a bit more
> > > > complicated. And the VMM still has to separate the commands for
> > > > passthrough devices (HW iotlb) from commands for emulated devices
> > > > (emulated iotlb), unless we further split the topology at the VM
> > > > level to have a dedicated vSMMU for all passthrough devices --
> > > > then VMM could just forward its entire cmdq to the kernel without
> > > > deciphering every command (likely?).
> > > 
> > > I would not include the emulated devices in a shared SMMU.. For the
> > > same reason, we should try hard to avoid inspecting the page table
> > > memory.
> > 
> > I wouldn't like the idea of attaching emulated devices to a shared
> > vSMMU. Yet, mind elaborating why this would inspect the page table
> > memory? Or do you mean we should avoid VMM inspecting user tables?
> 
> Emulated devices can't use the HW page table walker in the SMMU since
> they won't get a clean CD linkage they can use. They have to manually
> walk the page tables and convert them into an IOAS. It is a big PITA,
> best to be avoided.

Oh..the mainline QEMU smmu code already has that:
https://github.com/qemu/qemu/blob/master/hw/arm/smmu-common.c#L522

Surely, it's a pathway that passthrough devices shouldn't run into.

Thanks
Nicolin