RE: [PATCH RFCv1 08/14] iommufd: Add IOMMU_VIOMMU_SET_DEV_ID ioctl

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Wed, 29 May 2024 02:58:11 +0000

> From: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Sent: Wednesday, May 29, 2024 4:23 AM
> 
> On Mon, May 27, 2024 at 01:08:43AM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > Sent: Friday, May 24, 2024 9:19 PM
> > >
> > > On Fri, May 24, 2024 at 07:13:23AM +0000, Tian, Kevin wrote:
> > > > I'm curious to learn the real reason for that design. Is it because you
> > > > want to do certain load-balance between viommu's or due to other
> > > > reasons in the kernel smmuv3 driver which e.g. cannot support a
> > > > viommu spanning multiple pSMMU?
> > >
> > > Yeah, there is no concept of support for a SMMUv3 instance where its
> > > command queues can only work on a subset of devices.
> > >
> > > My expectation was that VIOMMU would be 1:1 with physical iommu
> > > instances, I think AMD needs this too??
> > >
> >
> > Yes, this part is clear now with regard to VCMDQ.
> >
> > But Nicolin said:
> >
> > "
> > One step back, even without VCMDQ feature, a multi-pSMMU setup
> > will have multiple viommus (with our latest design) being added
> > to a viommu list of a single vSMMU. Yet, vSMMU in this case
> > always traps regular SMMU CMDQ, so it can do viommu selection
> > or even broadcast (if it has to).
> > "
> >
> > I don't think there is an arch limitation mandating that?
> 
> What I mean is for regular vSMMU. Without VCMDQ, a regular vSMMU
> on a multi-pSMMU setup will look like (e.g. three devices behind
> different SMMUs):
> |<------ VMM ------->|<------ kernel ------>|
>        |-- viommu0 --|-- pSMMU0 --|
> vSMMU--|-- viommu1 --|-- pSMMU1 --|--s2_hwpt
>        |-- viommu2 --|-- pSMMU2 --|
> 
> And devices would attach to:
> |<---- guest ---->|<--- VMM --->|<- kernel ->|
>        |-- dev0 --|-- viommu0 --|-- pSMMU0 --|
> vSMMU--|-- dev1 --|-- viommu1 --|-- pSMMU1 --|
>        |-- dev2 --|-- viommu2 --|-- pSMMU2 --|
> 
> When trapping a device cache invalidation, it is straightforward
> to pick the viommu that the device is attached to by deciphering
> the virtual device ID.

I understand how the above works.

My question is why that option was chosen instead of a 1:1 mapping
between vSMMU and viommu, i.e. letting the kernel figure out which
pSMMU an invalidation cmd should be sent to, as is done when
virtualizing VT-d.

I want to know whether this is done simply to be compatible with
what VCMDQ requires, or whether there is another, unstated reason.

> 
> When doing iotlb invalidation, a command may or may not contain
> an ASID (a domain ID, and nested domain in this case):
> a) if a command doesn't have an ASID, VMM needs to broadcast the
>    command to all viommus (i.e. pSMMUs)
> b) if a command has an ASID, VMM needs to initially maintain an
>    S1 HWPT list by linking an ASID when adding an HWPT entry to
>    the list, by deciphering vSTE and its linked CD. Then it needs
>    to go through the S1 list with the ASID in the command, and to
>    find all corresponding HWPTs to issue/broadcast the command.
> 
> Thanks
> Nicolin
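
For reference, the VMM-side routing described in the quoted text (pick
the viommu by virtual device ID for device cache invalidation; broadcast
or walk the S1 HWPT list by ASID for iotlb invalidation) could be
sketched roughly as below. This is only an illustration of that
description, not code from the series: the class names, method names and
command strings are all hypothetical, and each S1 HWPT is assumed to
record the one viommu it was allocated on.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Viommu:
    """Hypothetical VMM handle for one viommu (one per pSMMU)."""
    name: str
    issued: List[str] = field(default_factory=list)

    def forward(self, cmd: str) -> None:
        # Stand-in for forwarding the command to the kernel.
        self.issued.append(cmd)


@dataclass(eq=False)
class S1Hwpt:
    """Hypothetical S1 HWPT list entry, tagged with its ASID."""
    asid: int
    viommu: Viommu


class VSmmu:
    """Hypothetical single vSMMU fronting several viommus."""

    def __init__(self, viommus: List[Viommu]) -> None:
        self.viommus = list(viommus)
        self.dev_to_viommu: Dict[int, Viommu] = {}
        self.s1_list: List[S1Hwpt] = []

    def attach_device(self, vdev_id: int, viommu: Viommu) -> None:
        self.dev_to_viommu[vdev_id] = viommu

    def add_s1_hwpt(self, asid: int, viommu: Viommu) -> None:
        # Assumed to be called once the vSTE and its linked CD are
        # deciphered, so the ASID is known when the entry is added.
        self.s1_list.append(S1Hwpt(asid, viommu))

    def invalidate_device_cache(self, vdev_id: int, cmd: str) -> None:
        # The virtual device ID identifies exactly one viommu.
        self.dev_to_viommu[vdev_id].forward(cmd)

    def invalidate_iotlb(self, cmd: str, asid: Optional[int] = None) -> None:
        if asid is None:
            # Case a): no ASID, broadcast to all viommus.
            targets = list(self.viommus)
        else:
            # Case b): walk the S1 list, collect each matching viommu once.
            targets = []
            for hwpt in self.s1_list:
                if hwpt.asid == asid and hwpt.viommu not in targets:
                    targets.append(hwpt.viommu)
        for viommu in targets:
            viommu.forward(cmd)
```

With a 1:1 vSMMU-to-viommu mapping, the dev_to_viommu and s1_list
bookkeeping in this sketch is the part that would move into the kernel.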