RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Wed, 7 Apr 2021 08:17:50 +0000

> From: Jason Gunthorpe
> Sent: Tuesday, April 6, 2021 8:43 PM
> 
> On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:
> 
> > > VFIO and VDPA has no buisness having map/unmap interfaces once we
> have
> > > /dev/ioasid. That all belongs in the iosaid side.
> > >
> > > I know they have those interfaces today, but that doesn't mean we have
> > > to keep using them for PASID use cases, they should be replaced with a
> > > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > > 'here is a pasid from /dev/ioasid, go ahead and configure it youself'
> > > interface
> >
> > So it looks like the PASID was bound to SVA in this design. I think it's not
> > necessairly the case:
> 
> No, I wish people would stop talking about SVA.
> 
> SVA and vSVA are a very special narrow configuration of a PASID. There
> are lots of other PASID configurations! That is the whole point, a
> PASID is complicated, there are many configuration scenarios, they
> need to be in one place with a very clearly defined uAPI
> 

I feel it also makes sense to allow a subsystem to specify which configurations
are permitted when allowing a PASID on its device, e.g. excluding things like
GPA mappings that existing subsystems (VFIO/VDPA) already handle well:

- Share GPA mappings between multiple devices (w/ or w/o PASID) for 
better IOTLB efficiency;

- Share GPA mappings between transactions w/ PASID and transactions w/o
PASID from the same device (e.g. GPU) for better IOTLB efficiency;

- Use the same page table for GPA mappings before and after the guest 
turns on/off the PASID capability;

All above are given as long as we continue to let VFIO/VDPA manage the
iommu domain and associated GPA mappings for PASID. The IOMMU driver 
already ensures a nested PASID entry linking to the established GPA paging 
structure of the domain when the 1st-level pgtable is bound through 
/dev/ioasid. 

In contrast, above merits are lost if forcing a model where GPA mappings
for PASID must be constructed through /dev/ioasid, as this will lead to
multiple paging structures for the same GPA mappings implying worse 
IOTLB usage and unnecessary cost of invalidations.

Therefore, I envision a scheme where the subsystem could specify 
permitted PASID configurations when doing ALLOW_PASID, and then 
userspace queries per-PASID capability to learn which operations
are allowed, e.g.:

1) To enable vSVA, VFIO/VDPA allows pgtable binding and related invalidation/
fault ops through /dev/ioasid;

2) for vDPA control vq usage, no configuration is allowed through /dev/ioasid;

3) for new subsystem which doesn't carry any legacy or similar usage as 
VFIO/VDPA, it could permit all configurations through /dev/ioasid including 
1st-level binding and 2nd-level mapping ops;

This approach also allows us to grow the uAPI in a staging approach. Now 
focus on 1) and 2) as VFIO/VDPA are the only two users for now with good
legacy to cover the GPA mappings. More ops can be introduced for 3) when 
there is a real example to show what exact ops are required for such a new 
subsystem.

Is this a good strategy to move forward?

btw this discussion was raised when discussing the I/O page fault handling
process. Currently the IOMMU layer implements a per-device fault reporting
mechanism, which requires VFIO to register a handler to receive all faults 
on its device and then forwards to ioasid if it's due to 1st-level. Possibly it 
makes more sense to convert it into a per-pgtable reporting scheme, and 
then the owner of each pgtable should register its own handler. It means
for 1) VFIO will register a 2nd-level pgtable handler while /dev/ioasid
will register a 1st-level pgtable handler, while for 3) /dev/ioasid will register 
handlers for both 1st-level and 2nd-level pgtable. Jean? also want to know 
your thoughts...  

Thanks
Kevin