> From: Jason Gunthorpe
> Sent: Tuesday, April 6, 2021 8:43 PM
>
> On Tue, Apr 06, 2021 at 09:35:17AM +0800, Jason Wang wrote:
>
> > > VFIO and VDPA has no business having map/unmap interfaces once we have
> > > /dev/ioasid. That all belongs in the ioasid side.
> > >
> > > I know they have those interfaces today, but that doesn't mean we have
> > > to keep using them for PASID use cases, they should be replaced with a
> > > 'do dma from this pasid on /dev/ioasid' interface certainly not a
> > > 'here is a pasid from /dev/ioasid, go ahead and configure it yourself'
> > > interface
> >
> > So it looks like the PASID was bound to SVA in this design. I think it's not
> > necessarily the case:
>
> No, I wish people would stop talking about SVA.
>
> SVA and vSVA are a very special narrow configuration of a PASID. There
> are lots of other PASID configurations! That is the whole point, a
> PASID is complicated, there are many configuration scenarios, they
> need to be in one place with a very clearly defined uAPI
>

I feel it also makes sense to allow a subsystem to specify which
configurations are permitted when allowing a PASID on its device, e.g.
excluding things like GPA mappings that the existing subsystems
(VFIO/VDPA) already handle well:

- Share GPA mappings between multiple devices (w/ or w/o PASID) for
  better IOTLB efficiency;

- Share GPA mappings between transactions w/ PASID and transactions
  w/o PASID from the same device (e.g. GPU) for better IOTLB efficiency;

- Use the same page table for GPA mappings before and after the guest
  turns on/off the PASID capability;

All of the above come for free as long as we continue to let VFIO/VDPA
manage the iommu domain and the associated GPA mappings for PASID. The
IOMMU driver already ensures that a nested PASID entry links to the
established GPA paging structure of the domain when the 1st-level
pgtable is bound through /dev/ioasid.

In contrast, the above merits are lost if we force a model where GPA
mappings for PASID must be constructed through /dev/ioasid, as this
leads to multiple paging structures for the same GPA mappings, implying
worse IOTLB usage and unnecessary invalidation cost.

Therefore, I envision a scheme where the subsystem specifies the
permitted PASID configurations when doing ALLOW_PASID, and userspace
then queries the per-PASID capability to learn which operations are
allowed, e.g.:

1) To enable vSVA, VFIO/VDPA allows pgtable binding and related
   invalidation/fault ops through /dev/ioasid;

2) For vDPA control vq usage, no configuration is allowed through
   /dev/ioasid;

3) For a new subsystem which doesn't carry any legacy or similar usage
   as VFIO/VDPA, it could permit all configurations through /dev/ioasid,
   including 1st-level binding and 2nd-level mapping ops;

This approach also allows us to grow the uAPI in stages. For now we
focus on 1) and 2), as VFIO/VDPA are the only two users so far and both
have good legacy interfaces to cover the GPA mappings. More ops can be
introduced for 3) when there is a real example showing exactly which
ops such a new subsystem requires.

Is this a good strategy to move forward?

btw this discussion was raised when discussing the I/O page fault
handling process. Currently the IOMMU layer implements a per-device
fault reporting mechanism, which requires VFIO to register a handler
to receive all faults on its device and then forward them to ioasid if
they are due to the 1st level. Possibly it makes more sense to convert
it into a per-pgtable reporting scheme, where the owner of each pgtable
registers its own handler. This means that for 1) VFIO will register a
2nd-level pgtable handler while /dev/ioasid will register a 1st-level
pgtable handler, and for 3) /dev/ioasid will register handlers for both
the 1st-level and 2nd-level pgtables.

Jean? I'd also like to know your thoughts...

Thanks
Kevin
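
To make the ALLOW_PASID idea above a bit more concrete, here is a
minimal sketch of how the three cases could map onto per-PASID
capability masks. It is only an illustration under stated assumptions:
every name in it (pasid_op, PASID_CAPS_*, pasid_entry, pasid_allow(),
pasid_query_caps(), pasid_op_permitted()) is hypothetical and is not
taken from any existing or posted kernel uAPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-PASID operations that /dev/ioasid might expose. */
enum pasid_op {
	PASID_OP_BIND_PGTABLE = 1u << 0, /* bind a 1st-level page table */
	PASID_OP_INVALIDATE   = 1u << 1, /* 1st-level cache invalidation */
	PASID_OP_FAULT        = 1u << 2, /* fault reporting and response */
	PASID_OP_MAP_UNMAP    = 1u << 3, /* 2nd-level (GPA) map/unmap */
};

/* Case 1: vSVA on VFIO/VDPA; GPA mappings stay with the subsystem. */
#define PASID_CAPS_VSVA \
	(PASID_OP_BIND_PGTABLE | PASID_OP_INVALIDATE | PASID_OP_FAULT)
/* Case 2: vDPA control vq; nothing is configurable via /dev/ioasid. */
#define PASID_CAPS_NONE 0u
/* Case 3: new subsystem without legacy; everything is permitted. */
#define PASID_CAPS_ALL  (PASID_CAPS_VSVA | PASID_OP_MAP_UNMAP)

/* Hypothetical bookkeeping for one allowed PASID. */
struct pasid_entry {
	uint32_t pasid;
	uint32_t allowed_ops; /* mask chosen by the owning subsystem */
};

/* Subsystem side: record the permitted ops when allowing a PASID. */
static inline void pasid_allow(struct pasid_entry *e, uint32_t pasid,
			       uint32_t allowed_ops)
{
	e->pasid = pasid;
	e->allowed_ops = allowed_ops;
}

/* Userspace side: query the per-PASID capability mask. */
static inline uint32_t pasid_query_caps(const struct pasid_entry *e)
{
	return e->allowed_ops;
}

/* /dev/ioasid side: gate a single requested operation on the mask. */
static inline bool pasid_op_permitted(const struct pasid_entry *e,
				      uint32_t op)
{
	/* Callers must pass exactly one PASID_OP_* flag. */
	assert(op != 0 && (op & (op - 1)) == 0);
	return (e->allowed_ops & op) != 0;
}
```

With such a mask, a map/unmap request on a PASID allowed with
PASID_CAPS_VSVA would be rejected by /dev/ioasid, while the same request
on a PASID allowed with PASID_CAPS_ALL would go through. New ops for
case 3) could then be added later as new bits without changing the
behaviour of 1) and 2).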