> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Tuesday, April 6, 2021 8:35 PM
> 
> On Tue, Apr 06, 2021 at 01:27:15AM +0000, Tian, Kevin wrote:
> > 
> > and here is one example why using existing VFIO/VDPA interface makes
> > sense. say dev1 (w/ sva) and dev2 (w/o sva) are placed in a single VFIO
> > container.
> 
> Forget about SVA, it is an irrelevant detail of how a PASID is
> configured.
> 
> > The container is associated to an iommu domain which contains a
> > single 2nd-level page table, shared by both devices (when attached
> > to the domain).
> 
> This level should be described by an ioasid.
> 
> > The VFIO MAP operation is applied to the 2nd-level
> > page table thus naturally applied to both devices. Then userspace
> > could use /dev/ioasid to further allocate IOASIDs and bind multiple
> > 1st-level page tables for dev1, nested on the shared 2nd-level page
> > table.
> 
> Because if you don't then we enter insane world where a PASID is being
> created under /dev/ioasid but its translation path flows through setup
> done by VFIO and the whole user API becomes an incomprehensible mess.
> 
> How will you even associate the PASID with the other translation??

The PASID is attached to a specific iommu domain (created by VFIO/VDPA),
which already has the GPA->HPA mappings configured. If we view those
mappings as an attribute of the iommu domain, it is reasonable to let
the pgtable that userspace binds through /dev/ioasid nest on it.

> 
> The entire translation path for any ioasid or PASID should be defined
> only by /dev/ioasid. Everything else is a legacy API.
> 
> > If following your suggestion then VFIO must deny VFIO MAP operations
> > on sva1 (assume userspace should not mix sva1 and sva2 in the same
> > container and instead use /dev/ioasid to map for sva1)?
> 
> No, userspace creates an iosaid for the guest physical mapping and
> passes this ioasid to VFIO PCI which will assign it as the first layer
> mapping on the RID

Is that a dummy ioasid whose only purpose is to provide the GPA mappings
for other IOASIDs to nest on? Then we waste one per VM?

> 
> When PASIDs are allocated the uAPI will be told to logically nested
> under the first ioasid. When VFIO authorizes a PASID for a RID it
> checks that all the HW rules are being followed.

As I explained above, why can't we just use the iommu domain to connect
the dots? Every passthrough framework needs to create an iommu domain
first, and it needs to support both devices w/ PASID and devices w/o
PASID. For devices w/o PASID it has to provide its own MAP interface
anyway. Then why bother creating another MAP interface through
/dev/ioasid, which not only duplicates the existing one but also creates
a transition burden between two sets of MAP interfaces whenever the
guest turns the PASID capability on the device on or off?
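To be concrete about what I mean by the existing MAP interface, below is
a minimal userspace sketch of today's VFIO type1 flow (error handling
omitted; the group number, function name and memory layout are only
examples):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/vfio.h>

/* Minimal sketch: the GPA->HPA (2nd-level) mapping that every device in
 * the container gets today, whether or not it uses PASID. */
static int setup_gpa_mapping(void *guest_mem, __u64 size)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);	/* example group */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (__u64)(unsigned long)guest_mem, /* backing of guest RAM */
		.iova  = 0,				  /* GPA 0 */
		.size  = size,
	};

	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);

	/* One MAP covers dev1 (w/ PASID) and dev2 (w/o PASID) alike, since
	 * both attach to the same iommu domain behind the container. */
	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}

With this in place, /dev/ioasid would only need to bind the guest
1st-level page tables for dev1's PASIDs on top of the same domain,
instead of redoing the GPA mapping through a second MAP interface.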
> 
> If there are rules like groups of VFIO devices must always use the
> same IOASID then VFIO will check these too (and realistically qemu
> will have only one guest physical map ioasid anyhow)
> 
> There is no real difference between setting up an IOMMU table for a
> (RID,PASID) tuple or just a RID. We can do it universally with
> one interface for all consumers.
> 

Whether it is 'universal' depends on which angle you look at this
problem from. From the IOASID p.o.v. possibly yes, but from the device
passthrough p.o.v. it's the opposite, since the passthrough framework
needs to handle devices w/o PASID anyway (and even a device w/ PASID may
send traffic w/o PASID). So 'universally' makes more sense if the
passthrough framework can use one interface of its own to manage GPA
mappings for all consumers (which also covers the case where a PASID is
allowed/authorized).

Thanks
Kevin
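p.s. to put the two models side by side, here is a purely illustrative
sketch of the split I'm arguing for. Every /dev/ioasid ioctl and struct
below is a made-up placeholder (that uAPI does not exist yet); the point
is only that /dev/ioasid would carry the PASID/1st-level binding while
the GPA mapping stays with the passthrough framework:

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#define IOASID_ALLOC		_IO('x', 0)	/* placeholder ioctl */
#define IOASID_BIND_PGTABLE	_IO('x', 1)	/* placeholder ioctl */

struct ioasid_bind_pgtable {			/* placeholder layout */
	__u32	ioasid;
	__u64	pgtable_gpa;	/* guest 1st-level (GVA->GPA) page table */
};

static int bind_guest_pasid(__u64 pgtable_gpa)
{
	int fd = open("/dev/ioasid", O_RDWR);
	int pasid = ioctl(fd, IOASID_ALLOC);	/* PASID exposed to the guest */
	struct ioasid_bind_pgtable bind = {
		.ioasid      = pasid,
		.pgtable_gpa = pgtable_gpa,
	};

	/* The 2nd level (GPA->HPA) comes from whatever iommu domain the
	 * device is already attached to via its VFIO container; this bind
	 * only supplies the 1st level that nests on top of it. */
	ioctl(fd, IOASID_BIND_PGTABLE, &bind);
	return pasid;
}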