> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > Sent: Tuesday, October 13, 2020 6:28 PM > > On Mon, Oct 12, 2020 at 08:38:54AM +0000, Tian, Kevin wrote: > > > From: Jason Wang <jasowang@xxxxxxxxxx> > > > Sent: Monday, September 14, 2020 12:20 PM > > > > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > > specific one. > > > > > > Jason suggest something like /dev/sva. There will be a lot of other > > > subsystems that could benefit from this (e.g vDPA). > > > > > > Have you ever considered this approach? > > > > > > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > Yes we need a bind_pasid_table. The guest needs to allocate the PASID > tables because they are accessed via guest-physical addresses by the HW > SMMU. > > With bind_pasid_table, the invalidation message also requires a scope to > invalidate a whole PASID context, in addition to invalidating a mappings > ranges. > > > I'm not sure > > how such requirement can be unified w/o involving passthrough > > frameworks, or whether ARM could also switch to global PASID > > style... > > Not planned at the moment, sorry. It requires a PV IOMMU to do PASID > allocation, which is possible with virtio-iommu but not with a vSMMU > emulation. The VM will manage its own PASID space. The upside is that we > don't need userspace access to IOASID, so I won't pester you with comments > on that part of the API :) It makes sense. Possibly in the future when you plan to support SIOV-like capability then you may have to convert PASID table to use host physical address then the same API could be reused. :) Thanks Kevin > > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is an natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. > > Agree for enabling, but for reporting we did consider adding a sysfs > interface in /sys/class/iommu/ describing an IOMMU's properties. Then > opted for VFIO capabilities to keep the API nice and contained, but if > we're breaking up the API, sysfs might be more convenient to use and > extend. > > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) is triggered/handled > > per device/subdevice. Therefore, /dev/sva must provide both per- > > domain and per-device uAPIs similar to what VFIO/VDPA already > > does. Moreover, mapping page fault to subdevice requires pre- > > registering subdevice fault data to IOMMU layer when binding > > guest page table, while such fault data can be only retrieved from > > parent driver through VFIO/VDPA. > > > > However, we failed to find a good way even at the 1st step about > > domain association. The iommu domains are not exposed to the > > userspace, and there is no 1:1 mapping between domain and device. > > In VFIO, all devices within the same VFIO container share the address > > space but they may be organized in multiple IOMMU domains based > > on their bus type. How (should we let) the userspace know the > > domain information and open an sva_FD for each domain is the main > > problem here. > > > > In the end we just realized that doing such generalization doesn't > > really lead to a clear design and instead requires tight coordination > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > (especially about synchronization when the domain/device > > association is changed or when the device/subdevice is being reset/ > > drained). Finally it may become a usability burden to the userspace > > on proper use of the two interfaces on the assigned device. > > > > Based on above analysis we feel that just generalizing PASID mgmt. > > might be a good thing to look at while the remaining operations are > > better being VFIO/VDPA specific uAPIs. anyway in concept those are > > just a subset of the page table management capabilities that an > > IOMMU domain affords. Since all other aspects of the IOMMU domain > > is managed by VFIO/VDPA already, continuing this path for new nesting > > capability sounds natural. There is another option by generalizing the > > entire IOMMU domain management (sort of the entire vfio_iommu_ > > type1), but it's unclear whether such intrusive change is worthwhile > > (especially when VFIO/VDPA already goes different route even in legacy > > mapping uAPI: map/unmap vs. IOTLB). > > I agree with your analysis. A new coarse /dev/sva interface would need to > carry all the VFIO abstractions of container (minus map/unmap) and > group+device, which are not necessarily needed by VDPA and others, while > the original VFIO interface needs to stay for compatibility. To me it > makes more sense to extend each API separately, but have them embed > common > structures (bind/inval) and share some resources through external > interfaces (IOASID, nesting properties, IOPF queue). > > Thanks, > Jean