Hi Jason,

Good to see your response.

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Friday, October 16, 2020 11:37 PM
>
> On Wed, Oct 14, 2020 at 03:16:22AM +0000, Tian, Kevin wrote:
> > Hi, Alex and Jason (G),
> >
> > How about your opinion for this new proposal? For now it looks both
> > Jason (W) and Jean are OK with this direction and more discussions
> > are possibly required for the new /dev/ioasid interface. Internally
> > we're doing a quick prototype to see any unforeseen issue with this
> > separation.
>
> Assuming VDPA and VFIO will be the only two users so duplicating
> everything only twice sounds pretty restricting to me.
>
> > > Second, IOMMU nested translation is a per IOMMU domain
> > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > > (alloc/free domain, attach/detach device, set/get domain attribute,
> > > etc.), reporting/enabling the nesting capability is a natural
> > > extension to the domain uAPI of existing passthrough frameworks.
> > > Actually, VFIO already includes a nesting enable interface even
> > > before this series. So it doesn't make sense to generalize this uAPI
> > > out.
>
> The subsystem that obtains an IOMMU domain for a device would have to
> register it with an open FD of the '/dev/sva'. That is the connection
> between the two subsystems. It would be some simple kernel internal
> stuff:
>
>   sva = get_sva_from_file(fd);

Is this fd provided by userspace? I suppose /dev/sva would have a set
of uAPIs which eventually program the page tables into the host IOMMU
driver. That seems awkward for a VFIO user: why should a VFIO user
connect to a /dev/sva fd after it has already set a proper IOMMU type
on the opened container? The VFIO container already represents an
IOMMU context through which userspace can program page mappings into
the host IOMMU.

>   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);

So this is supposed to be called by VFIO/VDPA to register the info
with /dev/sva, right? (To make sure I understand the proposed flow, I
put a rough sketch at the bottom of this mail.) And /dev/sva will then
also maintain the device/iommu_domain/PASID associations itself?
Won't that duplicate what VFIO/VDPA already maintain?

> Not sure why this is a roadblock?
>
> How would this be any different from having some kernel libsva that
> VDPA and VFIO would both rely on?
>
> You don't plan to just open code all this stuff in VFIO, do you?
>
> > > Then the tricky part comes with the remaining operations (3/4/5),
> > > which are all backed by iommu_ops thus effective only within an
> > > IOMMU domain. To generalize them, the first thing is to find a way
> > > to associate the sva_FD (opened through generic /dev/sva) with an
> > > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > > to replicate {domain<->device/subdevice} association in /dev/sva
> > > path because some operations (e.g. page fault) are triggered/handled
> > > per device/subdevice. Therefore, /dev/sva must provide both per-
> > > domain and per-device uAPIs similar to what VFIO/VDPA already
> > > does.
>
> Yes, the point here was to move the general APIs out of VFIO and into
> a sharable location. So, of course one would expect some duplication
> during the transition period.
>
> > > Moreover, mapping page fault to subdevice requires pre-
> > > registering subdevice fault data to IOMMU layer when binding
> > > guest page table, while such fault data can be only retrieved from
> > > parent driver through VFIO/VDPA.
>
> Not sure what this means, page fault should be tied to the PASID, any
> hookup needed for that should be done in-kernel when the device is
> connected to the PASID.

You may refer to chapter 7.4.1.1 of the VT-d spec. A page request is
reported to software together with the requester ID of the device. To
inject the page request into the guest, we need to know which device
(or subdevice) that requester ID belongs to.
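
To make this concrete, here is a rough sketch of the reporting path I
have in mind. All structure and function names below are made up by me
just for illustration; they are not existing kernel APIs.

/*
 * Illustrative sketch only -- every name here is invented.  The point
 * is that the IOMMU hands software {requester id, pasid, address}, and
 * someone has to map the requester id back to the assigned
 * device/subdevice (using the fault data that VFIO/VDPA obtained from
 * the parent driver at bind time) before the request can be injected
 * into the guest.
 */
struct page_req_desc {
	u16  requester_id;	/* PCI requester ID reported by the IOMMU */
	u32  pasid;		/* PASID of the faulting request */
	u64  address;		/* faulting address */
	bool last_in_group;	/* last request in the group? */
};

static int handle_page_request(struct page_req_desc *req)
{
	struct assigned_dev_info *info;

	/* needs the per-device fault data registered at bind time */
	info = lookup_assigned_dev(req->requester_id, req->pasid);
	if (!info)
		return -ENODEV;

	/* now we know which (sub)device to report the fault against */
	return inject_page_request_to_guest(info, req);
}

So whichever component owns this lookup (VFIO/VDPA today, /dev/sva in
the proposal) has to hold the device/subdevice fault data, which is why
I brought up the pre-registration above.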

Regards,
Yi Liu

>
> > > space but they may be organized in multiple IOMMU domains based
> > > on their bus type. How (should we let) the userspace know the
> > > domain information and open an sva_FD for each domain is the main
> > > problem here.
>
> Why is one sva_FD per iommu domain required? The HW can attach the
> same PASID to multiple iommu domains, right?
>
> > > In the end we just realized that doing such generalization doesn't
> > > really lead to a clear design and instead requires tight coordination
> > > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > > (especially about synchronization when the domain/device
> > > association is changed or when the device/subdevice is being reset/
> > > drained). Finally it may become a usability burden to the userspace
> > > on proper use of the two interfaces on the assigned device.
>
> If you have a list of things that needs to be done to attach a PCI
> device to a PASID then of course they should be tidy kernel APIs
> already, and not just hard wired into VFIO.
>
> The worst outcome would be to have VDPA and VFIO have two different
> ways to do all of this with a different set of bugs. Bug fixes/new
> features in VFIO won't flow over to VDPA.
>
> Jason
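
The sketch I mentioned above, to confirm my understanding of the
proposed flow: get_sva_from_file() and sva_register_device_to_pasid()
are the names from your mail, everything else below is made up by me
and is not real code.

/*
 * Rough sketch of how I imagine VFIO (or VDPA) would consume the
 * proposed /dev/sva kernel-internal API.  struct sva_context and the
 * function itself are invented for illustration.
 */
static int vfio_sva_bind_device(struct pci_dev *pdev,
				struct iommu_domain *domain,
				int sva_fd, u32 pasid)
{
	struct sva_context *sva;

	/* sva_fd is the /dev/sva fd passed in from userspace */
	sva = get_sva_from_file(sva_fd);
	if (IS_ERR(sva))
		return PTR_ERR(sva);

	/*
	 * Register the {device, pasid, iommu_domain} triple with
	 * /dev/sva so that guest page table bind and page fault
	 * reporting can be handled there.  This is the association I
	 * worry will be duplicated with what VFIO/VDPA already track
	 * for the container/domain.
	 */
	return sva_register_device_to_pasid(sva, pasid, pdev, domain);
}

If this matches what you have in mind, then it also illustrates my
concern: the device/domain/PASID association is maintained both here
and in VFIO/VDPA, and the two must stay in sync whenever that
association changes.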