> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Monday, October 19, 2020 10:25 PM
>
> On Mon, Oct 19, 2020 at 08:39:03AM +0000, Liu, Yi L wrote:
> > Hi Jason,
> >
> > Good to see your response.
>
> Ah, I was away

got it. :-)

> > > > > Second, IOMMU nested translation is a per IOMMU domain
> > > > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > > > > (alloc/free domain, attach/detach device, set/get domain
> > > > > attribute, etc.), reporting/enabling the nesting capability is
> > > > > a natural extension to the domain uAPI of existing passthrough
> > > > > frameworks. Actually, VFIO already includes a nesting enable
> > > > > interface even before this series. So it doesn't make sense to
> > > > > generalize this uAPI out.
> > >
> > > The subsystem that obtains an IOMMU domain for a device would have
> > > to register it with an open FD of the '/dev/sva'. That is the
> > > connection between the two subsystems. It would be some simple
> > > kernel internal stuff:
> > >
> > >   sva = get_sva_from_file(fd);
> >
> > Is this fd provided by userspace? I suppose /dev/sva has a set of
> > uAPIs which will finally program page tables into the host iommu
> > driver. As far as I can see, that is weird for a VFIO user. Why
> > should a VFIO user connect to a /dev/sva fd after it has set a
> > proper iommu type on the opened container? The VFIO container
> > already stands for an iommu context with which userspace can
> > program page mappings into the host iommu.
>
> Again the point is to dis-aggregate the vIOMMU related stuff from
> VFIO so it can be shared between more subsystems that need it.

I understand you here. :-)

> I'm sure there will be some weird overlaps because we can't delete
> any of the existing VFIO APIs, but that should not be a blocker.

But the weirdness is exactly what we should consider. And it is
perhaps not just overlap; it may be a re-definition of the VFIO
container. As I mentioned, the VFIO container has been the IOMMU
context from the day it was defined. That could be the blocker. :-(

> Having VFIO run in a mode where '/dev/sva' provides all the IOMMU
> handling is a possible path.

This looks similar to the proposal from Jason Wang and Kevin Tian: add
a "/dev/iommu" and delegate to an independent kernel driver the IOMMU
domain alloc/free and device attach/detach which are now in the
passthrough frameworks. Just as Jason Wang said, replace the vfio
iommu type1 driver.

Jason Wang: "And all the proposal in this series is to reuse the
container fd. It should be possible to replace e.g type1 IOMMU with a
unified module."
link: https://lore.kernel.org/kvm/20201019142526.GJ6219@xxxxxxxxxx/T/#md49fe9ac9d9eff6ddf5b8c2ee2f27eb2766f66f3

Kevin Tian: "Based on above, I feel a more reasonable way is to first
make a /dev/iommu uAPI supporting DMA map/unmap usages and then
introduce vSVA to it. Doing this order is because DMA map/unmap is
widely used thus can better help verify the core logic with many
existing devices."
link: https://lore.kernel.org/kvm/MWHPR11MB1645C702D148A2852B41FCA08C230@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

> If your plan is to just opencode everything into VFIO then I don't
> see how VDPA will work well, and if proper in-kernel abstractions
> are built I fail to see how routing some of it through userspace is
> a fundamental problem.

I'm no expert on vDPA for now, but since you three open source
veterans share a similar idea of one place to cover the IOMMU
handling, I think it may be a valuable thing to do. I say "may be"
because I'm not sure about Alex's opinion on such an idea.

But the sure thing is that this idea may introduce weird overlap, even
re-definition, of existing things, as I replied above. We need to
evaluate the impact and mature the idea step by step. That takes time,
so perhaps we could do it in a staged way: first get a "/dev/iommu"
ready to handle page MAP/UNMAP, which can be used by both VFIO and
vDPA; meanwhile let VFIO grow (adding features) by itself, and
consider adopting the new /dev/iommu later once it is competent. Of
course this needs Alex's approval. After that, new features like SVA
can be added to /dev/iommu. A rough sketch of such a first stage is
below.
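To make the staged idea a bit more concrete, here is a minimal sketch
of what the first-stage MAP/UNMAP uAPI of such a "/dev/iommu" might
look like. Every name below (struct iommu_dma_map, IOMMU_MAP_DMA,
etc.) is hypothetical and made up purely for illustration; the real
interface would of course have to be designed on the list:

/*
 * Hypothetical first-stage /dev/iommu uAPI, MAP/UNMAP only.
 * Modeled loosely on the existing VFIO type1 MAP_DMA/UNMAP_DMA.
 */
#include <linux/ioctl.h>
#include <linux/types.h>

#define IOMMU_TYPE	'i'	/* hypothetical ioctl type */

struct iommu_dma_map {
	__u32	argsz;	/* sizeof(struct iommu_dma_map) */
	__u32	flags;	/* e.g. read/write permission bits */
	__u64	vaddr;	/* process virtual address to map */
	__u64	iova;	/* IO virtual address to map it at */
	__u64	size;	/* bytes, page aligned */
};
#define IOMMU_MAP_DMA	_IOW(IOMMU_TYPE, 0x00, struct iommu_dma_map)

struct iommu_dma_unmap {
	__u32	argsz;	/* sizeof(struct iommu_dma_unmap) */
	__u32	flags;
	__u64	iova;	/* IO virtual address to unmap */
	__u64	size;	/* bytes, page aligned */
};
#define IOMMU_UNMAP_DMA	_IOW(IOMMU_TYPE, 0x01, struct iommu_dma_unmap)

With only this much, VFIO and vDPA could both attach their
devices/domains to the /dev/iommu context instead of each implementing
MAP/UNMAP, and the second stage (bind guest page table, page fault
reporting, i.e. vSVA) would then be added on top of the same fd.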
> > >   sva_register_device_to_pasid(sva, pasid, pci_device,
> > >                                iommu_domain);
> >
> > So this is supposed to be called by VFIO/VDPA to register the info
> > to /dev/sva, right? And will /dev/sva also maintain the
> > device/iommu_domain and pasid info? Won't that duplicate what
> > VFIO/VDPA keep?
>
> Each part needs to have the information it needs?

Yeah, but it's the duplication that I'm not very keen on. Perhaps the
idea from Jason Wang and Kevin would avoid such duplication.

> > > > > Moreover, mapping page fault to subdevice requires pre-
> > > > > registering subdevice fault data to IOMMU layer when binding
> > > > > guest page table, while such fault data can be only retrieved
> > > > > from parent driver through VFIO/VDPA.
> > >
> > > Not sure what this means, page fault should be tied to the PASID,
> > > any hookup needed for that should be done in-kernel when the
> > > device is connected to the PASID.
> >
> > You may refer to chapter 7.4.1.1 of the VT-d spec. A page request
> > is reported to software together with the requestor id of the
> > device. To inject the page request into the guest, software needs
> > the device info.
>
> Whoever provides the vIOMMU emulation and relays the page fault to
> the guest has to translate the RID - that's the point.

But the device info (especially the sub-device info) lives within the
passthrough framework (e.g. VFIO). So page fault reporting needs to go
through the passthrough framework.

> what does that have to do with VFIO?
>
> How will VDPA provide the vIOMMU emulation?

Pardon me here, but I believe vIOMMU emulation should be based on the
IOMMU vendor specification, right? You may correct me if I'm missing
anything.

> Jason

Regards,
Yi Liu