RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> Sent: Tuesday, October 13, 2020 6:28 PM
> 
> On Mon, Oct 12, 2020 at 08:38:54AM +0000, Tian, Kevin wrote:
> > > From: Jason Wang <jasowang@xxxxxxxxxx>
> > > Sent: Monday, September 14, 2020 12:20 PM
> > >
> > [...]
> > > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > > specific one.
> > >
> > > Jason suggested something like /dev/sva. There will be a lot of other
> > > subsystems that could benefit from this (e.g. vDPA).
> > >
> > > Have you ever considered this approach?
> > >
> >
> > Hi, Jason,
> >
> > We did some study on this approach and below is the output. It's a
> > long write-up, but I didn't find a way to abstract it further without
> > losing necessary context. Sorry about that.
> >
> > Overall the real purpose of this series is to enable IOMMU nested
> > translation capability with vSVA as one major usage, through
> > below new uAPIs:
> > 	1) Report/enable IOMMU nested translation capability;
> > 	2) Allocate/free PASID;
> > 	3) Bind/unbind guest page table;
> > 	4) Invalidate IOMMU cache;
> > 	5) Handle IOMMU page request/response (not in this series);
> > 1/3/4) are the minimal set for using IOMMU nested translation, with
> > the other two being optional. For example, the guest may enable vSVA
> > on a device without using PASID. Or, it may bind its gIOVA page table,
> > which doesn't require page fault support. Finally, all operations can
> > be applied to either a physical device or a subdevice.
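> >
> > As a rough illustration (the names and layouts below are made up for
> > this discussion; they are not the exact definitions in this series),
> > the operations could surface roughly like:
> >
> > 	#include <linux/types.h>
> >
> > 	/* 2) PASID alloc/free, backed by the IOASID allocator */
> > 	struct example_pasid_alloc {
> > 		__u32	argsz;
> > 		__u32	flags;
> > 		__u32	min;		/* lowest acceptable PASID */
> > 		__u32	max;		/* highest acceptable PASID */
> > 	};
> >
> > 	/* 3) bind a guest page table to a (device, PASID) under nesting */
> > 	struct example_bind_gpasid {
> > 		__u32	argsz;
> > 		__u32	flags;
> > 		__u64	gpgd;		/* guest page table root (GPA) */
> > 		__u32	pasid;
> > 		__u32	addr_width;
> > 	};
> >
> > 	/* 4) invalidate first-level/nested IOMMU caches */
> > 	struct example_cache_invalidate {
> > 		__u32	argsz;
> > 		__u32	granularity;	/* domain / PASID / address range */
> > 		__u32	pasid;
> > 		__u32	__pad;
> > 		__u64	addr;
> > 		__u64	size;
> > 	};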
> >
> > Then we evaluated, for each uAPI, whether generalizing it is a good
> > thing both in concept and in terms of complexity.
> >
> > First, unlike the other uAPIs, which are all backed by iommu_ops,
> > PASID allocation/free goes through the IOASID sub-system. From this
> > angle we feel generalizing PASID management does make some sense.
> > For one thing, a PASID is just a number and is not related to any
> > device before it's bound to a page table and an IOMMU domain. For
> > another, PASIDs are a global resource (at least on Intel VT-d), and
> > having separate VFIO/VDPA allocation interfaces could easily confuse
> > userspace, e.g. which interface should be used when both VFIO and
> > VDPA devices exist? Moreover, a unified interface allows centralized
> > control over how many PASIDs are allowed per process.
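> >
> > Purely as a sketch of the centralized-quota idea (all names below are
> > hypothetical, since no /dev/sva interface exists today):
> >
> > 	#include <fcntl.h>
> > 	#include <sys/ioctl.h>
> > 	#include <linux/types.h>
> >
> > 	/* hypothetical request layout and ioctl number */
> > 	struct sva_pasid_alloc {
> > 		__u32	argsz;
> > 		__u32	flags;
> > 		__u32	min;
> > 		__u32	max;
> > 	};
> > 	#define SVA_IOC_PASID_ALLOC	_IOWR(';', 0, struct sva_pasid_alloc)
> >
> > 	int alloc_pasid(void)
> > 	{
> > 		int sva_fd = open("/dev/sva", O_RDWR);
> > 		struct sva_pasid_alloc alloc = {
> > 			.argsz	= sizeof(alloc),
> > 			.min	= 1,
> > 			.max	= (1U << 20) - 1,	/* 20-bit PASID space */
> > 		};
> >
> > 		/* one place to enforce a per-process PASID quota, no matter
> > 		 * whether the consumer device is VFIO- or VDPA-managed */
> > 		return ioctl(sva_fd, SVA_IOC_PASID_ALLOC, &alloc);
> > 	}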
> >
> > One unclear part of this generalization is permission. Do we open
> > this interface to any process, or only to those which have assigned
> > devices? If the latter, what would be the mechanism to coordinate
> > between this new interface and the specific passthrough frameworks?
> > A trickier case: vSVA support on ARM (Eric/Jean please correct me)
> > plans to use a per-device PASID namespace, built on a
> > bind_pasid_table iommu callback, to let the guest fully manage its
> > PASIDs on a given passthrough device.
> 
> Yes we need a bind_pasid_table. The guest needs to allocate the PASID
> tables because they are accessed via guest-physical addresses by the HW
> SMMU.
> 
> With bind_pasid_table, the invalidation message also requires a scope to
> invalidate a whole PASID context, in addition to invalidating mapping
> ranges.
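>
> (Roughly, the invalidation scope would need to distinguish something
> like the below; the names are illustrative only, not the exact uAPI:)
>
> 	enum example_inv_granularity {
> 		EXAMPLE_INV_GRANU_DOMAIN,	/* everything in the domain */
> 		EXAMPLE_INV_GRANU_PASID,	/* a whole PASID context */
> 		EXAMPLE_INV_GRANU_ADDR,		/* a range of mappings */
> 	};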
> 
> > I'm not sure
> > how such a requirement can be unified without involving the
> > passthrough frameworks, or whether ARM could also switch to the
> > global PASID style...
> 
> Not planned at the moment, sorry. It requires a PV IOMMU to do PASID
> allocation, which is possible with virtio-iommu but not with a vSMMU
> emulation. The VM will manage its own PASID space. The upside is that we
> don't need userspace access to IOASID, so I won't pester you with comments
> on that part of the API :)

It makes sense. Possibly in the future, when you plan to support an
SIOV-like capability, you may have to convert the PASID table to use
host physical addresses, and then the same API could be reused. :)

Thanks
Kevin

> 
> > Second, IOMMU nested translation is a per-IOMMU-domain capability.
> > Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain,
> > attach/detach device, set/get domain attribute, etc.), reporting and
> > enabling the nesting capability is a natural extension to the domain
> > uAPI of the existing passthrough frameworks. Actually, VFIO already
> > included a nesting enable interface even before this series, so it
> > doesn't make sense to generalize this uAPI out.
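> >
> > For reference, that pre-existing knob (VFIO_TYPE1_NESTING_IOMMU) can
> > be exercised from userspace roughly like below (error handling
> > omitted):
> >
> > 	#include <sys/ioctl.h>
> > 	#include <linux/vfio.h>
> >
> > 	void enable_nesting(int container)
> > 	{
> > 		/* the container already has a group attached via
> > 		 * VFIO_GROUP_SET_CONTAINER; then the nesting IOMMU
> > 		 * type can be selected for it */
> > 		if (ioctl(container, VFIO_CHECK_EXTENSION,
> > 			  VFIO_TYPE1_NESTING_IOMMU))
> > 			ioctl(container, VFIO_SET_IOMMU,
> > 			      VFIO_TYPE1_NESTING_IOMMU);
> > 	}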
> 
> Agreed on enabling, but for reporting we did consider adding a sysfs
> interface under /sys/class/iommu/ describing an IOMMU's properties. We
> then opted for VFIO capabilities to keep the API nice and contained,
> but if we're breaking up the API, sysfs might be more convenient to use
> and extend.
> 
> > Then the tricky part comes with the remaining operations (3/4/5),
> > which are all backed by iommu_ops and thus effective only within an
> > IOMMU domain. To generalize them, the first thing is to find a way
> > to associate the sva_FD (opened through the generic /dev/sva) with an
> > IOMMU domain that is created by VFIO/VDPA. The second thing is to
> > replicate the {domain <-> device/subdevice} association in the
> > /dev/sva path, because some operations (e.g. page faults) are
> > triggered/handled per device/subdevice. Therefore, /dev/sva must
> > provide both per-domain and per-device uAPIs similar to what
> > VFIO/VDPA already does. Moreover, mapping a page fault to a subdevice
> > requires pre-registering subdevice fault data with the IOMMU layer
> > when binding the guest page table, while such fault data can only be
> > retrieved from the parent driver through VFIO/VDPA.
> >
> > However, we failed to find a good way even for the 1st step, domain
> > association. IOMMU domains are not exposed to userspace, and there is
> > no 1:1 mapping between domains and devices. In VFIO, all devices
> > within the same VFIO container share the address space, but they may
> > be organized into multiple IOMMU domains based on their bus type. The
> > main problem here is how userspace should learn the domain
> > information and open an sva_FD for each domain.
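> >
> > To make the gap concrete, the flow would have to look something like
> > the below, where every name is made up and the first step has no
> > existing uAPI at all:
> >
> > 	#include <fcntl.h>
> > 	#include <sys/ioctl.h>
> >
> > 	/* both ioctls below are hypothetical; neither exists today */
> > 	#define VFIO_DEVICE_GET_IOMMU_DOMAIN	_IOR(';', 200, int)
> > 	#define SVA_ATTACH_IOMMU_DOMAIN	_IOW(';', 201, int)
> >
> > 	int attach_sva_to_domain(int vfio_device_fd)
> > 	{
> > 		int domain_id;
> >
> > 		/* step 1 has no uAPI today: nothing reports which iommu
> > 		 * domain the device ended up in */
> > 		ioctl(vfio_device_fd, VFIO_DEVICE_GET_IOMMU_DOMAIN, &domain_id);
> >
> > 		/* step 2 would then tie the sva_FD to that domain */
> > 		int sva_fd = open("/dev/sva", O_RDWR);
> > 		return ioctl(sva_fd, SVA_ATTACH_IOMMU_DOMAIN, &domain_id);
> > 	}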
> >
> > In the end we realized that such a generalization doesn't really lead
> > to a clear design and instead requires tight coordination between
> > /dev/sva and VFIO/VDPA for almost every new uAPI (especially around
> > synchronization when the domain/device association changes or when
> > the device/subdevice is being reset/drained). Finally, it may become
> > a usability burden for userspace to use the two interfaces correctly
> > on an assigned device.
> >
> > Based on the above analysis, we feel that generalizing just PASID
> > management might be a good thing to look at, while the remaining
> > operations are better kept as VFIO/VDPA-specific uAPIs. In concept,
> > those are just a subset of the page table management capabilities
> > that an IOMMU domain affords. Since all other aspects of the IOMMU
> > domain are managed by VFIO/VDPA already, continuing down this path
> > for the new nesting capability sounds natural. There is another
> > option of generalizing the entire IOMMU domain management (sort of
> > the entire vfio_iommu_type1), but it's unclear whether such an
> > intrusive change is worthwhile (especially when VFIO/VDPA already go
> > different routes even in the legacy mapping uAPI: map/unmap vs.
> > IOTLB).
> 
> I agree with your analysis. A new coarse /dev/sva interface would need
> to carry all the VFIO abstractions of container (minus map/unmap) and
> group+device, which are not necessarily needed by VDPA and others,
> while the original VFIO interface needs to stay for compatibility. To
> me it makes more sense to extend each API separately, but have them
> embed common structures (bind/inval) and share some resources through
> external interfaces (IOASID, nesting properties, IOPF queue).
> 
> Thanks,
> Jean



