Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

On Mon, Sep 14, 2020 at 04:33:10PM -0600, Alex Williamson wrote:

> Can you explain that further, or spit-ball what you think this /dev/sva
> interface looks like and how a user might interact between vfio and
> this new interface? 

When you open it you get some container, inside the container the
user can create PASIDs. PASIDs outside that container cannot be
reached.

Creating a PASID, or the guest PASID range, would be the entry point
for doing all the operations against a PASID or range that this patch
series imagines:
 - Map process VA mappings to the PASID's DMA virtual address space
 - Catch faults
 - Set up any special HW stuff like Intel's two-level thing, ARM stuff, etc
 - Expose resource controls, cgroup, whatever
 - Migration special stuff (allocate fixed PASIDs)

A PASID is a handle for an IOMMU page table, and the tools to
manipulate it. Within /dev/sva the page table is just 'floating' and
not linked to any PCI functions.
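
To make the container idea concrete, here is a toy model of it. This is
only a sketch in Python, not kernel code: the class SvaContainer and its
methods are hypothetical names invented for illustration, and the "page
table" is just a dictionary. It shows PASIDs being created inside one
container, carrying a floating mapping, and being unreachable from any
other container.

```python
import itertools


class SvaContainer:
    """Toy stand-in for one open /dev/sva FD (hypothetical, not a real API)."""

    # PASID numbers come from one shared space, but each container only
    # ever sees the PASIDs it allocated itself.
    _pasid_numbers = itertools.count(1)

    def __init__(self):
        # pasid -> {iova: process VA}; a 'floating' page table that is not
        # yet linked to any PCI function.
        self._tables = {}

    def alloc_pasid(self):
        pasid = next(SvaContainer._pasid_numbers)
        self._tables[pasid] = {}
        return pasid

    def _table(self, pasid):
        if pasid not in self._tables:
            raise PermissionError(f"PASID {pasid} is outside this container")
        return self._tables[pasid]

    def map(self, pasid, iova, va):
        """Map a process VA into the PASID's DMA virtual address space."""
        self._table(pasid)[iova] = va

    def translate(self, pasid, iova):
        return self._table(pasid)[iova]
```

A second container that is handed the same PASID number gets
PermissionError from map() or translate(), which is the isolation
property described above.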

The open /dev/sva FD holding the allocated PASIDs would be passed to a
kernel driver. This is a security authorization that the specified
PASID can be assigned to a PCI device by the kernel.

At this point the kernel driver would have the IOMMU permit its
bus/device/function to use the PASID. The PASID can be passed to
multiple drivers of any driver flavour so table re-use is
possible. Now the IOMMU page table is linked to a device.
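
The authorization step can be sketched the same way. Again this is a toy
Python model under stated assumptions, not a proposed kernel interface:
Iommu, KernelDriver, attach() and the BDF strings are all made up for
illustration. The point it shows is that the container passed in by
userspace is what authorizes the link, and that one PASID can be linked
to more than one device.

```python
class Iommu:
    """Toy IOMMU: records which (BDF, PASID) pairs may do DMA."""

    def __init__(self):
        self.permitted = set()

    def permit(self, bdf, pasid):
        self.permitted.add((bdf, pasid))

    def allows(self, bdf, pasid):
        return (bdf, pasid) in self.permitted


class KernelDriver:
    """Toy kernel driver bound to one PCI function (hypothetical)."""

    def __init__(self, iommu, bdf):
        self.iommu = iommu
        self.bdf = bdf

    def attach(self, container_pasids, pasid):
        # The container handed over by userspace is the authorization:
        # only PASIDs allocated inside it may be linked to this device.
        if pasid not in container_pasids:
            raise PermissionError(f"PASID {pasid} is not in the caller's container")
        self.iommu.permit(self.bdf, pasid)


iommu = Iommu()
container = frozenset({5, 6})  # PASIDs held by one open /dev/sva FD (made-up values)
first = KernelDriver(iommu, "0000:6a:01.0")   # made-up BDF
second = KernelDriver(iommu, "0000:6b:00.0")  # made-up BDF

first.attach(container, 5)
second.attach(container, 5)  # the same page table re-used by a second driver
```

PASID 6 stays floating in this example: it is in the container but no
driver has asked the IOMMU to permit it for any function.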

The kernel device driver would also do the device specific programming
to set up the PASID in the device, attach it to some device object and
expose the device for user DMA.

For instance IDXD's char dev would map the queue memory, associate
the PASID with that queue and set up the HW to be ready for the new
enqueue instruction. The IDXD mdev would link to its emulated PCI BAR
and ensure the guest can only use PASIDs included in the /dev/sva
container.

The qemu control plane for vIOMMU related to PASID would run over
/dev/sva.

I think the design could go further where a 'PASID' is just an
abstract idea of a page table, then vfio-pci could consume it too as an
IOMMU page table handle even though there is no actual PASID. So qemu
could end up with one API to universally control the vIOMMU, an API
that can be shared between subsystems and is not tied to VFIO.

> allocating pasids and associating them with page tables for that
> two-stage IOMMU setup, performing cache invalidations based on page
> table updates, etc.  How does it make more sense for a vIOMMU to
> setup some aspects of the IOMMU through vfio and others through a
> TBD interface?

vfio's IOMMU interface is about RID based full device ownership,
and fixed mappings.

PASID is about mediation, shared ownership and page faulting.

Does PASID overlap with the existing IOMMU RID interface beyond the
fact that both are using the IOMMU?

> The IOMMU needs to allocate PASIDs, so in that sense it enforces a
> quota via the architectural limits, but is the IOMMU layer going to
> distinguish in-kernel versus user limits?  A cgroup limit seems like a
> good idea, but that's not really at the IOMMU layer either and I don't
> see that a /dev/sva and vfio interface couldn't both support a cgroup
> type quota.

These are all good questions. PASID is new, and this stuff needs to be
sketched out more. A lot of in-kernel users of IOMMU PASID are
probably going to be triggered by userspace actions.

I think a cgroup quota would end up near the IOMMU layer, so vfio,
sva, and any other driver char devs would all be restricted by the
cgroup as peers.
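
As a rough illustration of "restricted as peers", the toy model below
puts the limit next to the allocator so that every caller is charged
against the same budget. It is a Python sketch only: PasidAllocator,
QuotaExceeded and the subsystem labels are hypothetical, and a real
cgroup controller would look nothing like this.

```python
class QuotaExceeded(Exception):
    """Raised when the shared PASID budget is used up."""


class PasidAllocator:
    """Toy allocator with a cgroup-style limit enforced at the IOMMU layer."""

    def __init__(self, limit):
        self.limit = limit
        self.owner = {}  # pasid -> label of the subsystem that asked for it
        self._next = 1

    def alloc(self, subsystem):
        # The check does not care which subsystem is asking; vfio, sva and
        # driver char devs all draw from the same budget.
        if len(self.owner) >= self.limit:
            raise QuotaExceeded(f"{subsystem}: PASID limit of {self.limit} reached")
        pasid = self._next
        self._next += 1
        self.owner[pasid] = subsystem
        return pasid

    def free(self, pasid):
        del self.owner[pasid]
```

With a limit of 2, one allocation through vfio and one through sva
leave nothing for a driver char dev until one of them is freed.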

> And it's not clear that they'll have compatible requirements.  A
> userspace idxd driver might have limited needs versus a vIOMMU backend.
> Does a single quota model adequately support both or are we back to the
> differences between access to a device and ownership of a device?

At the end of the day a PASID is just a number and the driver's only
use of it is to program it into HW.

All these other differences deal with the IOMMU side of the PASID, how
pages are mapped into it, how page fault works, etc, etc. Keeping the
two concerns separated seems very clean. A device driver shouldn't
care how the PASID is set up.

> > > This series is a blueprint within the context of the ownership and
> > > permission model that VFIO already provides.  It doesn't seem like we
> > > can pluck that out on its own, nor is it necessarily the case that VFIO
> > > wouldn't want to provide PASID services within its own API even if we
> > > did have this undefined /dev/sva interface.  
> > 
> > I don't see what you do - VFIO does not own PASID, and in this
> > vfio-mdev mode it does not own the PCI device/IOMMU either. So why
> > would this need to be part of the VFIO ownership and permission model?
> 
> Doesn't the PASID model essentially just augment the requester ID IOMMU
> model so as to manage the IOVAs for a subdevice of a RID?  

I'd say not really. PASID is very different from RID because PASID
must always be mediated by the kernel. vfio-pci doesn't know how to
use PASID because it doesn't know how to program the PASID into
a specific device. RID, by contrast, is fully self-contained within
vfio-pci.

Further, with the SVA models, the mediated devices are highly likely
to be shared between a vfio-mdev and a normal driver, as IDXD
shows. Userspace will get PASIDs for SVA and share the device equally
with vfio-mdev.

> What elevates a user to be able to allocate such resources in this
> new proposal?

AFAIK the current SVA model aims to impose no limitation. User
processes can open their devices, establish SVA and go ahead with
their workload.

If you are asking about IOMMU groups: for PASID the PCI
bus/device/function that is the 'control point' for PASID must be
secure and owned by the kernel, i.e. only the kernel can program the
device to use a given PASID. P2P access from other devices under
non-kernel control must not be allowed, as they could program a device
to use a PASID the kernel would not authorize.

All of this has to be done regardless of VFIO's involvement.

> Do they need a device at all?  It's not clear to me why RID based
> IOMMU management fits within vfio's scope, but PASID based does not.

In RID mode vfio-pci completely owns the PCI function, so it is more
natural that VFIO, as the sole device owner, would own the DMA mapping
machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
so there is not much reason to try and disaggregate the API.

PASID, on the other hand, is shared. vfio-mdev drivers will share the
device with other kernel drivers. PASID and DMA will be concurrent
with VFIO and other kernel drivers/etc.

Thus it makes more sense here to have the control plane for PASID also
be shared and not tied exclusively to VFIO.

Jason


