Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup

Alex Williamson <alex.williamson@xxxxxxxxxx> · Thu, 4 Mar 2021 17:07:31 -0700

On Thu, 4 Mar 2021 19:16:33 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote:
> 
> > Therefore unless a bus driver opts-out by replacing vm_private_data, we
> > can identify participating vmas by the vm_ops and have flags indicating
> > if the vma maps device memory such that vfio_get_device_from_vma()
> > should produce a device reference.  The vfio IOMMU backends would also
> > consume this, ie. if they get a valid vfio_device from the vma, use the
> > pfn_base field directly.  vfio_vm_ops would wrap the bus driver
> > callbacks and provide reference counting on open/close to release this
> > object.  
> 
> > I'm not thrilled with a vfio_device_ops callback plumbed through
> > vfio-core to do vma-to-pfn translation, so I thought this might be a
> > better alternative.  Thanks,  
> 
> Maybe you could explain why, because I'm looking at this idea and
> thinking it looks very complicated compared to a simple driver op
> callback?

vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the
caller has already used vfio_device_get_from_vma(), but should still
validate the vma is one from a vfio device before calling this new
vfio_device_ops callback.  vfio-pci needs to validate the vm_pgoff
value falls within a BAR region, mask off the index and get the
pci_resource_start() for the BAR index.

Then we need a solution for how vfio_device_get_from_vma() determines
whether to grant a device reference for a given vma, where that vma may
map something other than device memory.  Are you imagining that we hand
out device references independently and vfio_vma_to_pfn() would return
an errno for vm_pgoff values that don't map device memory and the IOMMU
driver would release the reference?

It all seems rather ad-hoc.

> The implementation of such an op for vfio_pci is one line trivial, why
> do we need allocated memory and a entire shim layer instead? 
> 
> Shim layers are bad.

The only thing here I'd consider a shim layer is overriding vm_ops,
which just seemed like a cleaner and simpler solution than exporting
open/close functions and validating the bus driver installs them, and
the error path should they not.

> We still need a driver op of some kind because only the driver can
> convert a pg_off into a PFN. Remember the big point here is to remove
> the sketchy follow_pte()...

The bus driver simply writes the base_pfn value in the example
structure I outlined in its .mmap callback.  I'm just looking for an
alternative place to store our former vm_pgoff in a way that doesn't
prevent using unmmap_mapping_range().  The IOMMU backend, once it has a
vfio_device via vfio_device_get_from_vma() can know the format of
vm_private_data, cast it as a vfio_vma_private_data and directly use
base_pfn, accomplishing the big point.  They're all operating in the
agreed upon vm_private_data format.  Thanks,

Alex