On Thu, 4 Mar 2021 19:16:33 -0400 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote: > > > Therefore unless a bus driver opts-out by replacing vm_private_data, we > > can identify participating vmas by the vm_ops and have flags indicating > > if the vma maps device memory such that vfio_get_device_from_vma() > > should produce a device reference. The vfio IOMMU backends would also > > consume this, ie. if they get a valid vfio_device from the vma, use the > > pfn_base field directly. vfio_vm_ops would wrap the bus driver > > callbacks and provide reference counting on open/close to release this > > object. > > > I'm not thrilled with a vfio_device_ops callback plumbed through > > vfio-core to do vma-to-pfn translation, so I thought this might be a > > better alternative. Thanks, > > Maybe you could explain why, because I'm looking at this idea and > thinking it looks very complicated compared to a simple driver op > callback? vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the caller has already used vfio_device_get_from_vma(), but should still validate the vma is one from a vfio device before calling this new vfio_device_ops callback. vfio-pci needs to validate the vm_pgoff value falls within a BAR region, mask off the index and get the pci_resource_start() for the BAR index. Then we need a solution for how vfio_device_get_from_vma() determines whether to grant a device reference for a given vma, where that vma may map something other than device memory. Are you imagining that we hand out device references independently and vfio_vma_to_pfn() would return an errno for vm_pgoff values that don't map device memory and the IOMMU driver would release the reference? It all seems rather ad-hoc. > The implementation of such an op for vfio_pci is one line trivial, why > do we need allocated memory and a entire shim layer instead? > > Shim layers are bad. The only thing here I'd consider a shim layer is overriding vm_ops, which just seemed like a cleaner and simpler solution than exporting open/close functions and validating the bus driver installs them, and the error path should they not. > We still need a driver op of some kind because only the driver can > convert a pg_off into a PFN. Remember the big point here is to remove > the sketchy follow_pte()... The bus driver simply writes the base_pfn value in the example structure I outlined in its .mmap callback. I'm just looking for an alternative place to store our former vm_pgoff in a way that doesn't prevent using unmmap_mapping_range(). The IOMMU backend, once it has a vfio_device via vfio_device_get_from_vma() can know the format of vm_private_data, cast it as a vfio_vma_private_data and directly use base_pfn, accomplishing the big point. They're all operating in the agreed upon vm_private_data format. Thanks, Alex