On Thu, Mar 04, 2021 at 05:07:31PM -0700, Alex Williamson wrote: > On Thu, 4 Mar 2021 19:16:33 -0400 > Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > > > On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote: > > > > > Therefore unless a bus driver opts-out by replacing vm_private_data, we > > > can identify participating vmas by the vm_ops and have flags indicating > > > if the vma maps device memory such that vfio_get_device_from_vma() > > > should produce a device reference. The vfio IOMMU backends would also > > > consume this, ie. if they get a valid vfio_device from the vma, use the > > > pfn_base field directly. vfio_vm_ops would wrap the bus driver > > > callbacks and provide reference counting on open/close to release this > > > object. > > > > > I'm not thrilled with a vfio_device_ops callback plumbed through > > > vfio-core to do vma-to-pfn translation, so I thought this might be a > > > better alternative. Thanks, > > > > Maybe you could explain why, because I'm looking at this idea and > > thinking it looks very complicated compared to a simple driver op > > callback? > > vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the > caller has already used vfio_device_get_from_vma(), but should still > validate the vma is one from a vfio device before calling this new > vfio_device_ops callback. Huh? Validate? Why? Something like this in the IOMMU stuff: struct vfio_device *vfio = vfio_device_get_from_vma(vma) if (!vfio->vma_to_pfn) return -EINVAL; vfio->ops->vma_to_pfn(vfio, vma, offset_from_vma_start) Is fine, why do we need to over complicate? I don't need to check that the vma belongs to the vfio because it is the API contract that the caller will guarantee that. This is the kernel, I can (and do!) check these sorts of things by code inspection when working on stuff - I can look at every implementation and every call site to prove these things. IMHO doing an expensive check like that is a style of defensive programming the kernel community frowns upon. > vfio-pci needs to validate the vm_pgoff value falls within a BAR > region, mask off the index and get the pci_resource_start() for the > BAR index. It needs to do the same thing fault() already does, which is currently one line.. > Then we need a solution for how vfio_device_get_from_vma() determines > whether to grant a device reference for a given vma, where that vma may > map something other than device memory. Are you imagining that we hand > out device references independently and vfio_vma_to_pfn() would return > an errno for vm_pgoff values that don't map device memory and the IOMMU > driver would release the reference? That seems a reasonable place to start > prevent using unmmap_mapping_range(). The IOMMU backend, once it has a > vfio_device via vfio_device_get_from_vma() can know the format of > vm_private_data, cast it as a vfio_vma_private_data and directly use > base_pfn, accomplishing the big point. They're all operating in the > agreed upon vm_private_data format. Thanks, If we force all drivers into a mandatory (!) vm_private_data format then every driver has to be audited and updated before the new pfn code can be done. If any driver in the future makes a mistake here (and omitting the unique vm_private_data magics is a very easy mistake to make) then it will cause a kernel crash in an obscure scenario. It is the "design the API to be hard to use wrong" philosophy. Jason