On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
>
> > This is where it gets tricky. The vm_pgoff we get from
> > file_operations.mmap is already essentially describing an offset
> > from the base of a specific resource. We could convert that from
> > an absolute offset to a pfn offset, but it's only the bus driver
> > code (ex. vfio-pci) that knows how to get the base, assuming there
> > is a single base per region (we can't assume enough bits per
> > region to store an absolute pfn). Also note that you're suggesting
> > that all vfio mmaps would need to standardize on the vfio-pci
> > implementation of region layouts. Not that most drivers haven't
> > copied vfio-pci, but we've specifically avoided exposing it as a
> > fixed uAPI such that we retain the flexibility for a bus driver to
> > implement region offsets however it needs.
>
> Okay, well the bus driver owns the address space and the bus driver
> is in control of the vm_pgoff. If it doesn't want to zap then it
> doesn't need to do anything.
>
> vfio-pci can consistently use the index encoding and be fine.
>
> > So I'm not really sure what this looks like. Within vfio-pci we
> > could keep the index bits in place to allow unmap_mapping_range()
> > to selectively zap matching vm_pgoffs, but expanding that into a
> > vfio standard such that the IOMMU backend can also extract a pfn
> > looks very limiting, or ugly. Thanks,
>
> Let's add an op to convert a vma into a PFN range. The map code will
> pass the vma to the op and get back a pfn (or failure).
>
> pci will switch the vm_pgoff to an index, find the BAR base and
> compute the pfn.
>
> It is starting to look more and more like dma-buf though.

How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure? We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could do something like:

	struct vfio_vma_private_data *priv;
	int ret;

	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
	if (!priv)
		return -ENOMEM;

	kref_init(&priv->kref);
	priv->device = device;
	vma->vm_private_data = priv;

	ret = device->ops->mmap(device->device_data, vma);
	if (!ret && vma->vm_private_data == priv) {
		/* bus driver participated, wrap its vm_ops */
		priv->vm_ops = vma->vm_ops;
		vma->vm_ops = &vfio_vm_ops;
	} else {
		kfree(priv);
	}

	return ret;

Where:

	struct vfio_vma_private_data {
		struct vfio_device *device;
		unsigned long pfn_base;
		void *private_data; /* maybe not needed */
		const struct vm_operations_struct *vm_ops;
		struct kref kref;
		unsigned int release_notification:1;
	};

Therefore, unless a bus driver opts out by replacing vm_private_data,
we can identify participating vmas by their vm_ops, and flags on the
structure can indicate whether the vma maps device memory, i.e.
whether vfio_get_device_from_vma() should produce a device reference.
The vfio IOMMU backends would also consume this: if they get a valid
vfio_device from the vma, they use the pfn_base field directly.
vfio_vm_ops would wrap the bus driver callbacks and provide reference
counting on open/close to release this object.

I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative. Thanks,

Alex
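
P.S. To make the refcounting half of this concrete, here's a rough,
completely untested sketch of what that wrapping could look like.
vfio_vm_open(), vfio_vm_close() and vfio_vma_release() are invented
names, and a real version would also need to forward .fault and the
remaining callbacks to the wrapped bus driver vm_ops:

	static void vfio_vma_release(struct kref *kref)
	{
		struct vfio_vma_private_data *priv =
			container_of(kref, struct vfio_vma_private_data, kref);

		/*
		 * Last vma referencing the object is gone; this is
		 * also where a release notification to the bus driver
		 * could hook in, per the release_notification flag.
		 */
		kfree(priv);
	}

	static void vfio_vm_open(struct vm_area_struct *vma)
	{
		struct vfio_vma_private_data *priv = vma->vm_private_data;

		/* vma split or copied at fork, take another reference */
		kref_get(&priv->kref);

		if (priv->vm_ops && priv->vm_ops->open)
			priv->vm_ops->open(vma);
	}

	static void vfio_vm_close(struct vm_area_struct *vma)
	{
		struct vfio_vma_private_data *priv = vma->vm_private_data;

		/* forward before dropping our reference, priv may be freed */
		if (priv->vm_ops && priv->vm_ops->close)
			priv->vm_ops->close(vma);

		kref_put(&priv->kref, vfio_vma_release);
	}

	static const struct vm_operations_struct vfio_vm_ops = {
		.open = vfio_vm_open,
		.close = vfio_vm_close,
	};

	struct vfio_device *vfio_get_device_from_vma(struct vm_area_struct *vma)
	{
		struct vfio_vma_private_data *priv;

		if (vma->vm_ops != &vfio_vm_ops)
			return NULL;	/* not a participating vma */

		priv = vma->vm_private_data;

		/*
		 * A "maps device memory" flag check and taking a
		 * reference on priv->device would go here.
		 */
		return priv->device;
	}

The appeal is that nothing new gets plumbed through vfio-core per bus
driver; everything keys off the shared vm_ops/vm_private_data pair.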