On Mon, May 04, 2020 at 08:20:55AM -0600, Alex Williamson wrote:
> On Fri, 1 May 2020 20:25:50 -0300
> Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> 
> > On Fri, May 01, 2020 at 03:39:19PM -0600, Alex Williamson wrote:
> > > Rather than calling remap_pfn_range() when a region is mmap'd, setup
> > > a vm_ops handler to support dynamic faulting of the range on access.
> > > This allows us to manage a list of vmas actively mapping the area that
> > > we can later use to invalidate those mappings.  The open callback
> > > invalidates the vma range so that all tracking is inserted in the
> > > fault handler and removed in the close handler.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> > > ---
> > >  drivers/vfio/pci/vfio_pci.c         |   76 ++++++++++++++++++++++++++++++++++-
> > >  drivers/vfio/pci/vfio_pci_private.h |    7 +++
> > >  2 files changed, 81 insertions(+), 2 deletions(-)
> > > 
> > > +static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> > > +{
> > > +	struct vm_area_struct *vma = vmf->vma;
> > > +	struct vfio_pci_device *vdev = vma->vm_private_data;
> > > +
> > > +	if (vfio_pci_add_vma(vdev, vma))
> > > +		return VM_FAULT_OOM;
> > > +
> > > +	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> > > +			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
> > > +		return VM_FAULT_SIGBUS;
> > > +
> > > +	return VM_FAULT_NOPAGE;
> > > +}
> > > +
> > > +static const struct vm_operations_struct vfio_pci_mmap_ops = {
> > > +	.open = vfio_pci_mmap_open,
> > > +	.close = vfio_pci_mmap_close,
> > > +	.fault = vfio_pci_mmap_fault,
> > > +};
> > > +
> > >  static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > >  {
> > >  	struct vfio_pci_device *vdev = device_data;
> > > @@ -1357,8 +1421,14 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)
> > >  	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> > >  	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;
> > >  
> > > -	return remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
> > > -			       req_len, vma->vm_page_prot);
> > > +	/*
> > > +	 * See remap_pfn_range(), called from vfio_pci_fault() but we can't
> > > +	 * change vm_flags within the fault handler.  Set them now.
> > > +	 */
> > > +	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
> > > +	vma->vm_ops = &vfio_pci_mmap_ops;
> > 
> > Perhaps do the vfio_pci_add_vma & remap_pfn_range combo here if the
> > BAR is activated ? That way a fully populated BAR is presented in the
> > common case and avoids taking a fault path?
> > 
> > But it does seem OK as is
> 
> Thanks for reviewing.  There's also an argument that we defer
> remap_pfn_range() until the device is actually touched, which might
> reduce the startup latency.

But not startup to a functional VM, as that will now have to take the
slower fault path.

> It's also a bit inconsistent with the vm_ops.open() path where I
> can't return error, so I can't call vfio_pci_add_vma(), I can only
> zap the vma so that the fault handler can return an error if
> necessary.

open() could allocate memory so the zap isn't needed.  If the
allocation fails, then do the zap and take the slow path.

> handler.  If there's a good reason to do otherwise, I can make the
> change, but I doubt I'd have encountered the dma mapping of an
> unfaulted vma issue had I done it this way, so maybe there's a test
> coverage argument as well.  Thanks,

This test is best done by having one thread disable the other BAR
while another thread is trying to map it.

Jason
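
To make the pre-population suggestion above concrete, here is a minimal
sketch (not the posted patch) of how the tail of vfio_pci_mmap() might
look if it populates and tracks the vma at mmap time when the BAR is
decodable, leaving the fault handler only as the fallback.  The enable
test, vfio_pci_bar_enabled(), is a hypothetical stand-in for whatever
check the driver would actually use, and the snippet continues the
quoted hunk (vdev, pdev, index and pgoff are set up earlier in the
function):

	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
	vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_ops = &vfio_pci_mmap_ops;

	/*
	 * If BAR access is currently enabled, present a fully populated
	 * mapping up front so the common case never takes the fault path.
	 * The fault handler remains the fallback for mmaps performed while
	 * the BAR is disabled.
	 */
	if (vfio_pci_bar_enabled(vdev)) {	/* hypothetical helper */
		int ret;

		ret = vfio_pci_add_vma(vdev, vma);
		if (ret)
			return ret;

		ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
				      vma->vm_end - vma->vm_start,
				      vma->vm_page_prot);
		if (ret)
			return ret;
	}

	return 0;

With this shape, a mapping created while the BAR is disabled still
relies on the fault handler, so the vma list and the zap-on-disable
logic from the patch stay unchanged.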
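
Along the same lines, a sketch of the vm_ops.open() idea: .open()
cannot return an error, but it can still attempt the tracking
allocation up front and fall back to zapping, and therefore to the
fault path, only when that allocation fails.  zap_vma_ptes() is assumed
here as the zapping primitive; the actual series may use a different
helper:

static void vfio_pci_mmap_open(struct vm_area_struct *vma)
{
	struct vfio_pci_device *vdev = vma->vm_private_data;

	/*
	 * .open() has no way to report failure.  Try to track the new vma
	 * right away; only if the allocation fails, zap the range so the
	 * fault handler performs the tracking (and can return an error)
	 * on the next access.
	 */
	if (vfio_pci_add_vma(vdev, vma))
		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
}

The fault handler from the patch is unchanged; it simply becomes the
slow path for vmas that could not be tracked at open() time.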