On Mon, 8 May 2023 13:48:30 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, May 08, 2023 at 08:58:42PM +0800, Yan Zhao wrote:
> > In VFIO type1, vaddr_get_pfns() will try fault in MMIO PFNs after
> > pin_user_pages_remote() returns -EFAULT.
> >
> > follow_fault_pfn
> > fixup_user_fault
> > handle_mm_fault
> > handle_mm_fault
> > do_fault
> > do_shared_fault
> > do_fault
> > __do_fault
> > vfio_pci_mmap_fault
> > io_remap_pfn_range
> > remap_pfn_range
> > track_pfn_remap
> > vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> > remap_pfn_range_notrack
> > vm_flags_set ==> mmap_assert_write_locked(vma->vm_mm)
> >
> > As io_remap_pfn_range() will call vm_flags_set() to update vm_flags [1],
> > holding of mmap write lock is required.
> > So, update vfio_pci_mmap_fault() to drop mmap read lock and take mmap
> > write lock.
> >
> > [1] https://lkml.kernel.org/r/20230126193752.297968-3-surenb@xxxxxxxxxx
> > commit bc292ab00f6c ("mm: introduce vma->vm_flags wrapper functions")
> > commit 1c71222e5f23
> > ("mm: replace vma->vm_flags direct modifications with modifier calls")
> >
> > Signed-off-by: Yan Zhao <yan.y.zhao@xxxxxxxxx>
> > ---
> >  drivers/vfio/pci/vfio_pci_core.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> > index a5ab416cf476..5082f89152b3 100644
> > --- a/drivers/vfio/pci/vfio_pci_core.c
> > +++ b/drivers/vfio/pci/vfio_pci_core.c
> > @@ -1687,6 +1687,12 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
> >  	struct vfio_pci_mmap_vma *mmap_vma;
> >  	vm_fault_t ret = VM_FAULT_NOPAGE;
> >
> > +	mmap_assert_locked(vma->vm_mm);
> > +	mmap_read_unlock(vma->vm_mm);
> > +
> > +	if (mmap_write_lock_killable(vma->vm_mm))
> > +		return VM_FAULT_RETRY;
>
> Certainly not..
>
> I'm not sure how to resolve this properly, set the flags in advance?
>
> The address space conversion?

We already try to set the flags in advance, but there are some
architectural flags like VM_PAT that make that tricky. Cedric has been
looking at inserting individual pages with vmf_insert_pfn(), but that
incurs a lot more faults and therefore latency vs remapping the entire
vma on fault. I'm not convinced that we shouldn't just attempt to
remove the fault handler entirely, but I haven't tried it yet to know
what gotchas are down that path.

Thanks,
Alex
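
For a rough sense of the fault-count difference between inserting
individual pages and remapping the entire vma on fault, here is a small
userspace model. It is only a sketch, not kernel code: count_faults(),
enum map_strategy and the "every page is touched once" access pattern
are assumptions made up for illustration, and the model counts faults
only, not the latency of each one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace model, not kernel code. It only counts how
 * many faults each mapping strategy would take if every page of a
 * region is touched once, in order.
 */
enum map_strategy {
	MAP_WHOLE_VMA,   /* first fault maps the whole region */
	MAP_SINGLE_PAGE  /* each fault maps only the faulting page */
};

static size_t count_faults(size_t npages, enum map_strategy strategy)
{
	unsigned char *mapped;
	size_t faults = 0;
	size_t i;

	assert(npages > 0);
	mapped = calloc(npages, 1);	/* 0 = not mapped yet */
	assert(mapped != NULL);

	for (i = 0; i < npages; i++) {
		if (mapped[i])
			continue;	/* already mapped, no fault */

		faults++;
		if (strategy == MAP_WHOLE_VMA)
			memset(mapped, 1, npages);	/* models remapping the entire vma */
		else
			mapped[i] = 1;	/* models one per-page insertion */
	}

	free(mapped);
	return faults;
}
```

With an assumed 16 MiB region and 4 KiB pages (4096 pages),
count_faults(4096, MAP_WHOLE_VMA) returns 1 and
count_faults(4096, MAP_SINGLE_PAGE) returns 4096. That is the worst
case for the per-page approach; a sparse access pattern would take
fewer faults.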