Re: [PATCH v2 2/2] vfio/pci: Use unmap_mapping_range()

On Tue, Jun 04, 2024 at 08:07:28AM -0600, Alex Williamson wrote:
> On Tue, 4 Jun 2024 12:26:29 +0800
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> 
> > On Fri, May 31, 2024 at 11:18:15PM -0600, Alex Williamson wrote:
> > > On Sat, 1 Jun 2024 07:41:27 +0800
> > > Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> > >   
> > > > On Wed, May 29, 2024 at 10:52:31PM -0600, Alex Williamson wrote:  
> > > > > With the vfio device fd tied to the address space of the pseudo fs
> > > > > inode, we can use the mm to track all vmas that might be mmap'ing
> > > > > device BARs, which removes our vma_list and all the complicated lock
> > > > > ordering necessary to manually zap each related vma.
> > > > > 
> > > > > Note that we can no longer store the pfn in vm_pgoff if we want to use
> > > > > unmap_mapping_range() to zap a selective portion of the device fd
> > > > > corresponding to BAR mappings.
> > > > > 
> > > > > This also converts our mmap fault handler to use vmf_insert_pfn()
> > > > > because we no longer have a vma_list to avoid the concurrency problem
> > > > > with io_remap_pfn_range().  The goal is to eventually use the vm_ops
> > > > > huge_fault handler to avoid the additional faulting overhead, but
> > > > > vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first.
> > > > >    
> > > > Could we also consider a looped vmf_insert_pfn() in the mmap fault handler? e.g.
> > > > for (i = vma->vm_start; i < vma->vm_end; i += PAGE_SIZE) {
> > > > 	offset = (i - vma->vm_start) >> PAGE_SHIFT;
> > > > 	ret = vmf_insert_pfn(vma, i, base_pfn + offset);
> > > > 	if (ret != VM_FAULT_NOPAGE) {
> > > > 		zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
> > > > 		goto up_out;
> > > > 	}
> > > > }  
> > >  
> > What about the prefault version? e.g.
> > 
> >         ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
> > +       if (ret & VM_FAULT_ERROR)
> > +               goto out_disabled;
> > +
> > +       /* prefault */
> > +       for (i = vma->vm_start; i < vma->vm_end; i += PAGE_SIZE, pfn++) {
> > +               if (i == vmf->address)
> > +                       continue;
> > +
> > +               /* Don't return error on prefault */
> > +               if (vmf_insert_pfn(vma, i, pfn) & VM_FAULT_ERROR)
> > +                       break;
> > +       }
> > +
> >  out_disabled:
> >         up_read(&vdev->memory_lock);
> > 
> > 
> > > We can have concurrent faults, so doing this means that we need to
> > > continue to maintain a locked list of vmas that have faulted to avoid  
> > But it looks like vfio_pci_zap_bars() always zaps the full PCI BAR range
> > for vmas in core_vdev->inode->i_mapping.
> > So it shouldn't matter whether a vma is fully or partly mapped?
> 
> Yes, but without locking concurrent faults would all be re-inserting
> the pfns concurrently from the fault handler.
insert_pfn() takes the page table lock via get_locked_pte(), and if it
finds the PTE already populated (!pte_none()) it returns VM_FAULT_NOPAGE
directly without re-inserting, right?
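
For reference, the relevant logic in insert_pfn() (mm/memory.c) looks
roughly like the below (a paraphrased sketch, not verbatim; the exact
code varies across kernel versions):

static vm_fault_t insert_pfn(struct vm_area_struct *vma, unsigned long addr,
			     pfn_t pfn, pgprot_t prot, bool mkwrite)
{
	struct mm_struct *mm = vma->vm_mm;
	pte_t *pte;
	spinlock_t *ptl;

	/* Concurrent faulters serialize on the page table lock */
	pte = get_locked_pte(mm, addr, &ptl);
	if (!pte)
		return VM_FAULT_OOM;

	/* Someone else already populated the PTE: don't re-insert */
	if (!pte_none(ptep_get(pte)) && !mkwrite)
		goto out_unlock;

	/* ... construct the pfn PTE and set it ... */

out_unlock:
	pte_unmap_unlock(pte, ptl);
	return VM_FAULT_NOPAGE;
}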

> 
> > > duplicate insertions and all the nasty lock gymnastics that go along
> > > with it.  I'd rather we just support huge_fault.  Thanks,  
> > huge_fault is better, but until that's available, is this prefault approach acceptable?
> > 
> 
> It seems like this would still be subject to the race that Jason noted
> here[1], ie. call_mmap() occurs before vma_link_file(), therefore we
> need to exclusively rely on fault to populate the vma or we risk a race
> with unmap_mapping_range().  Thanks,
> 
Thanks. But the proposed prefault approach also runs in the fault
handler, which executes after vma_link_file().
Similar implementations can be found in
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c and
drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c.
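
To make the proposal concrete, the combined handler would look roughly
like the below (an illustrative sketch only; vma_to_pfn() and
__vfio_pci_memory_enabled() are assumed from the vfio-pci core code in
this series):

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct vfio_pci_core_device *vdev = vma->vm_private_data;
	unsigned long base_pfn = vma_to_pfn(vma); /* pfn for vma->vm_start */
	unsigned long addr;
	vm_fault_t ret = VM_FAULT_SIGBUS;

	down_read(&vdev->memory_lock);

	if (!__vfio_pci_memory_enabled(vdev))
		goto out_unlock;

	/* Insert the faulting page first so the fault itself succeeds */
	ret = vmf_insert_pfn(vma, vmf->address,
			     base_pfn + ((vmf->address - vma->vm_start) >> PAGE_SHIFT));
	if (ret & VM_FAULT_ERROR)
		goto out_unlock;

	/* Opportunistically prefault the rest; errors here are not fatal */
	for (addr = vma->vm_start; addr < vma->vm_end; addr += PAGE_SIZE) {
		if (addr == vmf->address)
			continue;

		if (vmf_insert_pfn(vma, addr,
				   base_pfn + ((addr - vma->vm_start) >> PAGE_SHIFT)) &
		    VM_FAULT_ERROR)
			break;
	}

out_unlock:
	up_read(&vdev->memory_lock);
	return ret;
}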

> 
> 
> [1]https://lore.kernel.org/all/20240522183046.GG20229@xxxxxxxxxx/
> 