On Thu, May 23, 2024 at 08:49:03PM -0400, Peter Xu wrote: > Hi, Yan, > > On Fri, May 24, 2024 at 08:39:37AM +0800, Yan Zhao wrote: > > On Thu, May 23, 2024 at 01:56:27PM -0600, Alex Williamson wrote: > > > With the vfio device fd tied to the address space of the pseudo fs > > > inode, we can use the mm to track all vmas that might be mmap'ing > > > device BARs, which removes our vma_list and all the complicated lock > > > ordering necessary to manually zap each related vma. > > > > > > Note that we can no longer store the pfn in vm_pgoff if we want to use > > > unmap_mapping_range() to zap a selective portion of the device fd > > > corresponding to BAR mappings. > > > > > > This also converts our mmap fault handler to use vmf_insert_pfn() > > Looks vmf_insert_pfn() does not call memtype_reserve() to reserve memory type > > for the PFN on x86 as what's done in io_remap_pfn_range(). > > > > Instead, it just calls lookup_memtype() and determine the final prot based on > > the result from this lookup, which might not prevent others from reserving the > > PFN to other memory types. > > I didn't worry too much on others reserving the same pfn range, as that > should be the mmio region for this device, and this device should be owned > by vfio driver. > > However I share the same question, see: > > https://lore.kernel.org/r/20240523223745.395337-2-peterx@xxxxxxxxxx > > So far I think it's not a major issue as VFIO always use UC- mem type, and > that's also the default. But I do also feel like there's something we can Right, but I feel that it may lead to inconsistency in reserved mem type if VFIO (or the variant driver) opts to use WC for certain BAR as mem type in future. Not sure if it will be true though :) > do better, and I'll keep you copied too if I'll resend the series. Thanks. > > > > > > Does that matter? > > > because we no longer have a vma_list to avoid the concurrency problem > > > with io_remap_pfn_range(). The goal is to eventually use the vm_ops > > > huge_fault handler to avoid the additional faulting overhead, but > > > vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first. > > > > > > Also, Jason notes that a race exists between unmap_mapping_range() and > > > the fops mmap callback if we were to call io_remap_pfn_range() to > > > populate the vma on mmap. Specifically, mmap_region() does call_mmap() > > > before it does vma_link_file() which gives a window where the vma is > > > populated but invisible to unmap_mapping_range(). > > > > > > > -- > Peter Xu >