On Wed, 29 May 2024 10:29:33 +0800
Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:

> On Tue, May 28, 2024 at 12:42:51PM -0600, Alex Williamson wrote:
> > On Fri, 24 May 2024 09:47:03 +0800
> > Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
> >
> > > On Thu, May 23, 2024 at 08:49:03PM -0400, Peter Xu wrote:
> > > > Hi, Yan,
> > > >
> > > > On Fri, May 24, 2024 at 08:39:37AM +0800, Yan Zhao wrote:
> > > > > On Thu, May 23, 2024 at 01:56:27PM -0600, Alex Williamson wrote:
> > > > > > With the vfio device fd tied to the address space of the pseudo fs
> > > > > > inode, we can use the mm to track all vmas that might be mmap'ing
> > > > > > device BARs, which removes our vma_list and all the complicated lock
> > > > > > ordering necessary to manually zap each related vma.
> > > > > >
> > > > > > Note that we can no longer store the pfn in vm_pgoff if we want to use
> > > > > > unmap_mapping_range() to zap a selective portion of the device fd
> > > > > > corresponding to BAR mappings.
> > > > > >
> > > > > > This also converts our mmap fault handler to use vmf_insert_pfn()
> > > > > It looks like vmf_insert_pfn() does not call memtype_reserve() to reserve
> > > > > a memory type for the PFN on x86, as is done in io_remap_pfn_range().
> > > > >
> > > > > Instead, it just calls lookup_memtype() and determines the final prot
> > > > > based on the result of that lookup, which might not prevent others from
> > > > > reserving the PFN with other memory types.
> > > >
> > > > I didn't worry too much about others reserving the same pfn range, as that
> > > > should be the mmio region for this device, and this device should be owned
> > > > by the vfio driver.
> > > >
> > > > However I share the same question, see:
> > > >
> > > > https://lore.kernel.org/r/20240523223745.395337-2-peterx@xxxxxxxxxx
> > > >
> > > > So far I think it's not a major issue as VFIO always uses the UC- mem
> > > > type, and that's also the default.
> > > > But I do also feel like there's something we can
> > > Right, but I feel that it may lead to inconsistency in the reserved mem
> > > type if VFIO (or a variant driver) opts to use WC as the mem type for a
> > > certain BAR in the future. Not sure if that will happen though :)
> >
> > Does Kevin's comment[1] satisfy your concern? vfio_pci_core_mmap()
> > needs to make sure the PCI BAR region is requested before the mmap,
> > which is tracked via the barmap. Therefore the barmap is always set up
> > via pci_iomap(), which will call memtype_reserve() with the UC- attribute.

> Just a question out of curiosity.
> Is it mandatory to call pci_iomap() in vfio_pci_core_mmap()?
> I don't see it or ioremap*() called before nvgrace_gpu_mmap().

nvgrace-gpu is exposing a non-PCI coherent memory region as a BAR, so it
doesn't request the PCI BAR region and is on its own for read/write
access as well. To mmap an actual PCI BAR, the region must be requested,
and vfio-pci-core uses the barmap to track which BARs have been
requested. Thanks,

Alex

> > If there are any additional comments required to make this clearer,
> > or to outline steps for WC support in the future, please provide
> > suggestions. Thanks,
> >
> > Alex
> >
> > [1]https://lore.kernel.org/all/BN9PR11MB52764E958E6481A112649B5D8CF52@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> >
> > > > > Does that matter?
> > > > > > because we no longer have a vma_list to avoid the concurrency problem
> > > > > > with io_remap_pfn_range(). The goal is to eventually use the vm_ops
> > > > > > huge_fault handler to avoid the additional faulting overhead, but
> > > > > > vmf_insert_pfn_{pmd,pud}() need to learn about pfnmaps first.
> > > > > >
> > > > > > Also, Jason notes that a race exists between unmap_mapping_range() and
> > > > > > the fops mmap callback if we were to call io_remap_pfn_range() to
> > > > > > populate the vma on mmap.
> > > > > > Specifically, mmap_region() does call_mmap()
> > > > > > before it does vma_link_file(), which gives a window where the vma is
> > > > > > populated but invisible to unmap_mapping_range().
> > > >
> > > > --
> > > > Peter Xu
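
The fault-time approach described in the quoted commit message can be sketched
roughly as follows. This is an illustrative sketch, not the actual patch:
`struct my_vdev`, its fields, and `my_bar_offset()` are hypothetical stand-ins
for the real vfio-pci-core state, while vmf_insert_pfn() and
unmap_mapping_range() are the kernel APIs under discussion. Because vm_pgoff no
longer encodes the pfn, the handler derives it from the BAR base plus the
faulting offset, and selective zapping goes through the inode's address_space
instead of a manual vma_list:

```c
/* Hypothetical per-device state; real code uses struct vfio_pci_core_device. */
struct my_vdev {
	struct pci_dev *pdev;
	struct inode *inode;	/* pseudo fs inode backing the device fd */
	phys_addr_t bar_phys;	/* pci_resource_start() of the mmap'd BAR */
};

static vm_fault_t my_vdev_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct my_vdev *vdev = vma->vm_private_data;
	unsigned long pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;

	/* pfn comes from the BAR base, not from vm_pgoff */
	return vmf_insert_pfn(vma, vmf->address,
			      (vdev->bar_phys >> PAGE_SHIFT) + pgoff);
}

/*
 * Zap only the range of the device fd covering one BAR; the vmas are
 * found through the inode's address_space.  my_bar_offset() is a
 * hypothetical helper mapping a BAR index to its fixed offset within
 * the device fd.
 */
static void my_vdev_zap_bar(struct my_vdev *vdev, int bar)
{
	unmap_mapping_range(vdev->inode->i_mapping, my_bar_offset(bar),
			    pci_resource_len(vdev->pdev, bar), true);
}
```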