> On Oct 25, 2024, at 01:06, Alex Williamson <alex.williamson@xxxxxxxxxx> wrote:
>
> On Thu, 24 Oct 2024 17:34:42 +0800
> Qinyun Tan <qinyuntan@xxxxxxxxxxxxxxxxx> wrote:
>
>> When a user application calls ioctl(VFIO_IOMMU_MAP_DMA) to map a DMA
>> address, the general handler 'vfio_pin_map_dma' attempts to pin the
>> memory and then create the mapping in the IOMMU.
>>
>> However, some mappings aren't backed by a struct page, for example an
>> mmap'd MMIO range for our own or another device. In this scenario,
>> with a vma flagged VM_IO | VM_PFNMAP, the pin operation will fail.
>> Moreover, the pin operation incurs a large overhead, which results in
>> a longer startup time for the VM. We don't actually need a pin in
>> this scenario.
>>
>> To address this issue, we introduce a new DMA MAP flag
>> 'VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN' to skip the 'vfio_pin_pages_remote'
>> operation in the DMA map process for MMIO memory. Additionally, we add
>> the 'VM_PGOFF_IS_PFN' flag for the vfio_pci_mmap address, ensuring
>> that we can directly obtain the pfn through vma->vm_pgoff.
>>
>> This approach allows us to avoid unnecessary memory pinning
>> operations, which would otherwise introduce additional overhead during
>> DMA mapping.
>>
>> In my tests, using vfio to pass through an 8-card AMD GPU with a
>> large BAR size (128GB*8), the time to map the 192GB*8 BAR was reduced
>> from about 50.79s to 1.57s.
>
> If the vma has a flag to indicate pfnmap, why does the user need to
> provide a mapping flag to indicate not to pin?  We generally cannot
> trust such a user directive anyway, nor do we in this series, so it all
> seems rather redundant.
>
> What about simply improving the batching of pfnmap ranges rather than
> imposing any sort of mm or uapi changes?  Or perhaps, since we're now
> using huge_fault to populate the vma, maybe we can iterate at PMD or
> PUD granularity rather than PAGE_SIZE?  Seems like we have plenty of
> optimizations to pursue that could be done transparently to the user.
> Thanks,
>
> Alex
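
For context, the uapi change under discussion amounts to one extra flag on the
existing VFIO_IOMMU_MAP_DMA ioctl. Below is a minimal user-space sketch of how
a caller might request the proposed no-pin behavior for an mmap'd MMIO range;
VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN is the flag proposed in this series, not
upstream uapi, so the bit value and the helper name here are illustrative only.

#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Proposed in this series, not part of the upstream uapi headers;
 * the bit value is a placeholder for illustration only.
 */
#ifndef VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN
#define VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN	(1 << 3)
#endif

/* Map an mmap'd MMIO range (e.g. another device's BAR) into the IOMMU. */
static int map_mmio_no_pin(int container_fd, void *vaddr, __u64 iova,
			   __u64 size)
{
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ |
			 VFIO_DMA_MAP_FLAG_WRITE |
			 VFIO_DMA_MAP_FLAG_MMIO_DONT_PIN,
		.vaddr = (__u64)(unsigned long)vaddr,
		.iova  = iova,
		.size  = size,
	};

	return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}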