LGTM: this completely eliminates the guest VM PCI initialization
slowdowns on H100 and A100. I'm also not seeing any obvious regressions
on my side.

Reported-by: "Mitchell Augustin" <mitchell.augustin@xxxxxxxxxxxxx>
Reviewed-by: "Mitchell Augustin" <mitchell.augustin@xxxxxxxxxxxxx>
Tested-by: "Mitchell Augustin" <mitchell.augustin@xxxxxxxxxxxxx>

On Wed, Feb 5, 2025 at 5:18 PM Alex Williamson
<alex.williamson@xxxxxxxxxx> wrote:
>
> vfio-pci supports huge_fault for PCI MMIO BARs and will insert pud and
> pmd mappings for well aligned mappings. follow_pfnmap_start() walks the
> page table and therefore knows the page mask of the level where the
> address is found and returns this through follow_pfnmap_args.pgmask.
> Subsequent pfns from this address until the end of the mapping page are
> necessarily consecutive. Use this information to retrieve a range of
> pfnmap pfns in a single pass.
>
> With optimal mappings and alignment on systems with 1GB pud and 4KB
> page size, this reduces iterations for DMA mapping PCI BARs by a
> factor of 256K. In real world testing, the overhead of iterating
> pfns for a VM DMA mapping a 32GB PCI BAR is reduced from ~1s to
> sub-millisecond.
>
> Signed-off-by: Alex Williamson <alex.williamson@xxxxxxxxxx>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 24 +++++++++++++++++-------
>  1 file changed, 17 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 939920454da7..6f3e8d981311 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -520,7 +520,7 @@ static void vfio_batch_fini(struct vfio_batch *batch)
>
>  static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>                             unsigned long vaddr, unsigned long *pfn,
> -                           bool write_fault)
> +                           unsigned long *pgmask, bool write_fault)
>  {
>         struct follow_pfnmap_args args = { .vma = vma, .address = vaddr };
>         int ret;
> @@ -544,10 +544,12 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
>                 return ret;
>         }
>
> -       if (write_fault && !args.writable)
> +       if (write_fault && !args.writable) {
>                 ret = -EFAULT;
> -       else
> +       } else {
>                 *pfn = args.pfn;
> +               *pgmask = args.pgmask;
> +       }
>
>         follow_pfnmap_end(&args);
>         return ret;
> @@ -590,15 +592,23 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
>         vma = vma_lookup(mm, vaddr);
>
>         if (vma && vma->vm_flags & VM_PFNMAP) {
> -               ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
> +               unsigned long pgmask;
> +
> +               ret = follow_fault_pfn(vma, mm, vaddr, pfn, &pgmask,
> +                                      prot & IOMMU_WRITE);
>                 if (ret == -EAGAIN)
>                         goto retry;
>
>                 if (!ret) {
> -                       if (is_invalid_reserved_pfn(*pfn))
> -                               ret = 1;
> -                       else
> +                       if (is_invalid_reserved_pfn(*pfn)) {
> +                               unsigned long epfn;
> +
> +                               epfn = (((*pfn << PAGE_SHIFT) + ~pgmask + 1)
> +                                       & pgmask) >> PAGE_SHIFT;
> +                               ret = min_t(int, npages, epfn - *pfn);
> +                       } else {
> +                               ret = -EFAULT;
> +                       }
>                 }
>         }
> done:
> --
> 2.47.1
>

--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
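
P.S. To unpack the arithmetic in the last hunk for anyone following
along: ~pgmask + 1 is the size of the mapping page at the level where
the pfn was found, so epfn is the first pfn past the end of that page,
and epfn - *pfn is how many consecutive pfns can be taken in one pass.
Below is a standalone userspace sketch of that calculation (my own
illustration with made-up values, not the kernel code itself; it
assumes 4KB base pages and a PMD-sized 2MB mapping):

#include <stdio.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21
#define PMD_MASK   (~((1UL << PMD_SHIFT) - 1)) /* pgmask for a pmd */

int main(void)
{
        unsigned long pfn = 0x100001;   /* faulted 4KB into the 2MB page */
        unsigned long pgmask = PMD_MASK;
        long npages = 1024;             /* pfns the caller still wants */

        /* Round the address up to the end of the current mapping page,
         * then convert back to a pfn; ~pgmask + 1 is the page size at
         * this level (2MB here).
         */
        unsigned long epfn = (((pfn << PAGE_SHIFT) + ~pgmask + 1)
                              & pgmask) >> PAGE_SHIFT;

        /* Consecutive pfns available in one pass, capped at npages. */
        long ret = npages < (long)(epfn - pfn) ?
                   npages : (long)(epfn - pfn);

        printf("epfn=0x%lx consecutive=%lu ret=%ld\n",
               epfn, epfn - pfn, ret);
        return 0;
}

This prints epfn=0x100200 consecutive=511 ret=511, i.e. a single fault
4KB into a 2MB mapping hands back the remaining 511 pfns at once. The
256K figure in the commit message is the same idea at pud scale:
2^30 / 2^12 = 2^18 = 256K base pages covered by one 1GB mapping.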