On Wed, Aug 14, 2024, Jason Gunthorpe wrote:
> On Fri, Aug 09, 2024 at 12:08:50PM -0400, Peter Xu wrote:
> > Overview
> > ========
> >
> > This series is based on mm-unstable, commit 98808d08fc0f of Aug 7th latest,
> > plus dax 1g fix [1]. Note that this series should also apply if without
> > the dax 1g fix series, but when without it, mprotect() will trigger similar
> > errors otherwise on PUD mappings.
> >
> > This series implements huge pfnmaps support for mm in general. Huge pfnmap
> > allows e.g. VM_PFNMAP vmas to map in either PMD or PUD levels, similar to
> > what we do with dax / thp / hugetlb so far to benefit from TLB hits. Now
> > we extend that idea to PFN mappings, e.g. PCI MMIO bars where it can grow
> > as large as 8GB or even bigger.
>
> FWIW, I've started to hear people talk about needing this in the VFIO
> context with VMs.
>
> vfio/iommufd will reassemble the contiguous range from the 4k PFNs to
> setup the IOMMU, but KVM is not able to do it so reliably.

Heh, KVM should very reliably do the exact opposite, i.e. KVM should never
create a huge page unless the mapping is huge in the primary MMU. And that's
very much by design, as KVM has no knowledge of what actually resides at a
given PFN, and thus can't determine whether or not it's safe to create a huge
page if KVM happens to realize the VM has access to a contiguous range of
memory.
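
To make the rule above concrete, here is a minimal sketch in C. It is not
KVM's actual code: the enum, the function name pick_secondary_level() and its
parameters are all hypothetical, chosen only for illustration. The sketch
assumes the secondary MMU is told the level at which the primary MMU maps the
address, and it deliberately ignores whether the backing PFNs happen to be
physically contiguous.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mapping levels, ordered from smallest to largest. */
enum map_level { LEVEL_4K = 1, LEVEL_2M = 2, LEVEL_1G = 3 };

/*
 * Illustrative only: choose the level for a secondary-MMU mapping.
 *
 * host_level      - level at which the primary MMU maps the address
 * max_level       - largest level the secondary MMU would otherwise allow
 * pfns_contiguous - whether the backing PFNs happen to be contiguous
 *
 * Contiguity is ignored on purpose: it says nothing about what resides at
 * those PFNs, so it is never a reason to go above host_level.
 */
static enum map_level pick_secondary_level(enum map_level host_level,
                                           enum map_level max_level,
                                           bool pfns_contiguous)
{
        (void)pfns_contiguous;

        assert(host_level >= LEVEL_4K && host_level <= LEVEL_1G);
        assert(max_level >= LEVEL_4K && max_level <= LEVEL_1G);

        /* Never exceed the level used by the primary MMU. */
        return host_level < max_level ? host_level : max_level;
}
```

With this sketch, a 4k primary-MMU mapping always yields a 4k secondary
mapping, even when the PFNs are contiguous, which is why huge mappings in the
primary MMU are what allow the secondary MMU to use huge pages for such
ranges.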