> On Sep 9, 2020, at 7:27 AM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 09.09.20 15:14, Jason Gunthorpe wrote:
>> On Wed, Sep 09, 2020 at 01:32:44PM +0100, Matthew Wilcox wrote:
>>
>>> But here's the thing ... we already allow
>>> mmap(MAP_POPULATE | MAP_HUGETLB | MAP_HUGE_1GB)
>>>
>>> So if we're not doing THP, what's the point of this thread?
>>
>> I wondered that too..
>>
>>> An madvise flag is a different beast; that's just letting the kernel
>>> know what the app thinks its behaviour will be. The kernel can pay
>>
>> But madvise is too late, the VMA already has an address, if it is not
>> 1G aligned it cannot be 1G THP already.
>
> That's why user space (like QEMU) is THP-aware and selects an address
> that is aligned to the expected THP granularity (e.g., 2MB on x86_64).

To me it's always seemed like there are two major divisions among THP
use cases:

1) Applications that KNOW they would benefit from use of THPs, so they
   call madvise() with an appropriate parameter and explicitly inform
   the kernel of such

2) Applications that know nothing about THP but there may be an
   advantage that comes from "automatic" THP mapping when possible.

This is an approach that I am more familiar with that comes down to:

1) Is a VMA properly aligned for a (whatever size) THP?

2) Is the mapping request for a length >= (whatever size) THP?

3) Let's try allocating memory to map the space using (whatever size)
   THP, and:

   -- If we succeed, great, awesome, let's do it.
   -- If not, no big deal, map using as large a page as we CAN get.

There of course are myriad performance implications to this. Processes
that start early after boot have a better chance of getting a THP, but
that also means frequently mapped large memory spaces have a better
chance of being mapped in a shared manner via a THP, e.g. libc, X
servers or Firefox/Chrome. It also means that processes that would be
mapped using THPs early in boot may not be if they should crash and
need to be restarted.

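
To make the three checks above concrete, here is a rough userspace
sketch of the decision logic. It is not kernel code and not taken from
any patch. The names pick_page_size(), try_alloc_fn, alloc_always() and
alloc_no_1g() are hypothetical, the page sizes are the usual x86_64
ones, and the allocator is reduced to a callback that only reports
whether a page of a given size could be obtained.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Candidate page sizes, largest first (assumed x86_64 values). */
static const size_t page_sizes[] = {
    (size_t)1 << 30,   /* 1 GiB */
    (size_t)1 << 21,   /* 2 MiB */
    (size_t)1 << 12,   /* 4 KiB base page */
};
#define NUM_PAGE_SIZES (sizeof(page_sizes) / sizeof(page_sizes[0]))

/* Hypothetical allocator hook: true if a page of 'size' bytes is available. */
typedef bool (*try_alloc_fn)(size_t size);

/* Stand-in allocators for illustration only. */
static bool alloc_always(size_t size) { (void)size; return true; }
static bool alloc_no_1g(size_t size)  { return size < ((size_t)1 << 30); }

/*
 * Return the largest page size usable to map 'len' bytes starting at 'addr',
 * or 0 if not even a base page fits.
 */
static size_t pick_page_size(uintptr_t addr, size_t len, try_alloc_fn try_alloc)
{
    for (size_t i = 0; i < NUM_PAGE_SIZES; i++) {
        size_t sz = page_sizes[i];

        if (addr & (sz - 1))   /* 1) start address not aligned for this size */
            continue;
        if (len < sz)          /* 2) request shorter than one page of this size */
            continue;
        if (try_alloc(sz))     /* 3) allocation succeeded: use it */
            return sz;
        /* Allocation failed: no big deal, fall through to the next size. */
    }
    return 0;
}
```

With alloc_no_1g() as the hook, a 1 GiB-aligned, 1 GiB-long request
falls back to 2 MiB pages, which is the "map using as large a page as
we CAN get" case.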
There are all sorts of tunables that would likely need to be in place
to make the second approach more viable, but I think it's certainly
worth investigating.

The address selection you suggest is the basis of one of the patches I
wrote for a previous iteration of THP support (it is also in Matthew's
THP tree). That patch tries to round VM addresses to the proper
alignment where possible, so that a THP can then be used to map the
area.
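
For illustration only, the rounding idea can be sketched in userspace C
as below. This is not the patch itself; round_up_pow2() and
pick_aligned_start() are hypothetical names, the free range is modelled
as a simple [lo, hi) interval, and the alignment is assumed to be a
power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Round 'addr' up to the next multiple of 'align' (must be a power of two). */
static uintptr_t round_up_pow2(uintptr_t addr, size_t align)
{
    return (addr + (align - 1)) & ~((uintptr_t)align - 1);
}

/*
 * Choose a start address for a mapping of 'len' bytes inside the free
 * range [lo, hi). Prefer an address aligned to 'align' so a THP could be
 * used; if the aligned address does not leave room for 'len' bytes, fall
 * back to 'lo'. Returns 0 if the range is too small (so this sketch cannot
 * distinguish failure from a valid start at address 0).
 */
static uintptr_t pick_aligned_start(uintptr_t lo, uintptr_t hi,
                                    size_t len, size_t align)
{
    if (hi < lo || hi - lo < len)
        return 0;

    uintptr_t aligned = round_up_pow2(lo, align);

    /* 'aligned >= lo' guards against wraparound in the rounding. */
    if (aligned >= lo && aligned <= hi && hi - aligned >= len)
        return aligned;

    return lo;   /* alignment not possible here; use the unaligned start */
}
```

For example, with a free range starting at 0x10001000 and a 2 MiB
alignment, the sketch returns 0x10200000 when the range is large enough,
and the original unaligned start otherwise.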