Re: [RFC PATCH v3 00/49] 1GB PUD THP support on x86_64

However, I don't follow how this is actually feasible at large scale. You could only ever collapse into a 1GB THP if you happen to already have 1GB of consecutive 2MB THPs / 4k pages. Sounds to me like this happens when the stars align.

Both the process_madvise() approach and my proposal require page migration to bring back THPs, since, like you said, having consecutive pages ready is extremely rare. IIUC, the process_madvise() approach reuses khugepaged code to collapse huge pages, namely first allocating a 2MB THP, then copying the data over, and finally freeing the old base pages. My proposal would migrate pages within a virtual address range (>1GB and 1GB-aligned) to make all physical pages contiguous, then promote the resulting 1GB of consecutive pages to a 1GB THP. No new page allocation is needed.
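
To make the promotion precondition concrete, here is a toy model (not kernel code) of the check that would have to hold once migration is done: the page frames backing the range must be physically consecutive and start on a naturally aligned boundary. The function name and the list-of-PFNs representation are assumptions made purely for illustration.

```python
def is_promotable(pfns, order):
    """Toy check: can these page frame numbers be promoted to one huge page?

    pfns  -- page frame numbers backing the virtual range, in virtual order
    order -- huge page order (the huge page spans 2**order base pages)
    """
    nr_pages = 1 << order
    if len(pfns) != nr_pages:
        return False  # range does not cover exactly one huge page
    first = pfns[0]
    if first % nr_pages != 0:
        return False  # physical start is not naturally aligned
    # Every following frame must be physically consecutive.
    return all(pfn == first + i for i, pfn in enumerate(pfns))


if __name__ == "__main__":
    # Small order so the example stays readable; a 1GB page with 4k base
    # pages would be order 18.
    print(is_promotable([8, 9, 10, 11], 2))   # aligned and contiguous
    print(is_promotable([9, 10, 11, 12], 2))  # contiguous but misaligned
    print(is_promotable([8, 9, 11, 10], 2))   # aligned but out of order
```

Any range that fails this check needs its pages migrated before promotion, which is where the real cost of the approach lies.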

I am missing how we can ever reliably form 1GB pages (esp. after the system ran for a while) without any kind of fragmentation avoidance / defragmentation mechanism that is aware of gigantic pages. For THP, pageblocks+compaction serve that purpose.


Both approaches would need user-space invocation, assuming either the application itself wants to get THPs for a specific region or a user-space daemon would do this for a group of applications, instead of waiting for khugepaged to slowly (4096 pages every 10s) scan and do huge page collapse. The user will pay the cost of getting THPs. This also means THPs are not completely transparent to the user, but I think it should be fine when users explicitly invoke these two methods to get THPs for better performance.
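
For a sense of scale, the scan rate quoted above can be turned into a rough time estimate. This is only back-of-the-envelope arithmetic: it assumes 4k base pages, uses the 4096-pages-per-10s figure as given, and ignores the time spent scanning and collapsing. The function name is made up for the example.

```python
BASE_PAGE_SIZE = 4 * 1024  # assumption: 4k base pages, as on x86_64


def khugepaged_scan_seconds(region_bytes, pages_per_pass=4096, sleep_seconds=10):
    """Rough time for khugepaged to walk a region at the quoted scan rate."""
    pages = region_bytes // BASE_PAGE_SIZE
    passes = -(-pages // pages_per_pass)  # ceiling division
    return passes * sleep_seconds


if __name__ == "__main__":
    one_gb = 1 << 30
    # 262144 base pages / 4096 pages per pass = 64 passes of 10 s each
    print(khugepaged_scan_seconds(one_gb))  # 640
```

So even a single 1GB range takes on the order of ten minutes to scan at that rate, before any other process's memory is considered.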

Here is the problem: this *advice* is not persistent. Assume your system has to swap and has to split the THP + write it to the swap backend. The gigantic page is lost for that part of the application. When loading the individual 4k pages back from swap, there is no guarantee that we can form a 1GB page again - and how should we know that the application wanted a 1GB page at that position?

How would the application know that the advice was now dropped and that
a) There is no 1GB page anymore
b) It would have to re-issue the advice

Similarly, I am not convinced that the future of khugepaged is in user space.


The difference with my proposal is that it does not need a 1GB THP allocation, so there are no special requirements like using CMA or increasing MAX_ORDER in the buddy allocator to allow 1GB page allocation. It makes creating THPs with orders > MAX_ORDER possible without other intrusive changes.
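
The size of the gap between MAX_ORDER and a 1GB page is easy to work out. The sketch below assumes 4k base pages and a MAX_ORDER of 11 with the buddy allocator serving orders 0..MAX_ORDER-1; that value is an assumption for illustration and depends on kernel version and configuration.

```python
BASE_PAGE_SHIFT = 12    # assumption: 4k base pages
ASSUMED_MAX_ORDER = 11  # assumption: buddy allocator serves orders 0..10


def page_order(size_bytes):
    """Allocation order (log2 of the number of base pages) for a given size."""
    if size_bytes % (1 << BASE_PAGE_SHIFT):
        raise ValueError("size must be a multiple of the base page size")
    pages = size_bytes >> BASE_PAGE_SHIFT
    if pages == 0 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two number of base pages")
    return pages.bit_length() - 1


def fits_in_buddy(size_bytes, max_order=ASSUMED_MAX_ORDER):
    """True if the buddy allocator could hand out this size directly."""
    return page_order(size_bytes) < max_order


if __name__ == "__main__":
    print(page_order(2 << 20), fits_in_buddy(2 << 20))  # 9 True
    print(page_order(1 << 30), fits_in_buddy(1 << 30))  # 18 False
```

Under these assumptions a 2MB THP is order 9 and fits, while a 1GB page is order 18, far beyond what the buddy allocator hands out directly.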

Anything that relies on large allocations succeeding purely because "ZONE_NORMAL memory is usually not fragmented after boot" is broken by design. That's why we have CMA, it can give guarantees (well, once we fix all remaining issues :) ).

--
Thanks,

David / dhildenb





