[LSF/MM/BPF TOPIC] 1GB PUD THP support (gigantic page allocation, increasing MAX_ORDER, anti-fragmentation and more)

I have been working on 1GB THP support [1][2][3] and would like to discuss its high-level design and some implementation details. The topics related to 1GB PUD THP that I would like to cover include:

1. Gigantic page allocation. Since MAX_ORDER prevents the buddy allocator from allocating 1GB pages, we need to enable such allocations in one or more ways, such as using alloc_contig_range() or increasing MAX_ORDER.
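For a sense of scale, here is a minimal sketch of the order arithmetic. It assumes x86_64 with 4KB base pages and the default MAX_ORDER of 11 used by kernels of this era, where the buddy allocator hands out blocks of order 0 to MAX_ORDER - 1. The helper names are hypothetical, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions: x86_64, 4KB base pages, default MAX_ORDER of 11
 * (buddy blocks of order 0..MAX_ORDER-1). */
#define PAGE_SHIFT 12
#define MAX_ORDER  11

/* Hypothetical helper: buddy order of a naturally aligned block of
 * 2^size_shift bytes (e.g. size_shift = 30 for 1GB). */
static unsigned int order_for_shift(unsigned int size_shift)
{
	assert(size_shift >= PAGE_SHIFT);
	return size_shift - PAGE_SHIFT;
}

/* Hypothetical helper: can the buddy allocator return this order directly? */
static bool buddy_can_allocate(unsigned int order)
{
	return order <= MAX_ORDER - 1;
}

/* Size in bytes of the largest block the buddy allocator can return. */
static unsigned long largest_buddy_block(void)
{
	return 1UL << (PAGE_SHIFT + MAX_ORDER - 1);
}
```

Under these assumptions a 2MB PMD THP is order 9 and fits, while a 1GB PUD THP is order 18, well above the order-10 (4MB) ceiling, so the buddy allocator alone cannot satisfy it.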

2. The success rate of allocating gigantic pages. The existing anti-fragmentation mechanism works at the pageblock level, and a pageblock is 2MB on x86_64. What could be done to provide some guarantee for gigantic page allocation so that it is not defeated by fragmentation from unmovable pages? Options include increasing the pageblock size, adding a memory zone/region dedicated to gigantic pages, or something else.
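To see why pageblock-level anti-fragmentation gives no guarantee at 1GB granularity, consider a simplified model. Assuming 2MB pageblocks, a 1GB range spans 512 pageblocks, and in this model a single pageblock marked unmovable anywhere in that range rules the whole range out. The sketch counts usable 1GB-aligned ranges from a hypothetical per-pageblock type array; the enum and function are illustrative, not the kernel's migratetype code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified pageblock types; the kernel also has reclaimable, CMA,
 * isolate, etc. This model only distinguishes movable from unmovable. */
enum pb_type { PB_MOVABLE, PB_UNMOVABLE };

#define PAGEBLOCK_SHIFT 21 /* assumed 2MB pageblocks (x86_64) */
#define GIGANTIC_SHIFT  30 /* 1GB */
#define PB_PER_GIGANTIC (1UL << (GIGANTIC_SHIFT - PAGEBLOCK_SHIFT)) /* 512 */

/*
 * Count 1GB-aligned ranges that contain no unmovable pageblock, i.e. ranges
 * that, in this model, could be freed entirely by migrating movable pages.
 * nr_pb is the number of entries in types[]; a trailing partial range is
 * ignored.
 */
static size_t count_gigantic_candidates(const enum pb_type *types, size_t nr_pb)
{
	size_t candidates = 0;

	assert(types != NULL || nr_pb == 0);
	for (size_t start = 0; start + PB_PER_GIGANTIC <= nr_pb;
	     start += PB_PER_GIGANTIC) {
		size_t i;

		for (i = 0; i < PB_PER_GIGANTIC; i++)
			if (types[start + i] == PB_UNMOVABLE)
				break;
		if (i == PB_PER_GIGANTIC)
			candidates++;
	}
	return candidates;
}
```

In a 4GB model (2048 pageblocks), marking just two pageblocks unmovable, one in each of two different 1GB ranges, halves the number of gigantic page candidates from 4 to 2, even though anti-fragmentation is still doing its job at 2MB granularity.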

3. How to expose 1GB PUD THP to user space. Always allocating 1GB THPs at page fault time is unrealistic: it can waste a lot of memory and make page fault handling take much longer. Would additional MADV_ flags specifying the THP page size be a good choice? Or do we want to introduce an additional API that asks the kernel to create gigantic pages on user request [4]?
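For reference, the existing userspace hint, MADV_HUGEPAGE, carries no page size. The sketch below shows what an application can do today to even make a 1GB THP possible: reserve a 1GB-aligned, 1GB-long anonymous range and advise it. It uses only existing Linux calls (mmap(), munmap(), madvise()); the helper names are hypothetical, and no size-specific MADV_ flag is used because none exists, so the choice of THP size is left entirely to the kernel.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define SZ_1G (1UL << 30)

/* Round addr up to a multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: map a 1GB-aligned, 1GB-long anonymous region and
 * mark it MADV_HUGEPAGE. Over-map 2GB so an aligned 1GB window must exist
 * inside the mapping, then unmap the unaligned head and leftover tail.
 * Returns NULL if mmap() fails.
 */
static void *map_1g_aligned_thp(void)
{
	size_t len = 2 * SZ_1G;
	char *raw = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
	if (raw == MAP_FAILED)
		return NULL;

	char *aligned = (char *)align_up((uintptr_t)raw, SZ_1G);
	char *end = aligned + SZ_1G;

	if (aligned > raw)
		munmap(raw, (size_t)(aligned - raw));
	if (end < raw + len)
		munmap(end, (size_t)(raw + len - end));

	/* MADV_HUGEPAGE only says "THP is welcome here"; it cannot ask for
	 * a particular THP size. Failure (e.g. EINVAL without THP) is ignored. */
	(void)madvise(aligned, SZ_1G, MADV_HUGEPAGE);
	return aligned;
}
```

The aligned range is only a precondition; whether the kernel would back it with a single 1GB page depends on the support being proposed and on the policy chosen for exposing it.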

4. Code deduplication for THP handling and page table handling. When adding 1GB THP support, I had to mechanically replicate PMD THP code for PUD THP, so I am thinking about possible code deduplication. For THP handling, one thing I did was to add a common split_huge_page_to_list_to_order() used by both split_huge_page() and split_huge_pud_page() [5]. For page table handling, I am also thinking about reviving Kirill’s idea [6] of consolidating the page table manipulation API around page table level numbers like level=1,2,3,… instead of PTE, PMD, PUD, and so on.
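As a rough illustration of the level-number idea (not Kirill's actual proposal [6] and not an existing kernel API), here are hypothetical helpers that derive PTE/PMD/PUD properties from a level number, assuming x86_64 4-level paging with 4KB pages: level 1 = PTE, 2 = PMD, 3 = PUD, 4 = PGD.

```c
#include <assert.h>

/* Assumed x86_64 4-level paging with 4KB pages; all names hypothetical. */
#define PAGE_SHIFT        12
#define PT_BITS_PER_LEVEL 9 /* 512 entries per table */
#define PT_NR_LEVELS      4

/* Shift of the address range covered by one entry at @level:
 * 12 (PTE, 4KB), 21 (PMD, 2MB), 30 (PUD, 1GB), 39 (PGD, 512GB). */
static unsigned int pt_level_shift(unsigned int level)
{
	assert(level >= 1 && level <= PT_NR_LEVELS);
	return PAGE_SHIFT + PT_BITS_PER_LEVEL * (level - 1);
}

/* Index of @addr within the page table at @level. */
static unsigned long pt_index(unsigned long addr, unsigned int level)
{
	return (addr >> pt_level_shift(level)) &
	       ((1UL << PT_BITS_PER_LEVEL) - 1);
}

/* Page order of a huge page mapped by a single entry at @level. */
static unsigned int pt_level_order(unsigned int level)
{
	return pt_level_shift(level) - PAGE_SHIFT;
}
```

With helpers of this shape, one code path parameterized by level could serve both THP sizes: pt_level_order(2) gives order 9 for a PMD THP and pt_level_order(3) gives order 18 for a PUD THP, which fits naturally with an order-based split routine rather than separate per-level copies.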

There might be other THP-specific topics, like how to handle PMD mappings of a 1GB PUD THP in addition to the existing PTE mappings of a 2MB PMD THP, but I think we already have plenty to discuss, and we can continue if we have time.


[1] https://lore.kernel.org/linux-mm/20200902180628.4052244-1-zi.yan@xxxxxxxx/
[2] https://lore.kernel.org/linux-mm/20200928175428.4110504-1-zi.yan@xxxxxxxx/
[3] https://lore.kernel.org/linux-mm/20210224223536.803765-1-zi.yan@xxxxxxxx/
[4] https://lore.kernel.org/linux-mm/20200907072014.GD30144@xxxxxxxxxxxxxx/
[5] https://lore.kernel.org/linux-mm/20201119160605.1272425-1-zi.yan@xxxxxxxx/
[6] https://lore.kernel.org/linux-mm/20180424154355.mfjgkf47kdp2by4e@xxxxxxxxxxxxxxxxxx/

--
Best Regards,
Yan Zi
