On 2/4/21 3:50 AM, Muchun Song wrote:
> Hi all,
>
> [...]
>
> When a HugeTLB page is freed to the buddy system, we should allocate 6
> pages for vmemmap pages and restore the previous mapping relationship.
>
> Apart from the 2MB HugeTLB page, we also have the 1GB HugeTLB page,
> which is similar. We can also use this approach to free its vmemmap
> pages.
>
> In this case, for the 1GB HugeTLB page, we can save 4094 pages. This is
> a very substantial gain. On our server, we run some SPDK/QEMU
> applications which will use 1024GB of hugetlb pages. With this feature
> enabled, we can save ~16GB (1G hugepages) / ~12GB (2MB hugepages) of
> memory.
>
> Because there is vmemmap page table reconstruction on the
> freeing/allocating path, it adds some overhead. Here is some overhead
> analysis.
>
> [...]
>
> Although the overhead has increased, it is not significant. As Mike
> said, "However, remember that the majority of use cases create hugetlb
> pages at or shortly after boot time and add them to the pool. So,
> additional overhead is at pool creation time. There is no change to
> 'normal run time' operations of getting a page from or returning a page
> to the pool (think page fault/unmap)".

Despite the overhead, and in addition to the memory gains from this
series, there is an additional benefit that isn't mentioned here with
your vmemmap page reuse trick: page (un)pinners will see an improvement,
presumably because there are fewer vmemmap pages and thus the tail/head
struct pages stay in cache more often.

Out of the box, comparing linux-next against linux-next + this series,
with gup_test pinning a 16G hugetlb file (1G pages), I saw:

	get_user_pages(): ~32k -> ~9k
	unpin_user_pages(): ~75k -> ~70k

Usually any tight loop fetching compound_head(), or reading tail page
data (e.g. compound_head), benefits a lot.

There are some unpinning inefficiencies I am fixing[0]; with that added,
the improvement shows even more:

	unpin_user_pages(): ~27k -> ~3.8k

FWIW, I was also seeing this with devdax and the equivalent ZONE_DEVICE
vmemmap page reuse series[1], but it was mixed in with other numbers.

Anyways, JFYI :)

	Joao

[0] https://lore.kernel.org/linux-mm/20210204202500.26474-1-joao.m.martins@xxxxxxxxxx/
[1] https://lore.kernel.org/linux-mm/20201208172901.17384-1-joao.m.martins@xxxxxxxxxx/
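
P.S. For anyone who wants to sanity-check the savings quoted above, they
fall out of simple arithmetic. A minimal userspace sketch, assuming the
x86-64 defaults of 4K base pages and a 64-byte struct page (the "keep 2
vmemmap pages per hugepage, free the rest" split is what the cover
letter describes: 6 of 8 freed for 2M, 4094 of 4096 for 1G):

	#include <stdio.h>

	int main(void)
	{
		const unsigned long page = 4096, sp = 64;

		/* vmemmap pages per hugepage = nr_struct_pages * sp / page */
		unsigned long vmemmap_2m = (2UL << 20) / page * sp / page; /* 8 */
		unsigned long vmemmap_1g = (1UL << 30) / page * sp / page; /* 4096 */

		printf("2M hugepage: %lu vmemmap pages, %lu freeable\n",
		       vmemmap_2m, vmemmap_2m - 2);
		printf("1G hugepage: %lu vmemmap pages, %lu freeable\n",
		       vmemmap_1g, vmemmap_1g - 2);

		/* the 1024GB SPDK/QEMU pool from the cover letter */
		printf("1G pages: ~%lu MB saved\n",
		       1024UL * (vmemmap_1g - 2) * page >> 20);
		printf("2M pages: ~%lu MB saved\n",
		       ((1024UL << 30) / (2UL << 20)) * (vmemmap_2m - 2) * page >> 20);
		return 0;
	}

which prints ~16376 MB saved for 1G pages and ~12288 MB for 2M pages,
matching the ~16GB/~12GB figures above.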
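P.P.S. On the cache theory: compound_head() is a single tagged-pointer
load per call, roughly as below (from include/linux/page-flags.h at the
time of this series), so a pin/unpin loop over a 1G page otherwise walks
16MB of tail struct pages; with the tail vmemmap remapped onto one
physical page, those READ_ONCE() loads keep hitting the same few cache
lines:

	static inline struct page *compound_head(struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		/* bit 0 set means this is a tail page carrying a
		 * tagged pointer to its head page */
		if (unlikely(head & 1))
			return (struct page *)(head - 1);
		return page;
	}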