On 11/13/20 2:59 AM, Muchun Song wrote:
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> new file mode 100644
> index 000000000000..a6c9948302e2
> --- /dev/null
> +++ b/mm/hugetlb_vmemmap.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Free some vmemmap pages of HugeTLB
> + *
> + * Copyright (c) 2020, Bytedance. All rights reserved.
> + *
> + *     Author: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> + *

Oscar has already made some suggestions to change comments.  I would
suggest changing the below text to something like the following.

> + * Nowadays we track the status of physical page frames using struct page
> + * structures arranged in one or more arrays. And here exists one-to-one
> + * mapping between the physical page frame and the corresponding struct page
> + * structure.
> + *
> + * The HugeTLB support is built on top of multiple page size support that
> + * is provided by most modern architectures. For example, x86 CPUs normally
> + * support 4K and 2M (1G if architecturally supported) page sizes. Every
> + * HugeTLB has more than one struct page structure. The 2M HugeTLB has 512
> + * struct page structure and 1G HugeTLB has 4096 struct page structures. But
> + * in the core of HugeTLB only uses the first 4 (Use of first 4 struct page
> + * structures comes from HUGETLB_CGROUP_MIN_ORDER.) struct page structures to
> + * store metadata associated with each HugeTLB. The rest of the struct page
> + * structures are usually read the compound_head field which are all the same
> + * value. If we can free some struct page memory to buddy system so that we
> + * can save a lot of memory.
> + *

struct page structures (page structs) are used to describe a physical
page frame.  By default, there is a one-to-one mapping from a page frame
to its corresponding page struct.

HugeTLB pages consist of multiple base page size pages and are supported
by many architectures.  See hugetlbpage.rst in the Documentation
directory for more details.

On the x86 architecture, HugeTLB pages of size 2MB and 1GB are currently
supported.  Since the base page size on x86 is 4KB, a 2MB HugeTLB page
consists of 512 base pages and a 1GB HugeTLB page consists of 262144
base pages.  For each base page, there is a corresponding page struct.

Within the HugeTLB subsystem, only the first 4 page structs are used to
contain unique information about a HugeTLB page.
HUGETLB_CGROUP_MIN_ORDER provides this upper limit.  The only 'useful'
information in the remaining page structs is the compound_head field,
and this field is the same for all tail pages.

By removing redundant page structs for HugeTLB pages, memory can be
returned to the buddy allocator for other uses.

> + * When the system boot up, every 2M HugeTLB has 512 struct page structures
> + * which size is 8 pages(sizeof(struct page) * 512 / PAGE_SIZE).
> + *
> + *    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> |     2     |
> + * |           |                     |     3     | -------------> |     3     |
> + * |           |                     |     4     | -------------> |     4     |
> + * |    2M     |                     |     5     | -------------> |     5     |
> + * |           |                     |     6     | -------------> |     6     |
> + * |           |                     |     7     | -------------> |     7     |
> + * |           |                     +-----------+                +-----------+
> + * |           |
> + * |           |
> + * +-----------+
> + *
> + *

I think we want the description before the next diagram.

Reworded description here:

The value of compound_head is the same for all tail pages.  The first
page of page structs (page 0) associated with the HugeTLB page contains
the 4 page structs necessary to describe the HugeTLB.  The only use of
the remaining pages of page structs (page 1 to page 7) is to point to
compound_head.  Therefore, we can remap pages 2 to 7 to page 1.  Only 2
pages of page structs will be used for each HugeTLB page.  This will
allow us to free the remaining 6 pages to the buddy allocator.

Here is how things look after remapping.

> + *
> + *    HugeTLB                    struct pages(8 pages)         page frame(8 pages)
> + * +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
> + * |           |                     |     0     | -------------> |     0     |
> + * |           |                     |     1     | -------------> |     1     |
> + * |           |                     |     2     | -------------> +-----------+
> + * |           |                     |     3     | -----------------^ ^ ^ ^ ^
> + * |           |                     |     4     | -------------------+ | | |
> + * |    2M     |                     |     5     | ---------------------+ | |
> + * |           |                     |     6     | -----------------------+ |
> + * |           |                     |     7     | -------------------------+
> + * |           |                     +-----------+
> + * |           |
> + * |           |
> + * +-----------+

--
Mike Kravetz