On 09/08/23 10:39, Muchun Song wrote: > > > > On Sep 8, 2023, at 02:37, Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > > > On 09/06/23 12:26, Usama Arif wrote: > >> The new boot flow when it comes to initialization of gigantic pages > >> is as follows: > >> - At boot time, for a gigantic page during __alloc_bootmem_hugepage, > >> the region after the first struct page is marked as noinit. > >> - This results in only the first struct page to be > >> initialized in reserve_bootmem_region. As the tail struct pages are > >> not initialized at this point, there can be a significant saving > >> in boot time if HVO succeeds later on. > >> - Later on in the boot, the head page is prepped and the first > >> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages > >> are initialized. > >> - HVO is attempted. If it is not successful, then the rest of the > >> tail struct pages are initialized. If it is successful, no more > >> tail struct pages need to be initialized saving significant boot time. > >> > >> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxxxxxx> > >> --- > >> mm/hugetlb.c | 61 +++++++++++++++++++++++++++++++++++++------- > >> mm/hugetlb_vmemmap.c | 2 +- > >> mm/hugetlb_vmemmap.h | 9 ++++--- > >> mm/internal.h | 3 +++ > >> mm/mm_init.c | 2 +- > >> 5 files changed, 62 insertions(+), 15 deletions(-) > > > > As mentioned, in general this looks good. One small point below. > > > >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c > >> index c32ca241df4b..540e0386514e 100644 > >> --- a/mm/hugetlb.c > >> +++ b/mm/hugetlb.c > >> @@ -3169,6 +3169,15 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid) > >> } > >> > >> found: > >> + > >> + /* > >> + * Only initialize the head struct page in memmap_init_reserved_pages, > >> + * rest of the struct pages will be initialized by the HugeTLB subsystem itself. > >> + * The head struct page is used to get folio information by the HugeTLB > >> + * subsystem like zone id and node id. > >> + */ > >> + memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE), > >> + huge_page_size(h) - PAGE_SIZE); > >> /* Put them into a private list first because mem_map is not up yet */ > >> INIT_LIST_HEAD(&m->list); > >> list_add(&m->list, &huge_boot_pages); > >> @@ -3176,6 +3185,40 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid) > >> return 1; > >> } > >> > >> +/* Initialize [start_page:end_page_number] tail struct pages of a hugepage */ > >> +static void __init hugetlb_folio_init_tail_vmemmap(struct folio *folio, > >> + unsigned long start_page_number, > >> + unsigned long end_page_number) > >> +{ > >> + enum zone_type zone = zone_idx(folio_zone(folio)); > >> + int nid = folio_nid(folio); > >> + unsigned long head_pfn = folio_pfn(folio); > >> + unsigned long pfn, end_pfn = head_pfn + end_page_number; > >> + > >> + for (pfn = head_pfn + start_page_number; pfn < end_pfn; pfn++) { > >> + struct page *page = pfn_to_page(pfn); > >> + > >> + __init_single_page(page, pfn, zone, nid); > >> + prep_compound_tail((struct page *)folio, pfn - head_pfn); > >> + set_page_count(page, 0); > >> + } > >> +} > >> + > >> +static void __init hugetlb_folio_init_vmemmap(struct folio *folio, struct hstate *h, > >> + unsigned long nr_pages) > >> +{ > >> + int ret; > >> + > >> + /* Prepare folio head */ > >> + __folio_clear_reserved(folio); > >> + __folio_set_head(folio); > >> + ret = page_ref_freeze(&folio->page, 1); > >> + VM_BUG_ON(!ret); > > > > In the current code, we print a warning and free the associated pages to > > buddy if we ever experience an increased ref count. The routine > > hugetlb_folio_init_tail_vmemmap does not check for this. > > > > I do not believe speculative/temporary ref counts this early in the boot > > process are possible. It would be great to get input from someone else. > > Yes, it is a very early stage and other tail struct pages haven't been > initialized yet, anyone should not reference them. It it the same case > as CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled. > > > > > When I wrote the existing code, it was fairly easy to WARN and continue > > if we encountered an increased ref count. Things would be bit more > > In your case, I think it is not in the boot process, right? They were calls in the same routine: gather_bootmem_prealloc(). > > complicated here. So, it may not be worth the effort. > > Agree. Note that tail struct pages are not initialized here, if we want to > handle head page, how to handle tail pages? It really cannot resolved. > We should make the same assumption as CONFIG_DEFERRED_STRUCT_PAGE_INIT > that anyone should not reference those pages. Agree that speculative refs should not happen this early. How about making the following changes? - Instead of set_page_count() in hugetlb_folio_init_tail_vmemmap, do a page_ref_freeze and VM_BUG_ON if not ref_count != 1. - In the commit message, mention 'The WARN_ON for increased ref count in gather_bootmem_prealloc was changed to a VM_BUG_ON. This is OK as there should be no speculative references this early in boot process. The VM_BUG_ON's are there just in case such code is introduced.' -- Mike Kravetz