> On Aug 31, 2023, at 17:58, Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 31, 2023 at 02:21:06PM +0800, Muchun Song wrote:
>>
>>
>>> On Aug 30, 2023, at 18:27, Usama Arif <usama.arif@xxxxxxxxxxxxx> wrote:
>>> On 28/08/2023 12:33, Muchun Song wrote:
>>>>> On Aug 25, 2023, at 19:18, Usama Arif <usama.arif@xxxxxxxxxxxxx> wrote:
>>>>>
>>>>> The new boot flow when it comes to initialization of gigantic pages
>>>>> is as follows:
>>>>> - At boot time, for a gigantic page during __alloc_bootmem_hugepage,
>>>>> the region after the first struct page is marked as noinit.
>>>>> - This results in only the first struct page being
>>>>> initialized in reserve_bootmem_region. As the tail struct pages are
>>>>> not initialized at this point, there can be a significant saving
>>>>> in boot time if HVO succeeds later on.
>>>>> - Later on in the boot, HVO is attempted. If it is successful, only the first
>>>>> HUGETLB_VMEMMAP_RESERVE_SIZE / sizeof(struct page) - 1 tail struct pages
>>>>> after the head struct page are initialized. If it is not successful,
>>>>> then all of the tail struct pages are initialized.
>>>>>
>>>>> Signed-off-by: Usama Arif <usama.arif@xxxxxxxxxxxxx>
>>>> This edition is simpler than ever before, thanks for your work.
>>>> Your optimization relies on the premise that other subsystems do not
>>>> access the vmemmap pages associated with HugeTLB pages allocated from
>>>> bootmem before those vmemmap pages are initialized. However, IIUC, the
>>>> compaction path could access an arbitrary struct page when memory fails
>>>> to be allocated via the buddy allocator. So we should make sure that
>>>> those struct pages are not referenced in this routine. And I know that
>>>> if CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, it will encounter
>>>> the same issue, but I can't find any code that prevents this from
>>>> happening. I need more time to confirm this; if someone already knows,
>>>> please let me know, thanks. So I think HugeTLB should adopt a similar
>>>> way to prevent this.
>>>> Thanks.
>>>
>>> Thanks for the reviews.
>>>
>>> So if I understand it correctly, the uninitialized pages due to the optimization in this patch and due to DEFERRED_STRUCT_PAGE_INIT should be treated in the same way during compaction. I see that in isolate_freepages during compaction there is a check to see if the PageBuddy flag is set, and there are also calls like __pageblock_pfn_to_page to check if the pageblock is valid.
>>>
>>> But if the struct pages are uninitialized, then they would contain random data, and these checks could pass if certain bits happened to be set?
>>>
>>> Compaction is done on the free list. I think the uninitialized struct pages, at least those from DEFERRED_STRUCT_PAGE_INIT, would be part of the free list, so I think their pfns would be considered for compaction.
>>>
>>> Could someone more familiar with DEFERRED_STRUCT_PAGE_INIT and compaction confirm how the uninitialized struct pages are handled when compaction happens? Thanks!
>>
>> Hi Mel,
>>
>> Could you help us answer this question? I think you must be the expert on
>> CONFIG_DEFERRED_STRUCT_PAGE_INIT. I'll summarize the context here. As we all know,
>> some struct pages are uninitialized when CONFIG_DEFERRED_STRUCT_PAGE_INIT is
>> enabled. If someone requests a larger allocation (e.g. order 4) from the buddy
>> allocator and the allocation fails, then we go into the compaction
>> routine, which traverses all pfns and uses pfn_to_page to access each struct
>> page. However, those struct pages may be uninitialized (so they hold arbitrary data).
>> Our question is: how do we prevent the compaction routine from accessing those
>> uninitialized struct pages? We would appreciate it if you know the answer.
>>
>
> I didn't check the code but IIRC, the struct pages should be at least
> valid and not contain arbitrary data once page_alloc_init_late finishes.

However, the buddy allocator is ready before page_alloc_init_late() runs, so
the compaction routine may still access arbitrary data before that point, right?

>
> --
> Mel Gorman
> SUSE Labs
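
For reference, the tail-page count quoted from the commit message above can be worked through with a rough userspace sketch. The constants here are assumptions for illustration and are not stated anywhere in this thread: 4 KiB base pages, a 64-byte struct page, and a vmemmap reserve equal to one base page. The SKETCH_* names and the helper functions are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, not taken from the thread: 4 KiB base pages, a 64-byte
 * struct page, and a vmemmap reserve of one base page per HugeTLB page. */
#define SKETCH_PAGE_SIZE            4096UL
#define SKETCH_STRUCT_PAGE_SIZE     64UL
#define SKETCH_VMEMMAP_RESERVE_SIZE SKETCH_PAGE_SIZE

/* Number of tail struct pages that describe one huge page of this size. */
static unsigned long tail_pages_total(unsigned long huge_page_size)
{
    return huge_page_size / SKETCH_PAGE_SIZE - 1;
}

/* Tail struct pages initialized when HVO succeeds, following the formula
 * in the commit message: RESERVE_SIZE / sizeof(struct page) - 1. */
static unsigned long tail_pages_init_hvo(void)
{
    return SKETCH_VMEMMAP_RESERVE_SIZE / SKETCH_STRUCT_PAGE_SIZE - 1;
}

/* Tail struct pages that never need initializing if HVO succeeds. */
static unsigned long tail_pages_skipped(unsigned long huge_page_size)
{
    unsigned long total = tail_pages_total(huge_page_size);
    unsigned long hvo = tail_pages_init_hvo();

    assert(total >= hvo); /* only meaningful for sufficiently large pages */
    return total - hvo;
}

/* Print the comparison for one huge page size (in bytes). */
static void print_savings(unsigned long huge_page_size)
{
    printf("total tails: %lu, initialized with HVO: %lu, skipped: %lu\n",
           tail_pages_total(huge_page_size),
           tail_pages_init_hvo(),
           tail_pages_skipped(huge_page_size));
}
```

Calling print_savings(1UL << 30) for a 1 GiB gigantic page shows, under these assumptions, how many tail struct pages stay uninitialized until HVO is attempted. Those are the entries a pfn walker would see as arbitrary data in the window being discussed.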