On 9 Dec 2024, at 16:35, David Hildenbrand wrote:

> On 09.12.24 20:23, Zi Yan wrote:
>> On 9 Dec 2024, at 14:01, Vlastimil Babka wrote:
>>
>>> On 12/6/24 10:59, David Hildenbrand wrote:
>>>> Let's special-case for the common scenarios that:
>>>>
>>>> (a) We are freeing pages <= pageblock_order
>>>> (b) We are freeing a page <= MAX_PAGE_ORDER and all pageblocks match
>>>>     (especially, no mixture of isolated and non-isolated pageblocks)
>>>
>>> Well in many of those cases we could also just adjust the pageblocks... But
>>> perhaps they indeed shouldn't differ in the first place, unless there's an
>>> isolation attempt.
>>>
>>>> When we encounter a > MAX_PAGE_ORDER page, it can only come from
>>>> alloc_contig_range(), and we can process MAX_PAGE_ORDER chunks.
>>>>
>>>> When we encounter a >pageblock_order <= MAX_PAGE_ORDER page,
>>>> check whether all pageblocks match, and if so (common case), don't
>>>> split them up just for the buddy to merge them back.
>>>>
>>>> This makes sure that when we free MAX_PAGE_ORDER chunks to the buddy,
>>>> for example during system startups, memory onlining, or when isolating
>>>> consecutive pageblocks via alloc_contig_range()/memory offlining, that
>>>> we don't unnecessarily split up what we'll immediately merge again,
>>>> because the migratetypes match.
>>>>
>>>> Rename split_large_buddy() to __free_one_page_maybe_split(), to make it
>>>> clearer what's happening, and handle in it only natural buddy orders,
>>>> not the alloc_contig_range(__GFP_COMP) special case: handle that in
>>>> free_one_page() only.
>>>>
>>>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
>>>
>>> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
>>>
>>> Hm but noticed something:
>>>
>>>> +static void __free_one_page_maybe_split(struct zone *zone, struct page *page,
>>>> +		unsigned long pfn, int order, fpi_t fpi_flags)
>>>> +{
>>>> +	const unsigned long end_pfn = pfn + (1 << order);
>>>> +	int mt = get_pfnblock_migratetype(page, pfn);
>>>> +
>>>> +	VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);
>>>>  	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>  	/* Caller removed page from freelist, buddy info cleared! */
>>>>  	VM_WARN_ON_ONCE(PageBuddy(page));
>>>>
>>>> -	if (order > pageblock_order)
>>>> -		order = pageblock_order;
>>>> -
>>>> -	while (pfn != end) {
>>>> -		int mt = get_pfnblock_migratetype(page, pfn);
>>>> +	/*
>>>> +	 * With CONFIG_MEMORY_ISOLATION, we might be freeing MAX_ORDER_NR_PAGES
>>>> +	 * pages that cover pageblocks with different migratetypes; for example
>>>> +	 * only some migratetypes might be MIGRATE_ISOLATE. In that (unlikely)
>>>> +	 * case, fallback to freeing individual pageblocks so they get put
>>>> +	 * onto the right lists.
>>>> +	 */
>>>> +	if (!IS_ENABLED(CONFIG_MEMORY_ISOLATION) ||
>>>> +	    likely(order <= pageblock_order) ||
>>>> +	    pfnblock_migratetype_equal(pfn + pageblock_nr_pages, end_pfn, mt)) {
>>>> +		__free_one_page(page, pfn, zone, order, mt, fpi_flags);
>>>> +		return;
>>>> +	}
>>>>
>>>> -		__free_one_page(page, pfn, zone, order, mt, fpi);
>>>> -		pfn += 1 << order;
>>>> +	while (pfn != end_pfn) {
>>>> +		mt = get_pfnblock_migratetype(page, pfn);
>>>> +		__free_one_page(page, pfn, zone, pageblock_order, mt, fpi_flags);
>>>> +		pfn += pageblock_nr_pages;
>>>>  		page = pfn_to_page(pfn);
>>>
>>> This predates your patch, but seems potentially dangerous to attempt
>>> pfn_to_page(end_pfn) with SPARSEMEM and no vmemmap and the end_pfn perhaps
>>> being just outside of the valid range? Should we change that?
>>>
>>> But seems this code was initially introduced as part of Johannes'
>>> migratetype hygiene series.
>>
>> It starts as split_free_page() from commit b2c9e2fbba32 ("mm: make
>> alloc_contig_range work at pageblock granularity"), but harmless since
>> it is only used to split a buddy page. Then commit fd919a85cd55 ("mm:
>> page_isolation: prepare for hygienic freelists") refactored it, which
>> should be fine, since it is still used for the same purpose in page
>> isolation. Then commit e98337d11bbd ("mm/contig_alloc: support __GFP_COMP")
>> used it for gigantic hugetlb.
>>
>> For SPARSEMEM && !SPARSEMEM_VMEMMAP, PFNs are contiguous, vmemmap might not
>> be. The code above using pfn in the loop might be fine. And since order
>> is provided, unless the caller is providing a falsely large order, pfn
>> should be valid. Or am I missing anything?
>
> I think the question is, what happens when we call pfn_to_page() on a PFN that falls into a memory section that is either offline, doesn't have a memmap, or does not exist.
>
> With CONFIG_SPARSEMEM, we do a
>
> struct mem_section *__sec = __pfn_to_section(__pfn)
> __section_mem_map_addr(__sec) + __pfn;
>
> __pfn_to_section() can return NULL, in which case __section_mem_map_addr() would dereference NULL.
>
> I assume it could happen in corner cases, if we'd exceed NR_SECTION_ROOTS. (IOW, large memory, and we free a page that is at the very end of physical memory).
>
> Likely, we should do the pfn_to_page() before the __free_one_page() call.

Got it. Both you and Vlastimil gave the same corner case issue. I agree that
doing pfn_to_page() before the __free_one_page() could get rid of the concern.
Thank you both.

Best Regards,
Yan, Zi
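
For illustration only, here is a minimal userspace sketch of the reordering
agreed on above. It is a toy model, not kernel code and not the actual fix:
toy_pfn_to_page(), toy_free_block(), PAGEBLOCK_NR_PAGES and MAX_VALID_PFN are
hypothetical stand-ins invented for this example. The first loop has the shape
of the quoted patch (lookup after advancing pfn), the second does the lookup at
the top of each iteration, so end_pfn is never looked up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy constants; real values depend on the kernel config. */
#define PAGEBLOCK_NR_PAGES 512UL
#define MAX_VALID_PFN      2048UL	/* only PFNs [0, 2048) have a memmap */

struct page { unsigned long pfn; };

static struct page memmap[MAX_VALID_PFN];
static unsigned long bad_lookups;

/*
 * Stand-in for pfn_to_page(). Instead of dereferencing a missing section,
 * it counts lookups that fall outside the toy memmap and returns NULL.
 */
static struct page *toy_pfn_to_page(unsigned long pfn)
{
	if (pfn >= MAX_VALID_PFN) {
		bad_lookups++;
		return NULL;
	}
	memmap[pfn].pfn = pfn;
	return &memmap[pfn];
}

/* Stand-in for __free_one_page(): only checks it got the right page. */
static void toy_free_block(struct page *page, unsigned long pfn)
{
	assert(page != NULL && page->pfn == pfn);
}

/* Loop shape from the quoted patch: the last iteration looks up end_pfn. */
static unsigned long free_lookup_after(unsigned long pfn, unsigned long end_pfn)
{
	struct page *page;

	bad_lookups = 0;
	page = toy_pfn_to_page(pfn);
	while (pfn != end_pfn) {
		toy_free_block(page, pfn);
		pfn += PAGEBLOCK_NR_PAGES;
		page = toy_pfn_to_page(pfn);
	}
	return bad_lookups;
}

/* Reordered loop: look up the page before freeing, never for end_pfn. */
static unsigned long free_lookup_before(unsigned long pfn, unsigned long end_pfn)
{
	bad_lookups = 0;
	while (pfn != end_pfn) {
		struct page *page = toy_pfn_to_page(pfn);

		toy_free_block(page, pfn);
		pfn += PAGEBLOCK_NR_PAGES;
	}
	return bad_lookups;
}
```

Both functions return the number of out-of-range lookups they performed. When
the freed range ends exactly at the end of the toy memmap, for example
free_lookup_after(1024, 2048), the first variant performs one such lookup,
while free_lookup_before(1024, 2048) performs none.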