On 20 Nov 2024, at 11:13, David Hildenbrand wrote:

> On 20.11.24 16:55, Zi Yan wrote:
>> On 19 Nov 2024, at 11:52, David Hildenbrand wrote:
>>
>>> On 19.11.24 17:49, David Hildenbrand wrote:
>>>> On 19.11.24 17:41, Zi Yan wrote:
>>>>> On 19 Nov 2024, at 11:31, David Hildenbrand wrote:
>>>>>
>>>>>> On 19.11.24 17:12, Zi Yan wrote:
>>>>>>> On 19 Nov 2024, at 10:29, David Hildenbrand wrote:
>>>>>>>
>>>>>>>>> +/* Split a multi-block free page into its individual pageblocks. */
>>>>>>>>> +static void split_large_buddy(struct zone *zone, struct page *page,
>>>>>>>>> +                              unsigned long pfn, int order, fpi_t fpi)
>>>>>>>>> +{
>>>>>>>>> +        unsigned long end = pfn + (1 << order);
>>>>>>>>> +
>>>>>>>>> +        VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>>>>>> +        /* Caller removed page from freelist, buddy info cleared! */
>>>>>>>>> +        VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>>>>> +
>>>>>>>>> +        if (order > pageblock_order)
>>>>>>>>> +                order = pageblock_order;
>>>>>>>>> +
>>>>>>>>> +        while (pfn != end) {
>>>>>>>>> +                int mt = get_pfnblock_migratetype(page, pfn);
>>>>>>>>> +
>>>>>>>>> +                __free_one_page(page, pfn, zone, order, mt, fpi);
>>>>>>>>> +                pfn += 1 << order;
>>>>>>>>> +                page = pfn_to_page(pfn);
>>>>>>>>> +        }
>>>>>>>>> +}
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> stumbling over this while digging through the code ....
>>>>>>>>
>>>>>>>>> +
>>>>>>>>>  static void free_one_page(struct zone *zone, struct page *page,
>>>>>>>>>                            unsigned long pfn, unsigned int order,
>>>>>>>>>                            fpi_t fpi_flags)
>>>>>>>>>  {
>>>>>>>>>          unsigned long flags;
>>>>>>>>> -        int migratetype;
>>>>>>>>>
>>>>>>>>>          spin_lock_irqsave(&zone->lock, flags);
>>>>>>>>> -        migratetype = get_pfnblock_migratetype(page, pfn);
>>>>>>>>> -        __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>>>>>>>>
>>>>>>>> This change is rather undesired:
>>>>>>>>
>>>>>>>> via __free_pages_core()->__free_pages_ok() we can easily end up here
>>>>>>>> with order=MAX_PAGE_ORDER.
>>>>>>>
>>>>>>> Do you have a concrete example? PMD THP on x86_64 is pageblock_order.
>>>>>>> We do not have PMD-level mTHP yet. Any other possible source?
>>>>>>>
>>>>>>>>
>>>>>>>> What your new code will do is split this perfectly reasonable
>>>>>>>> MAX_PAGE_ORDER chunk via split_large_buddy() into pageblock-sized
>>>>>>>> chunks, and let the buddy merging logic undo our unnecessary
>>>>>>>> splitting.
>>>>>>>>
>>>>>>>> Is there a way to avoid this and just process the whole
>>>>>>>> MAX_PAGE_ORDER chunk like we used to?
>>>>>>>
>>>>>>> Probably split_large_buddy() can check the migratetypes of the
>>>>>>> to-be-freed page if order > pageblock_order. If all migratetypes are
>>>>>>> the same, the page can be freed at MAX_PAGE_ORDER, otherwise at
>>>>>>> pageblock_order.
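Spelling that check out as an untested sketch (the helper name here is
made up, not an existing function):

/*
 * Hypothetical helper: return true if every pageblock in
 * [pfn, pfn + (1 << order)) has the same migratetype, i.e. the whole
 * chunk may go onto the freelist via a single __free_one_page() call.
 */
static bool pfnblock_migratetypes_match(struct page *page, unsigned long pfn,
                                        unsigned int order)
{
        const unsigned long end = pfn + (1 << order);
        const int mt = get_pfnblock_migratetype(page, pfn);

        for (pfn += pageblock_nr_pages; pfn != end; pfn += pageblock_nr_pages) {
                if (get_pfnblock_migratetype(pfn_to_page(pfn), pfn) != mt)
                        return false;
        }

        return true;
}

When this returns false, we would keep splitting into pageblock_order
chunks exactly as split_large_buddy() does today.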
>>>>>>
>>>>>> Thinking about this: why do we care about the migratetype?
>>>>>>
>>>>>> We only have to fall back to pageblocks if any pageblock is
>>>>>> "MIGRATE_ISOLATE" (and maybe MIGRATE_CMA), but not all. Otherwise, we
>>>>>> can just ignore the migratetype (or rather overwrite it).
>>>>>
>>>>> There are VM_WARN_ONCEs around *_free_list() operations to make sure
>>>>> a page's migratetype matches the migratetype of the list it is on.
>>>>> Ignoring the migratetype would trigger these WARNs. Overwriting it
>>>>> would work, but free page migratetype accounting needs to be taken
>>>>> care of.
>>>>>
>>>>> An implicit reason is that __free_one_page() does not support
>>>>> order > MAX_PAGE_ORDER, and gigantic hugetlb folios are freed via
>>>>> free_one_page(), where split_large_buddy() is used to work around the
>>>>> limitation.
>>>>
>>>> Yes, I saw that change. But that case can be easily identified by
>>>> unlikely(order > MAX_PAGE_ORDER) and handled in a special way.
>>>>
>>>>> For the two memory init cases you mentioned in the other email, maybe
>>>>> a new fpi flag to make free_one_page() use __free_one_page() for them,
>>>>> since migratetypes should be the same across the MAX_PAGE_ORDER range
>>>>> there, right?
>>>>
>>>> In the context of alloc_frozen_range()/free_frozen_range() I want to
>>>> free MAX_PAGE_ORDER chunks when possible, and not have some odd logic
>>>> in the freeing path undo some of that effort.
>>>
>>> Adding a pointer to that discussion:
>>>
>>> https://lkml.kernel.org/r/ZzZdnuZBf-xgiwD2@xxxxxxxxxxxxxxxxxxxx
>>
>> Thanks.
>>
>> So you are thinking about something like this:
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index b6958333054d..3d3341dc1ad1 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1254,7 +1254,12 @@ static void free_one_page(struct zone *zone, struct page *page,
>>          unsigned long flags;
>>
>>          spin_lock_irqsave(&zone->lock, flags);
>> -        split_large_buddy(zone, page, pfn, order, fpi_flags);
>> +        if (unlikely(order > MAX_PAGE_ORDER))
>> +                split_large_buddy(zone, page, pfn, order, fpi_flags);
>> +        else {
>> +                int migratetype = get_pfnblock_migratetype(page, pfn);
>> +                __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>> +        }
>>          spin_unlock_irqrestore(&zone->lock, flags);
>>
>>          __count_vm_events(PGFREE, 1 << order);
>
> Something in that direction, but likely something like:
>
> if (likely(order <= pageblock_order)) {
>         int migratetype = get_pfnblock_migratetype(page, pfn);
>
>         __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
> } else if (order <= MAX_PAGE_ORDER) {
>         /* We might have to split when some pageblocks differ. */
>         maybe_split_large_buddy(zone, page, pfn, order, fpi_flags);
> } else {
>         /* The buddy works in max MAX_PAGE_ORDER chunks. */
>         for (/* each MAX_PAGE_ORDER chunk */) {
>                 maybe_split_large_buddy(zone, page, pfn, MAX_PAGE_ORDER,
>                                         fpi_flags);
>                 pfn += MAX_ORDER_NR_PAGES;
>                 page = ...
>         }
> }
>
> Whereby maybe_split_large_buddy() would check if it has to do anything
> with the pageblocks, or whether it can just call __free_one_page(order).
> It would only accept order <= MAX_PAGE_ORDER.
>
> In the future with free_frozen_pages(), the last else case would vanish.
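Untested sketch of how I read maybe_split_large_buddy(), reusing the
hypothetical pfnblock_migratetypes_match() helper sketched further up:

/*
 * Free the chunk in one piece when all of its pageblocks agree on the
 * migratetype, otherwise fall back to pageblock-sized chunks. Only
 * accepts order <= MAX_PAGE_ORDER; larger chunks (e.g. gigantic hugetlb
 * folios) must be chopped up by the caller first.
 */
static void maybe_split_large_buddy(struct zone *zone, struct page *page,
                                    unsigned long pfn, unsigned int order,
                                    fpi_t fpi_flags)
{
        VM_WARN_ON_ONCE(order > MAX_PAGE_ORDER);

        if (order <= pageblock_order ||
            pfnblock_migratetypes_match(page, pfn, order)) {
                int migratetype = get_pfnblock_migratetype(page, pfn);

                __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
                return;
        }

        /* Some pageblock differs (e.g. isolated): free block by block. */
        split_large_buddy(zone, page, pfn, order, fpi_flags);
}

That way the migratetype scan only runs for order > pageblock_order, and
the common case stays a single freelist insertion.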
>> Is it possible to have a MAX_PAGE_ORDER hugetlb folio? If no, we are
>> good.
>
> Hm, I wouldn't rule it out. To be future proof, we should likely add a
> way to handle it. Shouldn't be too hard or expensive: simply read all
> pageblock migratetypes.
>
> I wonder if there are configs (no page isolation?) where we don't have
> to check anything. CONFIG_CONTIG_ALLOC? CONFIG_MEMORY_ISOLATION seems
> too much.
>
>> If yes, alloc_contig_range() can change the migratetype of one half of
>> that folio, and during the migration phase that folio will be freed via
>> __free_one_page(), causing a migratetype mismatch.
>
> Can you remind me how that happens?

There is a MAX_PAGE_ORDER hugetlb folio spanning two pageblocks, and
alloc_contig_range() wants a range that overlaps with the first pageblock
of that folio. All pageblocks are marked as isolated first during
start_isolate_page_range(), then __alloc_contig_migrate_range() isolates
and migrates the hugetlb folio. During migration, the original hugetlb
folio is freed, and since it is MAX_PAGE_ORDER, based on my code above it
is freed via a single __free_one_page() call even though its two
pageblocks have different migratetypes. But with your
maybe_split_large_buddy(), it would work.

> Likely, we isolate a single pageblock, and shortly after we free a
> larger page that covers multiple pageblocks? So this could affect other
> alloc_contig_range() users as well, I assume?

Best Regards,
Yan, Zi