On 19 Nov 2024, at 11:52, David Hildenbrand wrote:

> On 19.11.24 17:49, David Hildenbrand wrote:
>> On 19.11.24 17:41, Zi Yan wrote:
>>> On 19 Nov 2024, at 11:31, David Hildenbrand wrote:
>>>
>>>> On 19.11.24 17:12, Zi Yan wrote:
>>>>> On 19 Nov 2024, at 10:29, David Hildenbrand wrote:
>>>>>
>>>>>>> +/* Split a multi-block free page into its individual pageblocks. */
>>>>>>> +static void split_large_buddy(struct zone *zone, struct page *page,
>>>>>>> +			      unsigned long pfn, int order, fpi_t fpi)
>>>>>>> +{
>>>>>>> +	unsigned long end = pfn + (1 << order);
>>>>>>> +
>>>>>>> +	VM_WARN_ON_ONCE(!IS_ALIGNED(pfn, 1 << order));
>>>>>>> +	/* Caller removed page from freelist, buddy info cleared! */
>>>>>>> +	VM_WARN_ON_ONCE(PageBuddy(page));
>>>>>>> +
>>>>>>> +	if (order > pageblock_order)
>>>>>>> +		order = pageblock_order;
>>>>>>> +
>>>>>>> +	while (pfn != end) {
>>>>>>> +		int mt = get_pfnblock_migratetype(page, pfn);
>>>>>>> +
>>>>>>> +		__free_one_page(page, pfn, zone, order, mt, fpi);
>>>>>>> +		pfn += 1 << order;
>>>>>>> +		page = pfn_to_page(pfn);
>>>>>>> +	}
>>>>>>> +}
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> stumbling over this while digging through the code ....
>>>>>>
>>>>>>> +
>>>>>>>  static void free_one_page(struct zone *zone, struct page *page,
>>>>>>>  			  unsigned long pfn, unsigned int order,
>>>>>>>  			  fpi_t fpi_flags)
>>>>>>>  {
>>>>>>>  	unsigned long flags;
>>>>>>> -	int migratetype;
>>>>>>>
>>>>>>>  	spin_lock_irqsave(&zone->lock, flags);
>>>>>>> -	migratetype = get_pfnblock_migratetype(page, pfn);
>>>>>>> -	__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
>>>>>>
>>>>>> This change is rather undesired:
>>>>>>
>>>>>> via __free_pages_core()->__free_pages_ok() we can easily end up here
>>>>>> with order=MAX_PAGE_ORDER.
>>>>>
>>>>> Do you have a concrete example? PMD THP on x86_64 is pageblock_order.
>>>>> We do not have PMD-level mTHP yet. Any other possible source?
>>>>>
>>>>>>
>>>>>> What your new code will do is split this perfectly reasonable
>>>>>> MAX_PAGE_ORDER chunk via split_large_buddy() into pageblock-sized
>>>>>> chunks, and let the buddy merging logic undo our unnecessary
>>>>>> splitting.
>>>>>>
>>>>>> Is there a way to avoid this and just process the whole
>>>>>> MAX_PAGE_ORDER chunk like we used to?
>>>>>
>>>>> Probably split_large_buddy() can check the migratetypes of the
>>>>> to-be-freed page if order > pageblock_order. If all migratetypes are
>>>>> the same, the page can be freed at MAX_PAGE_ORDER, otherwise at
>>>>> pageblock_order.
>>>>
>>>> Thinking about this: why do we care about the migratetype?
>>>>
>>>> We only have to fall back to pageblocks if any pageblock is
>>>> MIGRATE_ISOLATE (and maybe MIGRATE_CMA), but not all. Otherwise, we
>>>> can just ignore the migratetype (or rather overwrite it).
>>>
>>> There are VM_WARN_ONCEs around *_free_list() operations to make sure
>>> a page's migratetype matches the migratetype of the list it is on.
>>> Ignoring the migratetype would trigger these WARNs. Overwriting it
>>> would work, but free page migratetype accounting would need to be
>>> taken care of.
>>>
>>> An implicit reason is that __free_one_page() does not support
>>> order > MAX_PAGE_ORDER, and gigantic hugetlb folios are freed via
>>> free_one_page(), where split_large_buddy() is used to work around
>>> that limitation.
>>
>> Yes, I saw that change. But that rare case can be easily identified
>> by unlikely(order > MAX_PAGE_ORDER) and handled in a special way.
>>
>>> For the two memory init cases you mentioned in the other email, maybe
>>> a new fpi flag could make free_one_page() use __free_one_page() for
>>> them, since the migratetypes should be the same across the
>>> MAX_PAGE_ORDER range there, right?
>>
>> In the context of alloc_frozen_range()/free_frozen_range() I want to
>> free MAX_PAGE_ORDER chunks when possible, and not have some odd logic
>> in the freeing path undo some of that effort.
>
> Adding a pointer to that discussion:
>
> https://lkml.kernel.org/r/ZzZdnuZBf-xgiwD2@xxxxxxxxxxxxxxxxxxxx

Thanks. So you are thinking about something like this:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index b6958333054d..3d3341dc1ad1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1254,7 +1254,12 @@ static void free_one_page(struct zone *zone, struct page *page,
 	unsigned long flags;
 
 	spin_lock_irqsave(&zone->lock, flags);
-	split_large_buddy(zone, page, pfn, order, fpi_flags);
+	if (unlikely(order > MAX_PAGE_ORDER))
+		split_large_buddy(zone, page, pfn, order, fpi_flags);
+	else {
+		int migratetype = get_pfnblock_migratetype(page, pfn);
+		__free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
+	}
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	__count_vm_events(PGFREE, 1 << order);

Is it possible to have a MAX_PAGE_ORDER hugetlb folio? If no, we are
good. If yes, alloc_contig_range() can change the migratetype of one
half of that folio, and during the migration phase that folio will be
freed via __free_one_page(), causing a migratetype mismatch.
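If it is possible, one way to hedge against that would be the
migratetype check I mentioned above: take the __free_one_page() fast
path only when every pageblock in the range has the same migratetype,
and fall back to split_large_buddy() otherwise. Untested sketch, and
pfnblock_migratetypes_match() is a made-up name:

/*
 * Untested sketch: return true if every pageblock in the order-sized,
 * order-aligned range starting at pfn has the same migratetype as the
 * first pageblock. For order <= pageblock_order there is only one
 * pageblock, so the loop body never runs and we return true.
 */
static bool pfnblock_migratetypes_match(struct page *page,
					unsigned long pfn, int order)
{
	unsigned long end = pfn + (1 << order);
	int mt = get_pfnblock_migratetype(page, pfn);

	/* Compare each following pageblock against the first one. */
	for (pfn += pageblock_nr_pages; pfn < end; pfn += pageblock_nr_pages) {
		if (get_pfnblock_migratetype(pfn_to_page(pfn), pfn) != mt)
			return false;
	}
	return true;
}

The condition in the diff above would then become something like
"if (unlikely(order > MAX_PAGE_ORDER) ||
!pfnblock_migratetypes_match(page, pfn, order))", keeping the fast path
for uniform MAX_PAGE_ORDER chunks while still splitting when one of the
pageblocks was isolated under us.

Best Regards,
Yan, Zi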