On 16 Oct 2023, at 10:37, Johannes Weiner wrote: > On Mon, Oct 16, 2023 at 09:35:34AM -0400, Zi Yan wrote: >>> The attached patch has all the suggested changes, let me know how it >>> looks to you. Thanks. >> >> The one I sent has free page accounting issues. The attached one fixes them. > > Do you still have the warnings? I wonder what went wrong. No warnings. But something with the code: 1. in your version, split_free_page() is called without changing any pageblock migratetypes, then split_free_page() is just a no-op, since the page is just deleted from the free list, then freed via different orders. Buddy allocator will merge them back. 2. in my version, I set pageblock migratetype to new_mt before split_free_page(), but it causes free page accounting issues, since in the case of head, free pages are deleted from new_mt when they are in old_mt free list and the accounting decreases new_mt free page number instead of old_mt one. Basically, split_free_page() is awkward as it relies on preset migratetypes, which changes migratetypes without deleting the free pages from the list first. That is why I came up with the new split_free_page() below. > >> @@ -883,6 +886,10 @@ int split_free_page(struct page *free_page, >> mt = get_pfnblock_migratetype(free_page, free_page_pfn); >> del_page_from_free_list(free_page, zone, order, mt); >> >> + set_pageblock_migratetype(free_page, mt1); >> + set_pageblock_migratetype(pfn_to_page(free_page_pfn + split_pfn_offset), >> + mt2); >> + >> for (pfn = free_page_pfn; >> pfn < free_page_pfn + (1UL << order);) { >> int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn); > > I don't think this is quite right. > > With CONFIG_ARCH_FORCE_MAX_ORDER it's possible that we're dealing with > a buddy that is more than two blocks: > > [pageblock 0][pageblock 1][pageblock 2][pageblock 3] > [buddy ] > [isolate range .. > > That for loop splits the buddy into 4 blocks. The code above would set > pageblock 0 to old_mt, and pageblock 1 to new_mt. But it should only > set pageblock 3 to new_mt. OK. I think I need to fix split_free_page(). Hmm, if CONFIG_ARCH_FORCE_MAX_ORDER can make a buddy have more than one pageblock and in turn makes an in-use page have more than one pageblock, we will have problems. Since in isolate_single_pageblock(), an in-use page can have part of its pageblock set to a different migratetype and be freed, causing the free page with unmatched migratetypes. We might need to free pages at pageblock_order if their orders are bigger than pageblock_order. Which arch with CONFIG_ARCH_FORCE_MAX_ORDER can have a buddy containing more than one pageblocks? I would like to make some tests. > > My proposal had the mt update in the caller: > >> @@ -139,6 +139,62 @@ static struct page *has_unmovable_pages(unsigned long start_pfn, unsigned long e >> return NULL; >> } >> >> +/* >> + * additional steps for moving free pages during page isolation >> + */ >> +static int move_freepages_for_isolation(struct zone *zone, unsigned long start_pfn, >> + unsigned long end_pfn, int old_mt, int new_mt) >> +{ >> + struct page *start_page = pfn_to_page(start_pfn); >> + unsigned long pfn; >> + >> + VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1)); >> + VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn); >> + >> + /* >> + * A free page may be comprised of 2^n blocks, which means our >> + * block of interest could be head or tail in such a page. >> + * >> + * If we're a tail, update the type of our block, then split >> + * the page into pageblocks. The splitting will do the leg >> + * work of sorting the blocks into the right freelists. >> + * >> + * If we're a head, split the page into pageblocks first. This >> + * ensures the migratetypes still match up during the freelist >> + * removal. Then do the regular scan for buddies in the block >> + * of interest, which will handle the rest. >> + * >> + * In theory, we could try to preserve 2^1 and larger blocks >> + * that lie outside our range. In practice, MAX_ORDER is >> + * usually one or two pageblocks anyway, so don't bother. >> + * >> + * Note that this only applies to page isolation, which calls >> + * this on random blocks in the pfn range! When we move stuff >> + * from inside the page allocator, the pages are coming off >> + * the freelist (can't be tail) and multi-block pages are >> + * handled directly in the stealing code (can't be a head). >> + */ >> + >> + /* We're a tail */ >> + pfn = find_straddling_buddy(start_pfn); >> + if (pfn != start_pfn) { >> + struct page *free_page = pfn_to_page(pfn); >> + >> + split_free_page(free_page, buddy_order(free_page), >> + pageblock_nr_pages, old_mt, new_mt); >> + return pageblock_nr_pages; >> + } >> + >> + /* We're a head */ >> + if (PageBuddy(start_page) && buddy_order(start_page) > pageblock_order) { >> + split_free_page(start_page, buddy_order(start_page), >> + pageblock_nr_pages, new_mt, old_mt); >> + return pageblock_nr_pages; >> + } > > i.e. here ^: set the mt of the block that's in isolation range, then > split the block. > > I think I can guess the warning you were getting: in the head case, we > need to change the type of the head pageblock that's on the > freelist. If we do it before calling split, the > del_page_from_freelist() in there warns about the wrong type. > > How about pulling the freelist removal out of split_free_page()? > > del_page_from_freelist(huge_buddy); > set_pageblock_migratetype(start_page, MIGRATE_ISOLATE); > split_free_page(huge_buddy, buddy_order(), pageblock_nr_pages); > return pageblock_nr_pages; Yes, this is better. Let me change to this implementation. But I would like to test it on an environment where a buddy contains more than one pageblocks first. I probably can change MAX_ORDER of x86_64 to do it locally. I will report back. -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature