On 3/30/22 23:43, Zi Yan wrote:
> On 30 Mar 2022, at 17:25, Zi Yan wrote:
>
>> On 30 Mar 2022, at 16:53, Steven Rostedt wrote:
>>
>>> On Wed, 30 Mar 2022 16:29:28 -0400
>>> Zi Yan <ziy@xxxxxxxxxx> wrote:
>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index bdc8f60ae462..83a90e2973b7 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -1108,6 +1108,8 @@ static inline void __free_one_page(struct page *page,
>>>>
>>>>                 buddy_pfn = __find_buddy_pfn(pfn, order);
>>>>                 buddy = page + (buddy_pfn - pfn);
>>>> +               if (!page_is_buddy(page, buddy, order))
>>>> +                       goto done_merging;
>>>>                 buddy_mt = get_pageblock_migratetype(buddy);
>>>>
>>>>                 if (migratetype != buddy_mt
>>>>
>>>
>>> The above did not apply to Linus's tree, nor even the problem commit
>>> (before or after), but I found where the code is, and added it manually.
>>>
>>> It does appear to allow the machine to boot.
>>>
>> I just pulled Linus’s tree and grabbed the diff. Anyway, thanks.
>>
>> I would like to get more understanding of the issue before blindly sending
>> this as a fix.
>>
>> Merge the other thread:
>>>
>>> Not sure if this matters or not, but my kernel command line has:
>>>
>>> crashkernel=256M
>>>
>>> Could that have caused this to break?
>>
>> Unlikely, 256MB is MAX_ORDER_NR_PAGES aligned (MAX_ORDER is 11 here).
>> __find_buddy_pfn() will not get any buddy_pfn from crashkernel memory
>> region, since that would cross MAX_ORDER_NR_PAGES boundary.
>>
>> page_is_buddy() checks page_is_guard(buddy), PageBuddy(buddy),
>> buddy_order(buddy), and page_zone_id(buddy), where page_is_guard(buddy)
>> is always false since CONFIG_DEBUG_PAGEALLOC is not set in your config.
>> So either PageBuddy(buddy) is false, buddy_order(buddy) != order,
>> or page_zone_id(buddy) is not the same as page_zone_id(page).
>>
>> Do you mind adding the following code right before my fix code above
>> and provide a complete boot log? I would like to understand what
>> went wrong. Thanks.
>>
>> pr_info("buddy_pfn: %lx, PageBuddy: %d, buddy_order: %d (vs %d), page_zone_id: %d (vs %d)\n",
>>         buddy_pfn, PageBuddy(buddy), buddy_order(buddy), order, page_zone_id(buddy),
>>         page_zone_id(page));
>>
>>
>
> This seems to be a bug in the original code too.
> But "if (unlikely(has_isolate_pageblock(zone)))" is too rare to trigger it.
> I do not see how having isolated pageblocks in a zone could get us away
> from checking page_is_buddy().

IIRC the assumption was that pageblock bitmaps would always exist within
MAX_ORDER blocks. But here we are still under mem_init() where
has_isolate_pageblock() couldn't happen. And the assumption could have been
silently broken by subsequent memory init changes.

> --
> Best Regards,
> Yan, Zi
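
For anyone following the alignment argument quoted above, here is a rough userspace sketch of the arithmetic, not kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12, which the thread does not state) and MAX_ORDER of 11 as mentioned above. find_buddy_pfn() mirrors the XOR used by __find_buddy_pfn(). The helpers crashkernel_is_block_aligned() and buddy_stays_in_block() are hypothetical names made up for this illustration.

```c
#include <stdio.h>

#define PAGE_SHIFT         12   /* assumption: 4 KiB pages */
#define MAX_ORDER          11   /* value mentioned in the thread */
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Same XOR trick as the kernel's __find_buddy_pfn(). */
static unsigned long find_buddy_pfn(unsigned long pfn, unsigned int order)
{
	return pfn ^ (1UL << order);
}

/* Hypothetical helper: is a region size a multiple of a MAX_ORDER block? */
static int crashkernel_is_block_aligned(unsigned long bytes)
{
	unsigned long nr_pages = bytes >> PAGE_SHIFT;

	return (nr_pages % MAX_ORDER_NR_PAGES) == 0;
}

/*
 * Hypothetical helper: for every order at which merging can still happen
 * (0 .. MAX_ORDER - 2), check that the buddy of the order-aligned pfn
 * lies in the same MAX_ORDER_NR_PAGES-aligned block.
 */
static int buddy_stays_in_block(unsigned long pfn)
{
	unsigned int order;

	for (order = 0; order < MAX_ORDER - 1; order++) {
		unsigned long head = pfn & ~((1UL << order) - 1);
		unsigned long buddy = find_buddy_pfn(head, order);

		if (head / MAX_ORDER_NR_PAGES != buddy / MAX_ORDER_NR_PAGES)
			return 0;
	}
	return 1;
}
```

Under these assumptions, crashkernel_is_block_aligned(256UL << 20) is true, and buddy_stays_in_block() holds for any pfn because the XOR only flips a bit below the block-size bit. This only illustrates why a buddy_pfn would not cross into an aligned reserved region; it says nothing about whether the struct page at buddy_pfn is actually a valid free buddy, which is what page_is_buddy() checks.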