On Thu, May 19, 2022 at 05:35:15PM -0400, Zi Yan wrote:
> Do you have a complete reproducer? From your printout, it is clear that a 512-page compound
> page caused the infinite loop, because the page was not migrated and the code kept
> retrying. But __alloc_contig_migrate_range() is supposed to return non-zero to tell the
> code the page cannot be migrated and the code will goto failed without retrying. It will be
> great you can share what exactly has run after boot, so that I can reproduce locally to
> identify what makes __alloc_contig_migrate_range() return 0 without migrating the page.

The reproducer is just to run the same script I shared with you previously,
multiple times in a row. It is still quite reproducible here; it usually
happens within an hour.

$ for i in `seq 1 100`; do ./flip_mem.py; done

> Can you also try the patch below to see if it fixes the infinite loop?
>
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index b3f074d1682e..abde1877bbcb 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -417,10 +417,9 @@ static int isolate_single_pageblock(unsigned long boundary_pfn, gfp_t gfp_flags,
>  		order = 0;
>  		outer_pfn = pfn;
>  		while (!PageBuddy(pfn_to_page(outer_pfn))) {
> -			if (++order >= MAX_ORDER) {
> -				outer_pfn = pfn;
> -				break;
> -			}
> +			/* abort if the free page cannot be found */
> +			if (++order >= MAX_ORDER)
> +				goto failed;
>  			outer_pfn &= ~0UL << order;
>  		}
>  		pfn = outer_pfn;
>

Can you explain a bit how this patch is the right thing to do here? I am a
little bit worried about shooting in the dark. Otherwise, I'll be running the
off-by-one part over the weekend to see if that helps.
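
For anyone following along, the arithmetic in the patched loop can be tried
in userspace. This is only a rough sketch, not kernel code: is_buddy() and
free_head_pfn are hypothetical stand-ins for PageBuddy(pfn_to_page(...)),
find_free_head() is a made-up wrapper name, and MAX_ORDER is assumed to be 11
(it depends on the kernel config). It shows how outer_pfn is aligned down one
order at a time, and that the patched version gives up instead of falling
back to the original pfn.

```c
#include <stdbool.h>

#define MAX_ORDER 11	/* assumption: config-dependent in the real kernel */

/* Hypothetical stand-in for PageBuddy(pfn_to_page(pfn)): exactly one pfn
 * is treated as the head of a free buddy page. */
static unsigned long free_head_pfn;

static bool is_buddy(unsigned long pfn)
{
	return pfn == free_head_pfn;
}

/* Mirrors the patched loop: walk up the orders, aligning outer_pfn down to
 * each order's boundary. Returns true and stores the head pfn in *head when
 * a free page is found, false once order reaches MAX_ORDER (the patch's
 * "goto failed" path). */
static bool find_free_head(unsigned long pfn, unsigned long *head)
{
	unsigned int order = 0;
	unsigned long outer_pfn = pfn;

	while (!is_buddy(outer_pfn)) {
		if (++order >= MAX_ORDER)
			return false;
		outer_pfn &= ~0UL << order;	/* clear the low 'order' bits */
	}
	*head = outer_pfn;
	return true;
}
```

With free_head_pfn set to 0x1200, find_free_head(0x1234, &head) steps through
0x1234, 0x1230, 0x1220 and stops at 0x1200. With free_head_pfn set to 0x2000,
no aligned-down value ever matches, so the search fails after order 10. The
unpatched loop would instead reset outer_pfn to pfn and carry on.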