On 2/17/21 6:33 PM, Vlastimil Babka wrote:
> Compaction always operates on pages from a single given zone when isolating
> both pages to migrate and freepages. Pageblock boundaries are intersected with
> zone boundaries to be safe in case zone starts or ends in the middle of
> pageblock. The use of pageblock_pfn_to_page() protects against non-contiguous
> pageblocks.
>
> The functions fast_isolate_freepages() and fast_isolate_around() don't
> currently protect the fast freepage isolation thoroughly enough against these
> corner cases, and can result in freepage isolation operate outside of zone
> boundaries:
>
> - in fast_isolate_freepages() if we get a pfn from the first pageblock of a
>   zone that starts in the middle of that pageblock, 'highest' can be a pfn
>   outside of the zone. If we fail to isolate anything in this function, we
>   may then call fast_isolate_around() on a pfn outside of the zone and there
>   effectively do a set_pageblock_skip(page_to_pfn(highest)) which may currently
>   hit a VM_BUG_ON() in some configurations
> - fast_isolate_around() checks only the zone end boundary and not beginning,
>   nor that the pageblock is contiguous (with pageblock_pfn_to_page()) so it's
>   possible that we end up calling isolate_freepages_block() on a range of pfn's
>   from two different zones and end up e.g. isolating freepages under the wrong
>   zone's lock.
>
> This patch should fix the above issues.

Sorry, totally forgot these:

Reported-by: Qian Cai <cai@xxxxxx>
Reported-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

> Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>

Also thanks David and Mel for the acks!

Thanks to Mike I was able to boot v5.11 in qemu with memmap containing a type 20
hole as Andrea reported, but can't reproduce the bug so far (i.e. without this
patch, with DEBUG_VM enabled) using transhuge-stress; might need some more
nuanced workload...
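
For anyone following along, the "intersect pageblock boundaries with zone boundaries" idea from the changelog can be sketched in userspace roughly as below. This is only an illustration, not the patch: struct zone_span, struct pfn_range, clamp_pageblock_to_zone() and the 512-page pageblock size are all made up for the example, and it deliberately leaves out the contiguity check that pageblock_pfn_to_page() provides in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pfn span of a zone (not the kernel's struct zone). */
struct zone_span {
	unsigned long start_pfn;	/* first pfn that belongs to the zone */
	unsigned long end_pfn;		/* one past the last pfn of the zone */
};

/* Hypothetical half-open pfn range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Assumption for this sketch: 512 pages per pageblock (must be a power of two). */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL

static unsigned long sketch_pageblock_start(unsigned long pfn)
{
	return pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1);
}

static unsigned long sketch_pageblock_end(unsigned long pfn)
{
	return sketch_pageblock_start(pfn) + SKETCH_PAGEBLOCK_NR_PAGES;
}

/*
 * Compute the part of pfn's pageblock that lies inside the zone.
 * Returns false if pfn itself is outside the zone, in which case the
 * caller should not touch the pageblock at all.
 */
static bool clamp_pageblock_to_zone(const struct zone_span *zone,
				    unsigned long pfn,
				    struct pfn_range *out)
{
	unsigned long start, end;

	assert(zone->start_pfn < zone->end_pfn);

	if (pfn < zone->start_pfn || pfn >= zone->end_pfn)
		return false;

	start = sketch_pageblock_start(pfn);
	end = sketch_pageblock_end(pfn);

	/* Check both ends: the zone may begin or end mid-pageblock. */
	out->start = start > zone->start_pfn ? start : zone->start_pfn;
	out->end = end < zone->end_pfn ? end : zone->end_pfn;
	return true;
}
```

With a zone spanning pfns [768, 4196), pfn 800 sits in pageblock [512, 1024), which gets clamped to [768, 1024), while pfn 600 is in the same pageblock but outside the zone and is rejected. That second case is the one where 'highest' could otherwise end up pointing outside the zone.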