On Sat, Mar 23, 2019 at 09:40:04AM +0500, Mikhail Gavrilov wrote:
> > 	/*
> > 	 * Only clear the hint if a sample indicates there is either a
> > 	 * free page or an LRU page in the block. One or other condition
> > 	 * is necessary for the block to be a migration source/target.
> > 	 */
> > -	block_pfn = pageblock_start_pfn(pfn);
> > -	pfn = max(block_pfn, zone->zone_start_pfn);
> > -	page = pfn_to_page(pfn);
> > -	if (zone != page_zone(page))
> > -		return false;
> > -	pfn = block_pfn + pageblock_nr_pages;
> > -	pfn = min(pfn, zone_end_pfn(zone));
> > -	end_page = pfn_to_page(pfn);
> > -
> > 	do {
> > 		if (pfn_valid_within(pfn)) {
> > 			if (check_source && PageLRU(page)) {
> 
> Unfortunately this patch didn't help either.
> 
> kernel log: https://pastebin.com/RHhmXPM2
> 

Ok, it's somewhat of a pity that we don't know what PFN that page
corresponds to. Specifically, it would be interesting to know whether
the PFN falls within a memory hole, as DMA32 on your machine has a
number of gaps. What I'm wondering is whether, if the reinit fails to
find good starting points, it picks a PFN that corresponds to an
uninitialised page and trips up later.
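For context, the reinit in question simply throws the cached scanner
positions back to the zone boundaries. Sketching from memory (check
your own tree, but it should be close), reset_cached_positions() in
mm/compaction.c is just:

	static void reset_cached_positions(struct zone *zone)
	{
		/* Migration scanner restarts at the bottom of the zone */
		zone->compact_cached_migrate_pfn[0] = zone->zone_start_pfn;
		zone->compact_cached_migrate_pfn[1] = zone->zone_start_pfn;

		/* Free scanner restarts at the last pageblock of the zone */
		zone->compact_cached_free_pfn =
				pageblock_start_pfn(zone_end_pfn(zone) - 1);
	}

The patch below calls this up front so that the cached PFNs always
start from a known baseline before the zone walk tries to improve on
them, instead of relying on a fallback after the walk.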
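The reason a hole is dangerous here is that pfn_to_page() happily
manufactures a struct page pointer for any PFN whether or not a memmap
backs it, while pfn_to_online_page() returns NULL when the section is
not online. Purely to illustrate the shape of the check the patch adds
-- block_bounds_online() is a made-up name that exists neither in the
patch nor in the tree -- the boundary probing amounts to:

	/* Illustration only: not part of the patch below. */
	static bool block_bounds_online(struct zone *zone, unsigned long pfn)
	{
		unsigned long start_pfn = max(pageblock_start_pfn(pfn),
					      zone->zone_start_pfn);
		unsigned long end_pfn = min(pageblock_start_pfn(pfn) +
					    pageblock_nr_pages,
					    zone_end_pfn(zone));

		/*
		 * pfn_to_page() on a hole hands back a pointer into a
		 * memmap that may not exist; dereferencing it, as the
		 * old page_zone() check did, is the crash. With
		 * pfn_to_online_page() the section state is checked
		 * first and NULL comes back, so the caller can bail out.
		 */
		return pfn_to_online_page(start_pfn) &&
		       pfn_to_online_page(end_pfn);
	}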
Can you try again with this patch please? It replaces the failed patch
entirely. Thanks.

diff --git a/mm/compaction.c b/mm/compaction.c
index f171a83707ce..caac4b07eb33 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -242,6 +242,7 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
 							bool check_target)
 {
 	struct page *page = pfn_to_online_page(pfn);
+	struct page *block_page;
 	struct page *end_page;
 	unsigned long block_pfn;
 
@@ -267,20 +268,26 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
 	    get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
 		return false;
 
+	/* Ensure the start of the pageblock or zone is online and valid */
+	block_pfn = pageblock_start_pfn(pfn);
+	block_page = pfn_to_online_page(max(block_pfn, zone->zone_start_pfn));
+	if (block_page) {
+		page = block_page;
+		pfn = block_pfn;
+	}
+
+	/* Ensure the end of the pageblock or zone is online and valid */
+	block_pfn += pageblock_nr_pages;
+	block_pfn = min(block_pfn, zone_end_pfn(zone));
+	end_page = pfn_to_online_page(block_pfn);
+	if (!end_page)
+		return false;
+
 	/*
 	 * Only clear the hint if a sample indicates there is either a
 	 * free page or an LRU page in the block. One or other condition
 	 * is necessary for the block to be a migration source/target.
 	 */
-	block_pfn = pageblock_start_pfn(pfn);
-	pfn = max(block_pfn, zone->zone_start_pfn);
-	page = pfn_to_page(pfn);
-	if (zone != page_zone(page))
-		return false;
-	pfn = block_pfn + pageblock_nr_pages;
-	pfn = min(pfn, zone_end_pfn(zone));
-	end_page = pfn_to_page(pfn);
-
 	do {
 		if (pfn_valid_within(pfn)) {
 			if (check_source && PageLRU(page)) {
@@ -320,6 +327,16 @@ static void __reset_isolation_suitable(struct zone *zone)
 
 	zone->compact_blockskip_flush = false;
 
+
+	/*
+	 * Re-init the scanners and attempt to find a better starting
+	 * position below. This may result in redundant scanning if
+	 * a better position is not found but it avoids the corner
+	 * case whereby the cached PFNs are left in a memory hole with
+	 * no proper struct page backing it.
+	 */
+	reset_cached_positions(zone);
+
 	/*
 	 * Walk the zone and update pageblock skip information. Source looks
 	 * for PageLRU while target looks for PageBuddy. When the scanner
@@ -349,13 +366,6 @@ static void __reset_isolation_suitable(struct zone *zone)
 			zone->compact_cached_free_pfn = reset_free;
 		}
 	}
-
-	/* Leave no distance if no suitable block was reset */
-	if (reset_migrate >= reset_free) {
-		zone->compact_cached_migrate_pfn[0] = migrate_pfn;
-		zone->compact_cached_migrate_pfn[1] = migrate_pfn;
-		zone->compact_cached_free_pfn = free_pfn;
-	}
 }
 
 void reset_isolation_suitable(pg_data_t *pgdat)

--
Mel Gorman
SUSE Labs