Re: kernel BUG at include/linux/mm.h:1020!

On Sat, Mar 23, 2019 at 09:40:04AM +0500, Mikhail Gavrilov wrote:
> >         /*
> >          * Only clear the hint if a sample indicates there is either a
> >          * free page or an LRU page in the block. One or other condition
> >          * is necessary for the block to be a migration source/target.
> >          */
> > -       block_pfn = pageblock_start_pfn(pfn);
> > -       pfn = max(block_pfn, zone->zone_start_pfn);
> > -       page = pfn_to_page(pfn);
> > -       if (zone != page_zone(page))
> > -               return false;
> > -       pfn = block_pfn + pageblock_nr_pages;
> > -       pfn = min(pfn, zone_end_pfn(zone));
> > -       end_page = pfn_to_page(pfn);
> > -
> >         do {
> >                 if (pfn_valid_within(pfn)) {
> >                         if (check_source && PageLRU(page)) {
> 
> Unfortunately this patch didn't help either.
> 
> kernel log: https://pastebin.com/RHhmXPM2
> 

Ok, it's a pity that we don't know which PFN that page corresponds to.
Specifically, it would be interesting to know whether the PFN falls in
a memory hole, as DMA32 on your machine has a number of gaps. What I'm
wondering is whether, when the reinit fails to find good starting
points, it picks a PFN that corresponds to an uninitialised page and
trips up later.
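
To make the suspected failure mode concrete, here is a minimal userspace
sketch; it is not kernel code. The memory layout, the pageblock size and
the helper names (pfn_backed, block_edges_backed) are all assumptions
made up for illustration. It checks the first and last PFN inside a
pageblock against a toy map of backed ranges, which is a simplification
of what a real online-page lookup does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of a zone; none of this is kernel code. */
#define PAGEBLOCK_NR_PAGES 512UL	/* assumed pageblock size in pages */

struct pfn_range {
	unsigned long start;	/* first backed PFN */
	unsigned long end;	/* one past the last backed PFN */
};

/* Assumed toy layout: a hole from PFN 8000 up to (not including) 12288. */
static const struct pfn_range backed[] = {
	{ 4096, 8000 },
	{ 12288, 16384 },
};
static const unsigned long zone_start_pfn = 4096;
static const unsigned long zone_end_pfn = 16384;	/* exclusive */

/* Return true if the PFN lies inside one of the backed ranges. */
static bool pfn_backed(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(backed) / sizeof(backed[0]); i++) {
		if (pfn >= backed[i].start && pfn < backed[i].end)
			return true;
	}
	return false;
}

/*
 * Return true if both the first and the last PFN of the pageblock
 * containing pfn, clamped to the zone boundaries, are backed.
 */
static bool block_edges_backed(unsigned long pfn)
{
	unsigned long start = pfn & ~(PAGEBLOCK_NR_PAGES - 1);
	unsigned long end = start + PAGEBLOCK_NR_PAGES;

	if (start < zone_start_pfn)
		start = zone_start_pfn;
	if (end > zone_end_pfn)
		end = zone_end_pfn;

	return pfn_backed(start) && pfn_backed(end - 1);
}
```

In this toy layout, PFN 7900 is itself backed, but the last PFN of its
pageblock (8191) sits in the hole, so block_edges_backed(7900) returns
false. A scanner that trusted PFN 7900 alone would walk into the hole.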

Can you try again with this patch please? It replaces the failed patch
entirely.

Thanks.

diff --git a/mm/compaction.c b/mm/compaction.c
index f171a83707ce..caac4b07eb33 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -242,6 +242,7 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
 							bool check_target)
 {
 	struct page *page = pfn_to_online_page(pfn);
+	struct page *block_page;
 	struct page *end_page;
 	unsigned long block_pfn;
 
@@ -267,20 +268,26 @@ __reset_isolation_pfn(struct zone *zone, unsigned long pfn, bool check_source,
 	    get_pageblock_migratetype(page) != MIGRATE_MOVABLE)
 		return false;
 
+	/* Ensure the start of the pageblock or zone is online and valid */
+	block_pfn = pageblock_start_pfn(pfn);
+	block_page = pfn_to_online_page(max(block_pfn, zone->zone_start_pfn));
+	if (block_page) {
+		page = block_page;
+		pfn = block_pfn;
+	}
+
+	/* Ensure the end of the pageblock or zone is online and valid */
+	block_pfn += pageblock_nr_pages;
+	block_pfn = min(block_pfn, zone_end_pfn(zone));
+	end_page = pfn_to_online_page(block_pfn);
+	if (!end_page)
+		return false;
+
 	/*
 	 * Only clear the hint if a sample indicates there is either a
 	 * free page or an LRU page in the block. One or other condition
 	 * is necessary for the block to be a migration source/target.
 	 */
-	block_pfn = pageblock_start_pfn(pfn);
-	pfn = max(block_pfn, zone->zone_start_pfn);
-	page = pfn_to_page(pfn);
-	if (zone != page_zone(page))
-		return false;
-	pfn = block_pfn + pageblock_nr_pages;
-	pfn = min(pfn, zone_end_pfn(zone));
-	end_page = pfn_to_page(pfn);
-
 	do {
 		if (pfn_valid_within(pfn)) {
 			if (check_source && PageLRU(page)) {
@@ -320,6 +327,16 @@ static void __reset_isolation_suitable(struct zone *zone)
 
 	zone->compact_blockskip_flush = false;
 
+
+	/*
+	 * Re-init the scanners and attempt to find a better starting
+	 * position below. This may result in redundant scanning if
+	 * a better position is not found but it avoids the corner
+	 * case whereby the cached PFNs are left in a memory hole with
+	 * no proper struct page backing it.
+	 */
+	reset_cached_positions(zone);
+
 	/*
 	 * Walk the zone and update pageblock skip information. Source looks
 	 * for PageLRU while target looks for PageBuddy. When the scanner
@@ -349,13 +366,6 @@ static void __reset_isolation_suitable(struct zone *zone)
 			zone->compact_cached_free_pfn = reset_free;
 		}
 	}
-
-	/* Leave no distance if no suitable block was reset */
-	if (reset_migrate >= reset_free) {
-		zone->compact_cached_migrate_pfn[0] = migrate_pfn;
-		zone->compact_cached_migrate_pfn[1] = migrate_pfn;
-		zone->compact_cached_free_pfn = free_pfn;
-	}
 }
 
 void reset_isolation_suitable(pg_data_t *pgdat)

-- 
Mel Gorman
SUSE Labs



