Re: [PATCH v2 2/2] mm: fix initialization of struct page for holes in memory layout

Mike Rapoport <rppt@xxxxxxxxxx> · Wed, 6 Jan 2021 10:05:53 +0200

On Tue, Jan 05, 2021 at 01:45:37PM -0500, Qian Cai wrote:
> On Tue, 2021-01-05 at 10:24 +0200, Mike Rapoport wrote:
> > Hi,
> > 
> > On Mon, Jan 04, 2021 at 02:03:00PM -0500, Qian Cai wrote:
> > > On Wed, 2020-12-09 at 23:43 +0200, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > > > 
> > > > Interleave initialization of pages that correspond to holes with the
> > > > initialization of memory map, so that zone and node information will be
> > > > properly set on such pages.
> > > > 
> > > > Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
> > > > Reported-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> > > > Signed-off-by: Mike Rapoport <rppt@xxxxxxxxxxxxx>
> > > 
> > > Reverting this commit on the top of today's linux-next fixed a crash while
> > > reading /proc/kpagecount on a NUMA server.
> > 
> > Can you please post the entire dmesg?
> 
> http://people.redhat.com/qcai/dmesg.txt
> 
> > Is it possible to get the pfn that triggered the crash?
> 
> Do you have any idea how to convert that fffffffffffffffe to pfn as it is always
> that address? I don't understand what that address is though. I tried to catch
> it from struct page pointer and page_address() without luck.

I think we hit PF_POISONED_CHECK() in PageSlab() on a poisoned struct page,
and fffffffffffffffe is then "accessed" from VM_BUG_ON_PAGE().
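
To spell out where fffffffffffffffe may come from, here is a small
user-space sketch. It assumes the uninitialized struct page is still filled
with the 0xff poison pattern and that PageSlab() resolves the compound head
before the poison check. struct fake_page, fake_poison() and
fake_compound_head() are made-up stand-ins for illustration, not kernel code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the first two words of struct page. */
struct fake_page {
	uint64_t flags;
	uint64_t compound_head;	/* bit 0 set means "tail page" */
};

static_assert(sizeof(struct fake_page) == 16, "two 64-bit words expected");

/* Fill the page with 0xff, as memmap poisoning is assumed to do. */
static void fake_poison(struct fake_page *page)
{
	memset(page, 0xff, sizeof(*page));
}

/*
 * Modeled on compound_head(): if bit 0 is set, the word is treated as
 * a pointer to the head page plus one. The result is returned as an
 * integer so that nothing is actually dereferenced here.
 */
static uint64_t fake_compound_head(const struct fake_page *page)
{
	uint64_t head = page->compound_head;

	if (head & 1)
		return head - 1;
	return (uint64_t)(uintptr_t)page;
}
```

For a poisoned page fake_compound_head() returns all ones minus one, which
is the faulting address in the oops, so the address probably says nothing
about the pfn itself.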

It seems to me that we are not initializing struct pages for holes at the node
boundaries because zones are already clamped to exclude those holes.
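
As a rough illustration of what I mean, the sketch below computes the pfns
that a walk limited to a clamped zone would never reach. It rests on
assumptions I have not verified on this machine: that the memory map extends
to the section boundary and that a section is 0x8000 pages. struct pfn_range,
fake_section_end() and uncovered_tail() are hypothetical names for the
example only:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 128 MiB sections with 4 KiB pages. */
#define FAKE_PAGES_PER_SECTION 0x8000UL

static_assert((FAKE_PAGES_PER_SECTION & (FAKE_PAGES_PER_SECTION - 1)) == 0,
	      "section size must be a power of two");

/* Hypothetical half-open pfn range [start, end). */
struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Round a pfn up to the next section boundary. */
static unsigned long fake_section_end(unsigned long pfn)
{
	return (pfn + FAKE_PAGES_PER_SECTION - 1) &
	       ~(FAKE_PAGES_PER_SECTION - 1);
}

/*
 * If struct pages exist up to the section boundary but the zone was
 * clamped to the end of real memory, a walk limited to the zone never
 * reaches the pfns returned here.
 */
static struct pfn_range uncovered_tail(struct pfn_range zone)
{
	struct pfn_range tail = { zone.end, fake_section_end(zone.end) };

	return tail;
}

/* True if the range contains no pfns. */
static bool range_empty(struct pfn_range r)
{
	return r.start >= r.end;
}
```

A zone that ends on a section boundary gives an empty tail; any other zone
end leaves a range whose struct pages would stay poisoned if nothing else
initializes them.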

Can you please check whether the patch below produces any useful info:

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 4dcbcd506cb6..708f8211dcc0 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -66,10 +66,14 @@ static ssize_t kpagecount_read(struct file *file, char __user *buf,
 		 */
 		ppage = pfn_to_online_page(pfn);
 
-		if (!ppage || PageSlab(ppage) || page_has_type(ppage))
+		if (ppage && PagePoisoned(ppage)) {
+			pr_info("%s: pfn %lx is poisoned\n", __func__, pfn);
 			pcount = 0;
-		else
+		} else if (!ppage || PageSlab(ppage) || page_has_type(ppage)) {
+			pcount = 0;
+		} else {
 			pcount = page_mapcount(ppage);
+		}
 
 		if (put_user(pcount, out)) {
 			ret = -EFAULT;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 124b8c654ec6..1b3a37ace1b1 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6271,6 +6271,8 @@ static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
 	unsigned long pfn;
 	u64 pgcnt = 0;
 
+	pr_info("%s: spfn: %lx, epfn: %lx, zone: %s, node: %d\n", __func__, spfn, epfn, zone_names[zone], node);
+
 	for (pfn = spfn; pfn < epfn; pfn++) {
 		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
 			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
 
> >  
> > > [ 8858.006726][T99897] BUG: unable to handle page fault for address: fffffffffffffffe
> > > [ 8858.014814][T99897] #PF: supervisor read access in kernel mode
> > > [ 8858.020686][T99897] #PF: error_code(0x0000) - not-present page
> > > [ 8858.026557][T99897] PGD 1371417067 P4D 1371417067 PUD 1371419067 PMD 0 
> > > [ 8858.033224][T99897] Oops: 0000 [#1] SMP KASAN NOPTI
> > > [ 8858.038710][T99897] CPU: 28 PID: 99897 Comm: proc01 Tainted: G           O      5.11.0-rc1-next-20210104 #1
> > > [ 8858.048515][T99897] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 03/09/2018
> > > [ 8858.057794][T99897] RIP: 0010:kpagecount_read+0x1be/0x5e0
> > > PageSlab at include/linux/page-flags.h:342
> > > (inlined by) kpagecount_read at fs/proc/page.c:69
> 

-- 
Sincerely yours,
Mike.