+ mm-page_ext-move-page_ext_init-after-page_alloc_init_late.patch added to -mm tree

The patch titled
     Subject: mm, page_ext: move page_ext_init() after page_alloc_init_late()
has been added to the -mm tree.  Its filename is
     mm-page_ext-move-page_ext_init-after-page_alloc_init_late.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-page_ext-move-page_ext_init-after-page_alloc_init_late.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_ext-move-page_ext_init-after-page_alloc_init_late.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Vlastimil Babka <vbabka@xxxxxxx>
Subject: mm, page_ext: move page_ext_init() after page_alloc_init_late()

Commit b8f1a75d61d8 ("mm: call page_ext_init() after all struct pages are
initialized") avoided a NULL pointer dereference caused by
DEFERRED_STRUCT_PAGE_INIT clashing with page_ext, by calling
page_ext_init() only after the deferred struct page init had finished.
Later, commit fe53ca54270a ("mm: use early_pfn_to_nid in page_ext_init")
avoided the underlying issue differently and moved the page_ext_init()
call back to where it was before.

However, there are two problems with the current code:

- on very large machines, page_ext_init() may fail to allocate the
  page_ext structures, because deferred struct page init hasn't yet
  started, and the pre-inited part might be too small.  This has been
  observed with a 3TB machine with page_owner=on.  Although it was an
  older kernel where page_owner hasn't yet been converted to stack depot,
  thus page_ext was larger, the fundamental problem is still in mainline.

- page_owner's init_pages_in_zone() is called before deferred struct
  page init has started, so it will encounter uninitialized struct pages.
  This currently happens to cause no harm, because the memmap array is
  pre-zeroed on allocation and thus the "if (page_zone(page) != zone)"
  check is negative, but that pre-zeroing guarantee might change soon.
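
As a rough illustration of the second point, the following userspace C
sketch models why a pre-zeroed memmap keeps the scan away from
uninitialized pages. All names (sim_page, sim_page_zone(),
sim_init_pages_in_zone(), sim_demo()) are hypothetical stand-ins, not
kernel code, and the model assumes that a zeroed struct page decodes to
zone index 0 while the zone being scanned has a nonzero index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-ins; not the kernel's real definitions. */
struct sim_page {
	unsigned int zone_id;	/* stand-in for the zone bits kept in page->flags */
	int owner_seen;		/* set when the scan processes this page */
};

/* Assumed analogue of page_zone(): decode the zone from the page itself. */
static unsigned int sim_page_zone(const struct sim_page *page)
{
	return page->zone_id;
}

/*
 * Scan npages entries on behalf of zone_id and return how many were
 * processed. Pages whose decoded zone differs are skipped, mirroring the
 * "if (page_zone(page) != zone)" test.
 */
static size_t sim_init_pages_in_zone(struct sim_page *map, size_t npages,
				     unsigned int zone_id)
{
	size_t processed = 0;

	for (size_t i = 0; i < npages; i++) {
		if (sim_page_zone(&map[i]) != zone_id)
			continue;	/* zeroed (uninitialized) or foreign page */
		map[i].owner_seen = 1;
		processed++;
	}
	return processed;
}

/* Build an 8-page zeroed memmap with only the first ninit pages initialized. */
static size_t sim_demo(size_t ninit, unsigned int zone_id)
{
	struct sim_page map[8];

	assert(ninit <= 8);
	memset(map, 0, sizeof(map));		/* pre-zeroed on allocation */
	for (size_t i = 0; i < ninit; i++)
		map[i].zone_id = zone_id;	/* early-initialized part */
	return sim_init_pages_in_zone(map, 8, zone_id);
}
```

In this model, sim_demo(3, 1) processes only the three initialized
pages. If the memmap were no longer zeroed, the skipped entries would
hold arbitrary values and the check would stop being a reliable filter.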

The second problem could also be solved by limiting init_pages_in_zone() by
pgdat->first_deferred_pfn, but fixing the first issue would be more
problematic.  So this patch again moves page_ext_init() to wait for
deferred struct page init to finish.  This has some performance
implications for boot time, which should be acceptable when enabling
debugging functionality.  However, we keep the benefits of parallel
initialization (one kthread per node), so it's better than e.g. disabling
DEFERRED_STRUCT_PAGE_INIT completely when page_ext is being used.
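
For comparison, the alternative that this patch does not take (bounding
the scan by pgdat->first_deferred_pfn) might look roughly like the
userspace sketch below. The sim_* types and helpers are hypothetical and
only echo the kernel's field names; this is an illustration under those
assumptions, not code from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical userspace model; not the kernel's struct definitions. */
struct sim_pgdat {
	unsigned long first_deferred_pfn;	/* first pfn left to deferred init */
};

struct sim_zone {
	unsigned long zone_start_pfn;
	unsigned long spanned_pages;
};

/* Return the exclusive end pfn a scan may reach without touching deferred pages. */
static unsigned long sim_scan_end_pfn(const struct sim_zone *zone,
				      const struct sim_pgdat *pgdat)
{
	unsigned long end_pfn = zone->zone_start_pfn + zone->spanned_pages;

	if (pgdat->first_deferred_pfn < end_pfn)
		end_pfn = pgdat->first_deferred_pfn;	/* stop before deferred part */
	if (end_pfn < zone->zone_start_pfn)
		end_pfn = zone->zone_start_pfn;		/* whole zone is deferred */
	return end_pfn;
}

/* Count the pfns that a scan bounded this way would visit. */
static unsigned long sim_count_scanned(unsigned long start,
				       unsigned long spanned,
				       unsigned long first_deferred)
{
	struct sim_zone zone = { .zone_start_pfn = start, .spanned_pages = spanned };
	struct sim_pgdat pgdat = { .first_deferred_pfn = first_deferred };

	assert(spanned <= ULONG_MAX - start);	/* the zone end must not overflow */
	return sim_scan_end_pfn(&zone, &pgdat) - start;
}
```

A bound like this would only keep the page_owner scan off uninitialized
pages; it would not help page_ext_init() allocate its structures while
most memory is still unavailable, which is why the patch moves the call
instead.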

This effectively reverts fe53ca54270a757f ("mm: use early_pfn_to_nid in
page_ext_init").

Link: http://lkml.kernel.org/r/20170720134029.25268-5-vbabka@xxxxxxx
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Cc: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Cc: Yang Shi <yang.shi@xxxxxxxxxx>
Cc: Laura Abbott <labbott@xxxxxxxxxx>
Cc: Vinayak Menon <vinmenon@xxxxxxxxxxxxxx>
Cc: zhong jiang <zhongjiang@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 init/main.c   |    3 ++-
 mm/page_ext.c |    4 +---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff -puN init/main.c~mm-page_ext-move-page_ext_init-after-page_alloc_init_late init/main.c
--- a/init/main.c~mm-page_ext-move-page_ext_init-after-page_alloc_init_late
+++ a/init/main.c
@@ -650,7 +650,6 @@ asmlinkage __visible void __init start_k
 		initrd_start = 0;
 	}
 #endif
-	page_ext_init();
 	debug_objects_mem_init();
 	kmemleak_init();
 	setup_per_cpu_pageset();
@@ -1053,6 +1052,8 @@ static noinline void __init kernel_init_
 	sched_init_smp();
 
 	page_alloc_init_late();
+	/* Initialize page ext after all struct pages are initialized */
+	page_ext_init();
 
 	do_basic_setup();
 
diff -puN mm/page_ext.c~mm-page_ext-move-page_ext_init-after-page_alloc_init_late mm/page_ext.c
--- a/mm/page_ext.c~mm-page_ext-move-page_ext_init-after-page_alloc_init_late
+++ a/mm/page_ext.c
@@ -399,10 +399,8 @@ void __init page_ext_init(void)
 			 * We know some arch can have a nodes layout such as
 			 * -------------pfn-------------->
 			 * N0 | N1 | N2 | N0 | N1 | N2|....
-			 *
-			 * Take into account DEFERRED_STRUCT_PAGE_INIT.
 			 */
-			if (early_pfn_to_nid(pfn) != nid)
+			if (pfn_to_nid(pfn) != nid)
 				continue;
 			if (init_section_page_ext(pfn, nid))
 				goto oom;
_

Patches currently in -mm which might be from vbabka@xxxxxxx are

mm-page_owner-make-init_pages_in_zone-faster.patch
mm-page_ext-periodically-reschedule-during-page_ext_init.patch
mm-page_owner-dont-grab-zone-lock-for-init_pages_in_zone.patch
mm-page_ext-move-page_ext_init-after-page_alloc_init_late.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
