[patch 128/200] mm, page_alloc: move draining pcplists to page isolation users

Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> · Mon, 14 Dec 2020 19:10:56 -0800

From: Vlastimil Babka <vbabka@xxxxxxx>
Subject: mm, page_alloc: move draining pcplists to page isolation users

Currently, pcplists are drained during set_migratetype_isolate() which
means once per pageblock processed start_isolate_page_range().  This is
somewhat wasteful.  Moreover, the callers might need different guarantees,
and the draining is currently prone to races and does not guarantee that
no page from isolated pageblock will end up on the pcplist after the
drain.

Better guarantees are added by later patches and require explicit actions
by page isolation users that need them.  Thus it makes sense to move the
current imperfect draining to the callers also as a preparation step.

Link: https://lkml.kernel.org/r/20201111092812.11329-7-vbabka@xxxxxxx
Suggested-by: David Hildenbrand <david@xxxxxxxxxx>
Suggested-by: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
Reviewed-by: David Hildenbrand <david@xxxxxxxxxx>
Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
Acked-by: Michal Hocko <mhocko@xxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memory_hotplug.c |   11 ++++++-----
 mm/page_alloc.c     |    2 ++
 mm/page_isolation.c |   10 +++++-----
 3 files changed, 13 insertions(+), 10 deletions(-)

--- a/mm/memory_hotplug.c~mm-page_alloc-move-draining-pcplists-to-page-isolation-users
+++ a/mm/memory_hotplug.c
@@ -1500,6 +1500,8 @@ int __ref offline_pages(unsigned long st
 		goto failed_removal;
 	}
 
+	drain_all_pages(zone);
+
 	arg.start_pfn = start_pfn;
 	arg.nr_pages = nr_pages;
 	node_states_check_changes_offline(nr_pages, zone, &arg);
@@ -1550,11 +1552,10 @@ int __ref offline_pages(unsigned long st
 		}
 
 		/*
-		 * per-cpu pages are drained in start_isolate_page_range, but if
-		 * there are still pages that are not free, make sure that we
-		 * drain again, because when we isolated range we might
-		 * have raced with another thread that was adding pages to pcp
-		 * list.
+		 * per-cpu pages are drained after start_isolate_page_range, but
+		 * if there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might have
+		 * raced with another thread that was adding pages to pcp list.
 		 *
 		 * Forward progress should be still guaranteed because
 		 * pages on the pcp list can only belong to MOVABLE_ZONE
--- a/mm/page_alloc.c~mm-page_alloc-move-draining-pcplists-to-page-isolation-users
+++ a/mm/page_alloc.c
@@ -8528,6 +8528,8 @@ int alloc_contig_range(unsigned long sta
 	if (ret)
 		return ret;
 
+	drain_all_pages(cc.zone);
+
 	/*
 	 * In case of -EBUSY, we'd like to know which page causes problem.
 	 * So, just fall through. test_pages_isolated() has a tracepoint
--- a/mm/page_isolation.c~mm-page_alloc-move-draining-pcplists-to-page-isolation-users
+++ a/mm/page_isolation.c
@@ -49,7 +49,6 @@ static int set_migratetype_isolate(struc
 
 		__mod_zone_freepage_state(zone, -nr_pages, mt);
 		spin_unlock_irqrestore(&zone->lock, flags);
-		drain_all_pages(zone);
 		return 0;
 	}
 
@@ -172,11 +171,12 @@ __first_valid_page(unsigned long pfn, un
  *
  * Please note that there is no strong synchronization with the page allocator
  * either. Pages might be freed while their page blocks are marked ISOLATED.
- * In some cases pages might still end up on pcp lists and that would allow
+ * A call to drain_all_pages() after isolation can flush most of them. However
+ * in some cases pages might still end up on pcp lists and that would allow
  * for their allocation even when they are in fact isolated already. Depending
- * on how strong of a guarantee the caller needs drain_all_pages might be needed
- * (e.g. __offline_pages will need to call it after check for isolated range for
- * a next retry).
+ * on how strong of a guarantee the caller needs, further drain_all_pages()
+ * might be needed (e.g. __offline_pages will need to call it after check for
+ * isolated range for a next retry).
  *
  * Return: 0 on success and -EBUSY if any part of range cannot be isolated.
  */
_