Re: [PATCHv8 02/14] mm: Add support for unaccepted memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
> On 12/9/22 20:26, Kirill A. Shutemov wrote:
> >> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> >> >  			/*
> >> >  			 * Watermark failed for this zone, but see if we can
> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >> >  
> >> >  			return page;
> >> >  		} else {
> >> > +			if (try_to_accept_memory(zone))
> >> > +				goto try_this_zone;
> >> 
> >> On the other hand, here we failed the full rmqueue(), including the
> >> potentially fragmenting fallbacks, so I'm worried that before we finally
> >> fail all of that and resort to accepting more memory, we already fragmented
> >> the already accepted memory, more than necessary.
> > 
> > I'm not sure I follow. We accept memory in pageblock chunks. Do we want to
> > allocate from a free pageblock if we have other memory to tap from? It
> > doesn't make sense to me.
> 
> The fragmentation avoidance based on migratetype does work with pageblock
> granularity, so yeah, if you accept a single pageblock worth of memory and
> then (through __rmqueue_fallback()) end up serving both movable and
> unmovable allocations from it, the whole fragmentation avoidance mechanism
> is defeated and you end up with unmovable allocations (e.g. page tables)
> scattered over many pageblocks and inability to allocate any huge pages.
> 
> >> So one way to prevent would be to move the acceptance into rmqueue() to
> >> happen before __rmqueue_fallback(), which I originally had in mind and maybe
> >> suggested that previously.
> > 
> > I guess it should be pretty straight forward to fail __rmqueue_fallback()
> > if there's non-empty unaccepted_pages list and steer to
> > try_to_accept_memory() this way.
> 
> That could be a way indeed. We do have ALLOC_NOFRAGMENT which could be
> possible to employ here.
> But maybe the zone_watermark_fast() modification would be simpler yet
> sufficient. It makes sense to me that we'd try to keep a high watermark
> worth of pre-accepted memory. zone_watermark_fast() would fail at low
> watermark, so we could try accepting (high-low) at a time instead of single
> pageblock.

Looks like we already have __zone_watermark_unusable_free() that seems
match use-case rather closely. We only need switch unaccepted memory to
per-zone accounting.

The fixup below suppose to do the trick, but I'm not sure how to test
fragmentation avoidance properly.

Any suggestions?

diff --git a/drivers/base/node.c b/drivers/base/node.c
index ca6f0590be21..1bd2d245edee 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 #ifdef CONFIG_UNACCEPTED_MEMORY
 			     ,
-			     nid, K(node_page_state(pgdat, NR_UNACCEPTED))
+			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
 #endif
 			    );
 	len += hugetlb_report_node_meminfo(buf, len, nid);
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 789b77c7b6df..e9c05b4c457c 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 
 #ifdef CONFIG_UNACCEPTED_MEMORY
 	show_val_kb(m, "Unaccepted:     ",
-		    global_node_page_state(NR_UNACCEPTED));
+		    global_zone_page_state(NR_UNACCEPTED));
 #endif
 
 	hugetlb_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9c762e8175fc..8b5800cd4424 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -152,6 +152,9 @@ enum zone_stat_item {
 	NR_ZSPAGES,		/* allocated in zsmalloc */
 #endif
 	NR_FREE_CMA_PAGES,
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	NR_UNACCEPTED,
+#endif
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
@@ -198,9 +201,6 @@ enum node_stat_item {
 	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
 	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
-#ifdef CONFIG_UNACCEPTED_MEMORY
-	NR_UNACCEPTED,
-#endif
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e80e8d398863..404b267332a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
 
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	__mod_zone_freepage_state(zone, -1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (last)
@@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	list_add_tail(&page->lru, &zone->unaccepted_pages);
 	__mod_zone_freepage_state(zone, 1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (first)
@@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
 	if (!(alloc_flags & ALLOC_CMA))
 		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	unusable_free += zone_page_state(z, NR_UNACCEPTED);
+#endif
 
 	return unusable_free;
 }
-- 
  Kiryl Shutsemau / Kirill A. Shutemov




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux