Re: [PATCH] mm/page_alloc: add one PCP list for THP

On 2024/6/20 6:28, Barry Song wrote:
On Thu, Jun 20, 2024 at 12:55 AM <yangge1116@xxxxxxx> wrote:

From: yangge <yangge1116@xxxxxxx>

Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations") no longer differentiates the migration type
of pages in the THP-sized PCP list, non-movable allocation requests
may get a CMA page from the list, which is not acceptable in some
cases.

If a large amount of CMA memory is configured in the system (for
example, CMA memory accounts for 50% of the system memory),
starting a virtual machine with device passthrough will get stuck.
While starting the virtual machine, the kernel calls
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory. Normally,
if a page is present and in the CMA area, pin_user_pages_remote() will
migrate the page from the CMA area to a non-CMA area because of the
FOLL_LONGTERM flag. But if non-movable allocation requests return
CMA memory, migrate_longterm_unpinnable_pages() will migrate a CMA
page to another CMA page, which fails the check in
check_and_migrate_movable_pages() and makes the migration loop endlessly.
Call trace:
pin_user_pages_remote
--__gup_longterm_locked // loops endlessly in this function
----__get_user_pages_locked
----check_and_migrate_movable_pages
------migrate_longterm_unpinnable_pages
--------alloc_migration_target
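
To make the loop above concrete, here is a rough user-space toy model,
not kernel code. Every name in it (toy_page, toy_alloc_target,
toy_pin_longterm, the pcp_mixes_types flag and the max_tries bound) is
made up for illustration; the real loop in __gup_longterm_locked() has
no such bound, which is why it never terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_page {
	bool in_cma;	/* true if the page sits in a CMA area */
};

/*
 * Hypothetical migration-target allocator. 'pcp_mixes_types' models a
 * THP PCP list that ignores migratetype, so a non-movable request can
 * be handed a CMA page.
 */
static struct toy_page toy_alloc_target(bool pcp_mixes_types)
{
	struct toy_page target = { .in_cma = pcp_mixes_types };

	return target;
}

/*
 * Returns the number of pin attempts needed, or -1 if max_tries is
 * exhausted. The bound exists only so that the model terminates.
 */
static int toy_pin_longterm(struct toy_page page, bool pcp_mixes_types,
			    int max_tries)
{
	assert(max_tries > 0);

	for (int tries = 1; tries <= max_tries; tries++) {
		if (!page.in_cma)
			return tries;	/* pinnable, the loop ends */

		/* Page is in CMA: migrate it, then retry the pin. */
		page = toy_alloc_target(pcp_mixes_types);
	}

	return -1;	/* every migration target was again a CMA page */
}
```

Starting from a CMA page, the model needs two attempts when the target
comes from non-CMA memory, and never succeeds when the allocator keeps
returning CMA pages.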

This problem also has a negative impact on CMA itself. For
example, when CMA is borrowed by THP and we need to reclaim it
through cma_alloc() or dma_alloc_coherent(), we must move those
pages out so that CMA's users can retrieve that contiguous memory.
Currently, CMA's memory can be occupied by non-movable pages, which
we can't relocate. As a result, cma_alloc() is more likely to
fail.

To fix the problem above, we add one more PCP list for THP, which does
not introduce a new cacheline for struct per_cpu_pages. THP will
have 2 PCP lists: one PCP list is used by MOVABLE allocations, and
the other PCP list is used by UNMOVABLE allocations. MOVABLE
allocations are GFP_MOVABLE, and UNMOVABLE allocations cover
GFP_UNMOVABLE and GFP_RECLAIMABLE.
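
As a rough illustration of the resulting index layout, the mapping can
be modelled in user space as below. This is only a sketch: the TOY_*
names are invented, and the values 3 for the costly order and 9 for
the PMD order are assumptions (typical for x86-64 with 4K pages), not
something this patch defines.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed ordering of the PCP migratetypes; names are illustrative. */
enum { TOY_UNMOVABLE, TOY_MOVABLE, TOY_RECLAIMABLE, TOY_PCPTYPES };

#define TOY_COSTLY_ORDER	3	/* assumed PAGE_ALLOC_COSTLY_ORDER */
#define TOY_PMD_ORDER		9	/* assumed HPAGE_PMD_ORDER */
#define TOY_NR_LOWORDER		(TOY_PCPTYPES * (TOY_COSTLY_ORDER + 1))
#define TOY_NR_PCP_THP		2	/* one list for movable, one for the rest */
#define TOY_NR_PCP_LISTS	(TOY_NR_LOWORDER + TOY_NR_PCP_THP)

/* Map (migratetype, order) to a PCP list index. */
static unsigned int toy_order_to_pindex(int migratetype, int order)
{
	if (order > TOY_COSTLY_ORDER) {
		/* Only THP-sized pages are cached above the costly order. */
		assert(order == TOY_PMD_ORDER);

		bool movable = (migratetype == TOY_MOVABLE);

		/* First THP slot: non-movable; second THP slot: movable. */
		return TOY_NR_LOWORDER + movable;
	}

	return TOY_PCPTYPES * order + migratetype;
}

/* Inverse direction: recover the order from a PCP list index. */
static int toy_pindex_to_order(unsigned int pindex)
{
	/* Both THP slots map back to the same order, hence '>='. */
	if (pindex >= TOY_NR_LOWORDER)
		return TOY_PMD_ORDER;

	return (int)(pindex / TOY_PCPTYPES);
}
```

Under these assumptions the low-order lists occupy indexes 0..11,
unmovable and reclaimable THP requests share index 12, and movable THP
requests use index 13, so a CMA page freed to the movable list is not
handed to a non-movable request.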

Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")

Please add the below tag

Cc: <stable@xxxxxxxxxxxxxxx>

And I don't think 'mm/page_alloc: add one PCP list for THP' is a good
title. Maybe:

'mm/page_alloc: Separate THP PCP into movable and non-movable categories'

Whenever you send a new version, please add a tag like 'PATCH V2' or 'PATCH V3'.
You have already missed several version numbers, so we may have to start from V2,
even though V2 is not accurate.


Ok, thanks.

Signed-off-by: yangge <yangge1116@xxxxxxx>
---
  include/linux/mmzone.h | 9 ++++-----
  mm/page_alloc.c        | 9 +++++++--
  2 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index b7546dd..cb7f265 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -656,13 +656,12 @@ enum zone_watermarks {
  };

  /*
- * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
- * for THP which will usually be GFP_MOVABLE. Even if it is another type,
- * it should not contribute to serious fragmentation causing THP allocation
- * failures.
+ * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists
+ * are added for THP. One PCP list is used by GFP_MOVABLE, and the other PCP list
+ * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE.
   */
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-#define NR_PCP_THP 1
+#define NR_PCP_THP 2
  #else
  #define NR_PCP_THP 0
  #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8f416a0..0a837e6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -504,10 +504,15 @@ static void bad_page(struct page *page, const char *reason)

  static inline unsigned int order_to_pindex(int migratetype, int order)
  {
+       bool __maybe_unused movable;
+
  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
         if (order > PAGE_ALLOC_COSTLY_ORDER) {
                 VM_BUG_ON(order != HPAGE_PMD_ORDER);
-               return NR_LOWORDER_PCP_LISTS;
+
+               movable = migratetype == MIGRATE_MOVABLE;
+
+               return NR_LOWORDER_PCP_LISTS + movable;
         }
  #else
         VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
@@ -521,7 +526,7 @@ static inline int pindex_to_order(unsigned int pindex)
         int order = pindex / MIGRATE_PCPTYPES;

  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-       if (pindex == NR_LOWORDER_PCP_LISTS)
+       if (pindex >= NR_LOWORDER_PCP_LISTS)
                 order = HPAGE_PMD_ORDER;
  #else
         VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
--
2.7.4





