Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page

On 2024/6/17 at 8:47 PM, yangge1116 wrote:


On 2024/6/17 at 6:26 PM, Barry Song wrote:
On Tue, Jun 4, 2024 at 9:15 PM <yangge1116@xxxxxxx> wrote:

From: yangge <yangge1116@xxxxxxx>

Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations") no longer differentiates the migration type
of pages in the THP-sized PCP list, it's possible to get a CMA page
from the list. In some cases this is not acceptable; for example,
allocating a non-CMA page with the PF_MEMALLOC_PIN flag set can
return a CMA page.

This patch forbids allocating a non-CMA THP-sized page from the
THP-sized PCP list to avoid the issue above.

Could you please describe the impact on users in the commit log?

If a large amount of CMA memory is configured in the system (for example, CMA memory accounts for 50% of system memory), starting a virtual machine with device passthrough will get stuck.

While the virtual machine is starting, pin_user_pages_remote(..., FOLL_LONGTERM, ...) is called to pin memory. If a page is in a CMA area, pin_user_pages_remote() migrates the page from the CMA area to a non-CMA area because of the FOLL_LONGTERM flag. If non-movable allocation requests return CMA memory, pin_user_pages_remote() enters an endless loop.

backtrace:
pin_user_pages_remote
----__gup_longterm_locked // the endless loop occurs in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // the check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages
----------------alloc_migration_target // non-movable allocation

Is it possible that some CMA memory might be used by non-movable
allocation requests?

Yes.


If so, will CMA somehow become unable to migrate, causing cma_alloc() to fail?


No, it will cause an endless loop in __gup_longterm_locked(). If non-movable allocation requests return CMA memory, migrate_longterm_unpinnable_pages() will migrate a CMA page to another CMA page, which is useless and makes __gup_longterm_locked() loop forever.

backtrace:
pin_user_pages_remote
----__gup_longterm_locked // the endless loop occurs in this function
--------__get_user_pages_locked
--------check_and_migrate_movable_pages // the check always fails, so migration is retried
------------migrate_longterm_unpinnable_pages
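
To make the failure mode above concrete, here is a minimal userspace sketch of the retry loop. It is not kernel code: `toy_page`, `toy_pin_longterm()`, the two allocation policies, and the `max_rounds` cap are all hypothetical names invented for illustration, and the loop described in this thread has no such cap, which is why it spins forever.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a page: only records whether it sits in a CMA area. */
struct toy_page {
	bool in_cma;
};

/* Hypothetical migration-target policy: returns true if the new page is in CMA. */
typedef bool (*toy_alloc_target_fn)(void);

/* Target allocation that respects the non-movable request (the expected behaviour). */
static bool toy_target_never_cma(void)
{
	return false;
}

/* Target allocation that keeps handing back CMA pages (the behaviour reported here). */
static bool toy_target_always_cma(void)
{
	return true;
}

/*
 * Toy model of the check-and-migrate retry loop: while the page is still
 * in CMA, migrate it to a freshly allocated target and check again.
 * Returns the number of migration rounds needed, or -1 if max_rounds is
 * reached. The cap exists only so this sketch terminates.
 */
static int toy_pin_longterm(struct toy_page *page,
			    toy_alloc_target_fn alloc_target,
			    int max_rounds)
{
	int rounds = 0;

	while (page->in_cma) {
		if (rounds == max_rounds)
			return -1;	/* would loop forever without the cap */
		page->in_cma = alloc_target();	/* "migrate" to the new target */
		rounds++;
	}
	return rounds;
}
```

With `toy_target_never_cma` a CMA page is pinned after one migration round; with `toy_target_always_cma` every round lands on another CMA page and the cap is hit, which corresponds to the hang seen when starting the virtual machine.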






Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
Signed-off-by: yangge <yangge1116@xxxxxxx>
---
  mm/page_alloc.c | 10 ++++++++++
  1 file changed, 10 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e22ce5..0bdf471 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

         if (likely(pcp_allowed_order(order))) {
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+               if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
+                                               order != HPAGE_PMD_ORDER) {
+                       page = rmqueue_pcplist(preferred_zone, zone, order,
+                                               migratetype, alloc_flags);
+                       if (likely(page))
+                               goto out;
+               }

This does not seem ideal, because non-CMA THP gets no chance to use PCP. But it
still seems better than causing the failure of CMA allocation.

Is there a possible approach to avoid adding CMA THP to the PCP list in the
first place? Otherwise, we might need a separate PCP list for CMA.


The vast majority of THP-sized allocations are GFP_MOVABLE, so keeping CMA THP out of the PCP list may incur a slight performance penalty.

Commit 1d91df85f399 takes a similar filtering approach, and I mainly followed it.
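
For readers following the condition in the hunk, it can be read as a standalone predicate. The sketch below is userspace C, not the kernel code itself: `toy_may_use_pcplist()` is a hypothetical helper, `TOY_ALLOC_CMA` is a placeholder bit value, and `TOY_HPAGE_PMD_ORDER` is assumed to be 9 (the value on x86-64 with 4 KiB pages); the real constants come from the kernel headers and configuration.

```c
#include <stdbool.h>

/* Placeholder values for illustration only; the kernel defines the real ones. */
#define TOY_ALLOC_CMA		0x80u	/* hypothetical "CMA allowed" alloc flag bit */
#define TOY_HPAGE_PMD_ORDER	9u	/* assumed THP order (x86-64, 4 KiB pages) */

/*
 * Mirrors the condition added by the patch: the PCP list may be used
 * unless CMA is enabled, the request may not use CMA pages, and the
 * request is THP-sized. Only in that one combination could the shared
 * THP-sized PCP list hand back a CMA page to a caller that must not get one.
 */
static bool toy_may_use_pcplist(bool cma_enabled, unsigned int alloc_flags,
				unsigned int order)
{
	return !cma_enabled ||
	       (alloc_flags & TOY_ALLOC_CMA) ||
	       order != TOY_HPAGE_PMD_ORDER;
}
```

Under these assumptions, a THP-sized request without the CMA flag on a CMA-enabled build is the only case that skips the PCP list and goes straight to the buddy allocator; every other request keeps the PCP fast path.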


+#else
                 page = rmqueue_pcplist(preferred_zone, zone, order,
                                        migratetype, alloc_flags);
                 if (likely(page))
                         goto out;
+#endif
         }

         page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
--
2.7.4

Thanks
Barry





