Re: [PATCH v2] Revert "mm: skip CMA pages when they are not available"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 15, 2024 at 11:44 PM 刘海龙(LaoLiu) <liuhailong@xxxxxxxx> wrote:
>
> On 2024/3/15 16:46, Barry Song wrote:
> > On Fri, Mar 15, 2024 at 9:18 PM <liuhailong@xxxxxxxx> wrote:
> >>
> >> From: "Hailong.Liu" <liuhailong@xxxxxxxx>
> >>
> >> This reverts
> >> commit b7108d66318a ("Multi-gen LRU: skip CMA pages when they are not eligible")
> >> commit 5da226dbfce3 ("mm: skip CMA pages when they are not available")
> >
> > Reverting these changes seems inappropriate since the original patches
> > did improve most cases.
> > While doing direct reclamation without MOVABLE flags, scanning cma
> > folio doesn't help at all.
> >
>
> Absolutely, but without this patch  it give chance to reclaim cma pages to freelist,
> other tasks without this flag would work normally without holding too much time
> on lru lock and lru size decrease。 it also trigger memory psi event to allow
> admin do something to release memory.
>
> >>
> >> skip_cma may cause system not responding. if cma pages is large in lru_list
> >> and system is in lowmemory, many tasks would direct reclaim and waste
> >> cpu time to isolate_lru_pages and return.
> >>
> >> Test this patch on android-5.15 8G device
> >> reproducer:
> >> - cma_declare_contiguous 3G pages
> >> - set /proc/sys/vm/swappiness 0 to enable direct_reclaim reclaim file
> >>   only.
> >> - run a memleak process in userspace
> >
> >
> > I assume that scanning cma folio provides additional opportunities to reclaim
> > anonymous memory, even though it is ineffective for allocating NON-MOVABLE
> > folios. Consequently, we alleviate pressure for future allocations of anonymous
> > folios. Moreover, setting swappiness=0 is a rare scenario. Instead of entirely
> > removing the mechanism, could we detect such corner cases and handle them
> > differently.
> >
> > Otherwise, I don't see any chance of this being acknowledged.
>
> It happens in internal aging test. open camera and record 8K video which use
> more than 2Gib dmabuf pages. without this patch the devices would kill camera process,
> but the system hang with it.
>
> In fact, in my opinion the issue of this patch increase the scanning lru time
> and reclaim nothing. Do you have a better solution ?

rather than aggressively reverting the original patch which might help lots of
cases, could we just detect the corner cases and fix them , for example, by
something as the belows,

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 96147c4536e4..2f75ad9870c1 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -147,6 +147,9 @@ struct scan_control {
        /* Always discard instead of demoting to lower tier memory */
        unsigned int no_demotion:1;

+       /* no skipping cma */
+       unsigned int no_skip_cma:1;
+
        /* Allocation order */
        s8 order;

@@ -1595,7 +1598,7 @@ static __always_inline void
update_lru_sizes(struct lruvec *lruvec,
  */
 static bool skip_cma(struct folio *folio, struct scan_control *sc)
 {
-       return !current_is_kswapd() &&
+       return !current_is_kswapd() && !sc->no_skip_cma &&
                        gfp_migratetype(sc->gfp_mask) != MIGRATE_MOVABLE &&
                        folio_migratetype(folio) == MIGRATE_CMA;
 }
@@ -1636,7 +1639,7 @@ static unsigned long isolate_lru_folios(unsigned
long nr_to_scan,
        unsigned long nr_taken = 0;
        unsigned long nr_zone_taken[MAX_NR_ZONES] = { 0 };
        unsigned long nr_skipped[MAX_NR_ZONES] = { 0, };
-       unsigned long skipped = 0;
+       unsigned long skipped = 0, skipped_cma = 0;
        unsigned long scan, total_scan, nr_pages;
        LIST_HEAD(folios_skipped);

@@ -1645,6 +1648,7 @@ static unsigned long isolate_lru_folios(unsigned
long nr_to_scan,
        while (scan < nr_to_scan && !list_empty(src)) {
                struct list_head *move_to = src;
                struct folio *folio;
+               bool cma = false;

                folio = lru_to_folio(src);
                prefetchw_prev_lru_folio(folio, src, flags);
@@ -1652,10 +1656,11 @@ static unsigned long
isolate_lru_folios(unsigned long nr_to_scan,
                nr_pages = folio_nr_pages(folio);
                total_scan += nr_pages;

-               if (folio_zonenum(folio) > sc->reclaim_idx ||
-                               skip_cma(folio, sc)) {
+               cma = skip_cma(folio, sc);
+               if (folio_zonenum(folio) > sc->reclaim_idx || cma) {
                        nr_skipped[folio_zonenum(folio)] += nr_pages;
                        move_to = &folios_skipped;
+                       skipped_cma += nr_pages;
                        goto move;
                }

@@ -1716,6 +1721,16 @@ static unsigned long
isolate_lru_folios(unsigned long nr_to_scan,
        *nr_scanned = total_scan;
        trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
                                    total_scan, skipped, nr_taken, lru);
+
+       /*
+        * If the majority of folios are skipped due to contiguous
memory allocator (CMA),
+        * the least recently used (LRU) list might become overloaded
with CMA folios.
+        * This could result in numerous idle loops for non-movable
allocations. On the
+        * other hand, reclaiming CMA folios can alleviate the strain
on other processes
+        * that require anonymous pages.
+        */
+       sc->no_skip_cma = skipped_cma > total_scan / 2 ? 1 : 0;
+
        update_lru_sizes(lruvec, lru, nr_zone_taken);
        return nr_taken;
 }


This is merely a proof of concept (PoC), and I haven't even built it
yet. Additionally, LRU_GEN remains
unchanged. It will also need similar changes.

>
> Brs,
> Hailong.

Thanks
Barry





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux