Re: [PATCH v4 01/10] mm: vmscan: add validation before spliting shmem large folio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2024/8/8 18:57, Daniel Gomez wrote:
On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:


On 2024/8/8 16:51, David Hildenbrand wrote:
On 08.08.24 04:36, Baolin Wang wrote:


On 2024/8/7 23:53, David Hildenbrand wrote:
On 07.08.24 09:31, Baolin Wang wrote:
Page reclaim will not scan anon LRU if no swap space, however
MADV_PAGEOUT
can still split shmem large folios even without a swap device. Thus add
swap available space validation before spliting shmem large folio to
avoid redundant split.

Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
    mm/vmscan.c | 8 ++++++++
    1 file changed, 8 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 31d13462571e..796f65781f4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct
list_head *folio_list,
                }
            } else if (folio_test_swapbacked(folio) &&
                   folio_test_large(folio)) {
+
+            /*
+             * Do not split shmem folio if no swap memory
+             * available.
+             */
+            if (!total_swap_pages)
+                goto activate_locked;
+
                /* Split shmem folio */
                if (split_folio_to_list(folio, folio_list))
                    goto keep_locked;

Reminds me of

commit 9a976f0c847b67d22ed694556a3626ed92da0422
Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
Date:   Thu Mar 9 15:05:43 2023 -0800

       shmem: skip page split if we're not reclaiming
       In theory when info->flags & VM_LOCKED we should not be getting
       shem_writepage() called so we should be verifying this with a
       WARN_ON_ONCE().  Since we should not be swapping then best to
ensure we
       also don't do the folio split earlier too.  So just move the check
early
       to avoid folio splits in case its a dubious call.
       We also have a similar early bail when !total_swap_pages so just
move that
       earlier to avoid the possible folio split in the same situation.


But indeed, pageout() -> writepage() is called *after* the split in the
vmscan path.

In that "noswap" context, I wonder if we also want to skip folios part
of shmem
with disabled swapping?

Yes, I think so.


But now I am wondering under which circumstances we end up calling
shmem_writepage() with a large folio. And I think the answer is the
comment of
folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.


... so if shmem_writepage() handles+checks that, could we do

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a332cb80e928..7dfa3d6e8ba7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct
list_head *folio_list,
                                                   goto
activate_locked_split;
                                   }
                           }
-               } else if (folio_test_swapbacked(folio) &&
-                          folio_test_large(folio)) {
-                       /* Split shmem folio */
-                       if (split_folio_to_list(folio, folio_list))
-                               goto keep_locked;
                   }

                   /*

instead?

Seems reasonable to me. But we should pass the 'folio_list' to
shmem_writepage() to list the subpages of the large folio. Let me try.

Ah, yes, good point. Alternatively, we'd have to split and try writing
all subpages in there. I wonder what to do if we fail to write some, and
if we could handle that transparently, without the folio_list.

After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
via 'struct writeback_control', which could simplify the logic a lot.
Otherwise, we need to handle each subpage's writeback/reclaim/dirty state,
as well as tracking each subpage's write result, which seems more
complicated.

I made a quick change by passing 'folio_list', and it looks simple and works
as expected.

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 75196b0f894f..10100e22d5c6 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -80,6 +80,9 @@ struct writeback_control {
          */
         struct swap_iocb **swap_plug;

+       /* Target list for splitting a large folio */
+       struct list_head *list;
+
         /* internal fields used by the ->writepages implementation: */
         struct folio_batch fbatch;
         pgoff_t index;
diff --git a/mm/shmem.c b/mm/shmem.c
index 05525e9e7423..0a5a68f7d0a0 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct
writeback_control *wbc)
          * and its shmem_writeback() needs them to be split when swapping.
          */
         if (wbc->split_large_folio && folio_test_large(folio)) {
+try_split:
                 /* Ensure the subpages are still dirty */
                 folio_test_set_dirty(folio);
-               if (split_huge_page(page) < 0)
+               if (split_huge_page_to_list_to_order(page, wbc->list, 0))

We check for split_large_folio, but we still send the wbc->list for i915
writepage() case. Previously, we were sending a NULL list. Shouldn't we address
that case too?

I guess I was missing wbc initialization snippet:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
                 .range_start = 0,
                 .range_end = LLONG_MAX,
                 .for_reclaim = 1,
+               .list = NULL,
         };
         unsigned long i;


IMO, we don't need an explicit initialization, and 'list' will be initialized as NULL. Please see: https://gcc.gnu.org/onlinedocs/gcc/Designated-Inits.html




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux