Re: [PATCH v4 01/10] mm: vmscan: add validation before splitting shmem large folio

On Thu, Aug 08, 2024 at 12:48:52PM GMT, Daniel Gomez wrote:
> On Thu, Aug 08, 2024 at 05:34:50PM GMT, Baolin Wang wrote:
> > 
> > 
> > On 2024/8/8 16:51, David Hildenbrand wrote:
> > > On 08.08.24 04:36, Baolin Wang wrote:
> > > > 
> > > > 
> > > > On 2024/8/7 23:53, David Hildenbrand wrote:
> > > > > On 07.08.24 09:31, Baolin Wang wrote:
> > > > > > Page reclaim will not scan the anon LRU if there is no swap space;
> > > > > > however, MADV_PAGEOUT can still split shmem large folios even without
> > > > > > a swap device. Thus, validate that swap space is available before
> > > > > > splitting a shmem large folio, to avoid a redundant split.
> > > > > > 
> > > > > > Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > >    mm/vmscan.c | 8 ++++++++
> > > > > >    1 file changed, 8 insertions(+)
> > > > > > 
> > > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > > index 31d13462571e..796f65781f4f 100644
> > > > > > --- a/mm/vmscan.c
> > > > > > +++ b/mm/vmscan.c
> > > > > > @@ -1259,6 +1259,14 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > > >                }
> > > > > >            } else if (folio_test_swapbacked(folio) &&
> > > > > >                   folio_test_large(folio)) {
> > > > > > +
> > > > > > +            /*
> > > > > > +             * Do not split shmem folio if no swap memory
> > > > > > +             * available.
> > > > > > +             */
> > > > > > +            if (!total_swap_pages)
> > > > > > +                goto activate_locked;
> > > > > > +
> > > > > >                /* Split shmem folio */
> > > > > >                if (split_folio_to_list(folio, folio_list))
> > > > > >                    goto keep_locked;
> > > > > 
> > > > > Reminds me of
> > > > > 
> > > > > commit 9a976f0c847b67d22ed694556a3626ed92da0422
> > > > > Author: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> > > > > Date:   Thu Mar 9 15:05:43 2023 -0800
> > > > > 
> > > > >       shmem: skip page split if we're not reclaiming
> > > > >       In theory when info->flags & VM_LOCKED we should not be getting
> > > > >       shmem_writepage() called, so we should be verifying this with a
> > > > >       WARN_ON_ONCE().  Since we should not be swapping, it is best to
> > > > >       ensure we also don't do the folio split earlier too.  So just move
> > > > >       the check early to avoid folio splits in case it's a dubious call.
> > > > >       We also have a similar early bail when !total_swap_pages, so just
> > > > >       move that earlier to avoid the possible folio split in the same
> > > > >       situation.
> > > > > 
> > > > > 
> > > > > But indeed, pageout() -> writepage() is called *after* the split in the
> > > > > vmscan path.
> > > > > 
> > > > > In that "noswap" context, I wonder if we also want to skip folios that
> > > > > belong to shmem mounts with swapping disabled?
> > > > 
> > > > Yes, I think so.
> > > > 
> > > > > 
> > > > > But now I am wondering under which circumstances we end up calling
> > > > > shmem_writepage() with a large folio. And I think the answer is in the
> > > > > comment of folio_test_large(): via drivers/gpu/drm/i915/gem/i915_gem_shmem.c.
> > > > > 
> > > > > 
> > > > > ... so if shmem_writepage() handles+checks that, could we do
> > > > > 
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index a332cb80e928..7dfa3d6e8ba7 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -1257,11 +1257,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > > > >                                                   goto activate_locked_split;
> > > > >                                   }
> > > > >                           }
> > > > > -               } else if (folio_test_swapbacked(folio) &&
> > > > > -                          folio_test_large(folio)) {
> > > > > -                       /* Split shmem folio */
> > > > > -                       if (split_folio_to_list(folio, folio_list))
> > > > > -                               goto keep_locked;
> > > > >                   }
> > > > > 
> > > > >                   /*
> > > > > 
> > > > > instead?
> > > > 
> > > > Seems reasonable to me. But we should pass 'folio_list' to
> > > > shmem_writepage() to collect the subpages of the large folio. Let me try.
> > > 
> > > Ah, yes, good point. Alternatively, we'd have to split and try writing
> > > all subpages in there. I wonder what to do if we fail to write some, and
> > > if we could handle that transparently, without the folio_list.
> > 
> > After some investigation, I prefer to pass 'folio_list' to shmem_writepage()
> > via 'struct writeback_control', which simplifies the logic a lot.
> > Otherwise, we would need to handle each subpage's writeback/reclaim/dirty
> > state, as well as track each subpage's write result, which seems more
> > complicated.
> > 
> > I made a quick change by passing 'folio_list', and it looks simple and works
> > as expected.
> > 
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 75196b0f894f..10100e22d5c6 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -80,6 +80,9 @@ struct writeback_control {
> >          */
> >         struct swap_iocb **swap_plug;
> > 
> > +       /* Target list for splitting a large folio */
> > +       struct list_head *list;
> > +
> >         /* internal fields used by the ->writepages implementation: */
> >         struct folio_batch fbatch;
> >         pgoff_t index;
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 05525e9e7423..0a5a68f7d0a0 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1496,9 +1496,10 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >          * and its shmem_writeback() needs them to be split when swapping.
> >          */
> >         if (wbc->split_large_folio && folio_test_large(folio)) {
> > +try_split:
> >                 /* Ensure the subpages are still dirty */
> >                 folio_test_set_dirty(folio);
> > -               if (split_huge_page(page) < 0)
> > +               if (split_huge_page_to_list_to_order(page, wbc->list, 0))
> 
> We check for split_large_folio, but we still pass wbc->list in the i915
> writepage() case. Previously, we were passing a NULL list. Shouldn't we
> address that case too?

I guess I was missing the wbc initialization snippet:

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index fe69f2c8527d..174b95a9a988 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@ void __shmem_writeback(size_t size, struct address_space *mapping)
                .range_start = 0,
                .range_end = LLONG_MAX,
                .for_reclaim = 1,
+               .list = NULL,
        };
        unsigned long i;

> >                         goto redirty;
> >                 folio = page_folio(page);
> >                 folio_clear_dirty(folio);
> > @@ -1540,8 +1541,12 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> >         }
> > 
> >         swap = folio_alloc_swap(folio);
> > -       if (!swap.val)
> > +       if (!swap.val) {
> > +               if (nr_pages > 1)
> > +                       goto try_split;
> > +
> >                 goto redirty;
> > +       }
> > 
> >         /*
> >          * Add inode to shmem_unuse()'s list of swapped-out inodes,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 277571815789..cf982cf2454f 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -628,7 +628,7 @@ typedef enum {
> >   * Calls ->writepage().
> >   */
> >  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > -                        struct swap_iocb **plug)
> > +                        struct swap_iocb **plug, struct list_head *folio_list)
> >  {
> >         /*
> >          * If the folio is dirty, only perform writeback if that write
> > @@ -676,6 +676,11 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> >                         .swap_plug = plug,
> >                 };
> > 
> > +               if (shmem_mapping(mapping)) {
> > +                       wbc.list = folio_list;
> > +                       wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
> > +               }
> > +
> >                 folio_set_reclaim(folio);
> >                 res = mapping->a_ops->writepage(&folio->page, &wbc);
> >                 if (res < 0)
> > @@ -1259,23 +1264,6 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                                                 goto activate_locked_split;
> >                                 }
> >                         }
> > -               } else if (folio_test_swapbacked(folio) &&
> > -                          folio_test_large(folio)) {
> > -
> > -                       /*
> > -                        * Do not split shmem folio if no swap memory
> > -                        * available.
> > -                        */
> > -                       if (!total_swap_pages)
> > -                               goto activate_locked;
> > -
> > -                       /*
> > -                        * Only split shmem folio when CONFIG_THP_SWAP
> > -                        * is not enabled.
> > -                        */
> > -                       if (!IS_ENABLED(CONFIG_THP_SWAP) &&
> > -                           split_folio_to_list(folio, folio_list))
> > -                               goto keep_locked;
> >                 }
> > 
> >                 /*
> > @@ -1377,18 +1365,21 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >                          * starts and then write it out here.
> >                          */
> >                         try_to_unmap_flush_dirty();
> > -try_pageout:
> > -                       switch (pageout(folio, mapping, &plug)) {
> > +                       switch (pageout(folio, mapping, &plug, folio_list)) {
> >                         case PAGE_KEEP:
> >                                 goto keep_locked;
> >                         case PAGE_ACTIVATE:
> > -                               if (shmem_mapping(mapping) && folio_test_large(folio) &&
> > -                                   !split_folio_to_list(folio, folio_list)) {
> > +                               /* Shmem can be split when written back to swap */
> > +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +                                       sc->nr_scanned -= (nr_pages - 1);
> >                                         nr_pages = 1;
> > -                                       goto try_pageout;
> >                                 }
> >                                 goto activate_locked;
> >                         case PAGE_SUCCESS:
> > +                               if ((nr_pages > 1) && !folio_test_large(folio)) {
> > +                                       sc->nr_scanned -= (nr_pages - 1);
> > +                                       nr_pages = 1;
> > +                               }
> >                                 stat->nr_pageout += nr_pages;
> > 
> >                                 if (folio_test_writeback(folio)) {



