Re: [RFC PATCH v2 1/1] mm/vmscan: move the written-back folios to the tail of LRU after shrinking

Barry Song <21cnbao@xxxxxxxxx> · Fri, 29 Nov 2024 12:08:48 +1300

On Mon, Nov 25, 2024 at 2:19 PM chenridong <chenridong@xxxxxxxxxx> wrote:
>
>
>
> On 2024/11/18 12:21, Matthew Wilcox wrote:
> > On Mon, Nov 18, 2024 at 05:14:14PM +1300, Barry Song wrote:
> >> On Mon, Nov 18, 2024 at 5:03 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >>>
> >>> On Sat, Nov 16, 2024 at 09:16:58AM +0000, Chen Ridong wrote:
> >>>> 2. In the shrink_page_list function, if folioN is a THP (2M), it may be
> >>>>    split and added to the swap cache folio by folio. After adding to the
> >>>>    swap cache, it submits I/O to write the folio back to swap, which is
> >>>>    asynchronous. When shrink_page_list finishes, the isolated folio list
> >>>>    is moved back to the head of the inactive LRU. The inactive LRU may then
> >>>>    look like this, with 512 folios moved to the head of the inactive LRU.
> >>>
> >>> I was hoping that we'd be able to stop splitting the folio when adding
> >>> to the swap cache.  Ideally, we'd add the whole 2MB and write it back
> >>> as a single unit.
> >>
> >> This is already the case: adding to the swapcache doesn’t require splitting
> >> THPs, but failing to allocate 2MB of contiguous swap slots will.
> >
> > Agreed we need to understand why this is happening.  As I've said a few
> > times now, we need to stop requiring contiguity.  Real filesystems don't
> > need the contiguity (they become less efficient, but they can scatter a
> > single 2MB folio to multiple places).
> >
> > Maybe Chris has a solution to this in the works?
> >
>
> Hi, Chris, do you have a better idea to solve this issue?

Not Chris, but reading the code again, we already have the code below in
evict_folios() to fix up the "missed folio_rotate_reclaimable()" issue:

                /* retry folios that may have missed folio_rotate_reclaimable() */
                list_move(&folio->lru, &clean);

Doesn't that work for you?

commit 359a5e1416caaf9ce28396a65ed3e386cc5de663
Author: Yu Zhao <yuzhao@xxxxxxxxxx>
Date:   Tue Nov 15 18:38:07 2022 -0700

    mm: multi-gen LRU: retry folios written back while isolated

    The page reclaim isolates a batch of folios from the tail of one of the
    LRU lists and works on those folios one by one.  For a suitable
    swap-backed folio, if the swap device is async, it queues that folio for
    writeback.  After the page reclaim finishes an entire batch, it puts back
    the folios it queued for writeback to the head of the original LRU list.

    In the meantime, the page writeback flushes the queued folios also by
    batches.  Its batching logic is independent from that of the page reclaim.
    For each of the folios it writes back, the page writeback calls
    folio_rotate_reclaimable() which tries to rotate a folio to the tail.

    folio_rotate_reclaimable() only works for a folio after the page reclaim
    has put it back.  If an async swap device is fast enough, the page
    writeback can finish with that folio while the page reclaim is still
    working on the rest of the batch containing it.  In this case, that folio
    will remain at the head and the page reclaim will not retry it before
    reaching there.

    This patch adds a retry to evict_folios().  After evict_folios() has
    finished an entire batch and before it puts back folios it cannot free
    immediately, it retries those that may have missed the rotation.
    Before this patch, ~60% of folios swapped to an Intel Optane missed
    folio_rotate_reclaimable().  After this patch, ~99% of missed folios were
    reclaimed upon retry.

    This problem affects relatively slow async swap devices like Samsung 980
    Pro much less and does not affect sync swap devices like zram or zswap at
    all.
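
To make the race in that commit message concrete, here is a toy userspace
model. It is only a sketch, not kernel code: struct toy_folio and all of the
toy_* helpers are hypothetical names made up for illustration, and the
"retry" is reduced to "free any folio that is already clean when the batch
ends". It assumes writeback for the first fast_done folios completes while
the batch is still isolated, and for the rest only after put-back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_MAX 64

/* Toy stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
	bool dirty;
	bool writeback;	/* I/O still in flight */
	bool on_lru;	/* false while isolated by reclaim */
	bool rotated;	/* moved to the LRU tail on writeback completion */
	bool freed;
};

/* Models writeback completion: rotation only works for folios on the LRU. */
static void toy_end_writeback(struct toy_folio *f)
{
	f->writeback = false;
	f->dirty = false;
	if (f->on_lru)
		f->rotated = true;
	/* else: the rotation is missed because the folio is still isolated */
}

/* Models the end of a reclaim batch, with or without the retry step. */
static void toy_finish_batch(struct toy_folio *batch, size_t n, bool retry)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_folio *f = &batch[i];

		/* retry: the folio is already clean, so reclaim it right away */
		if (retry && !f->dirty && !f->writeback) {
			f->freed = true;
			continue;
		}
		f->on_lru = true;	/* put back at the head of the LRU */
	}
}

/*
 * Isolate n folios, queue them all for writeback, let the first fast_done
 * complete before put-back and the rest after it. Returns how many clean
 * folios end up stranded at the head of the LRU (missed rotation, not freed).
 */
size_t toy_run(size_t n, size_t fast_done, bool retry)
{
	struct toy_folio batch[TOY_BATCH_MAX] = { 0 };
	size_t stranded = 0;

	assert(n <= TOY_BATCH_MAX && fast_done <= n);

	for (size_t i = 0; i < n; i++) {
		batch[i].dirty = true;
		batch[i].writeback = true;	/* queued while isolated */
	}
	for (size_t i = 0; i < fast_done; i++)
		toy_end_writeback(&batch[i]);	/* fast device: done early */

	toy_finish_batch(batch, n, retry);

	for (size_t i = fast_done; i < n; i++)
		toy_end_writeback(&batch[i]);	/* slow path: rotated normally */

	for (size_t i = 0; i < n; i++)
		if (!batch[i].freed && batch[i].on_lru && !batch[i].rotated)
			stranded++;
	return stranded;
}
```

In this model, toy_run(8, 5, false) leaves the five early-completed folios
stranded at the head, while toy_run(8, 5, true) leaves none. If your folios
are still piling up at the head with the retry in place, it would help to
know which of the retry conditions they fail.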

>
> Best regards,
> Ridong

Thanks
Barry