Re: [PATCH mm-unstable v2 6/6] mm/mglru: rework workingset protection

On Fri, Dec 6, 2024 at 9:44 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> On Thu, Dec 05, 2024 at 05:31:26PM -0700, Yu Zhao wrote:
> > With the aging feedback no longer considering the distribution of
> > folios in each generation, rework workingset protection to better
> > distribute folios across MAX_NR_GENS. This is achieved by reusing
> > PG_workingset and PG_referenced/LRU_REFS_FLAGS in a slightly different
> > way.
> >
> > For folios accessed multiple times through file descriptors, make
> > lru_gen_inc_refs() set additional bits of LRU_REFS_WIDTH in
> > folio->flags after PG_referenced, then PG_workingset after
> > LRU_REFS_WIDTH. After all its bits are set, i.e.,
> > LRU_REFS_FLAGS|BIT(PG_workingset), a folio is lazily promoted into the
> > second oldest generation in the eviction path. And when
> > folio_inc_gen() does that, it clears LRU_REFS_FLAGS so that
> > lru_gen_inc_refs() can start over. For this case, LRU_REFS_MASK is
> > only valid when PG_referenced is set.
> >
> > For folios accessed multiple times through page tables,
> > folio_update_gen() from a page table walk or lru_gen_set_refs() from a
> > rmap walk sets PG_referenced after the accessed bit is cleared for the
> > first time. Thereafter, those two paths set PG_workingset and promote
> > folios to the youngest generation. Like folio_inc_gen(), when
> > folio_update_gen() does that, it also clears PG_referenced. For this
> > case, LRU_REFS_MASK is not used.
> >
> > For both of the cases, after PG_workingset is set on a folio, it
> > remains until this folio is either reclaimed, or "deactivated" by
> > lru_gen_clear_refs(). It can be set again if lru_gen_test_recent()
> > returns true upon a refault.
> >
> > When adding folios to the LRU lists, lru_gen_distance() distributes
> > them as follows:
> > +---------------------------------+---------------------------------+
> > |    Accessed thru page tables    | Accessed thru file descriptors  |
> > +---------------------------------+---------------------------------+
> > | PG_active (set while isolated)  |                                 |
> > +----------------+----------------+----------------+----------------+
> > | PG_workingset  | PG_referenced  | PG_workingset  | LRU_REFS_FLAGS |
> > +---------------------------------+---------------------------------+
> > |<--------- MIN_NR_GENS --------->|                                 |
> > |<-------------------------- MAX_NR_GENS -------------------------->|
> >
> > After this patch, some typical client and server workloads showed
> > improvements under heavy memory pressure. For example, Python TPC-C,
> > which was used to benchmark a different approach [1] to better detect
> > refault distances, showed a significant decrease in total refaults:
> >                             Before      After      Change
> >   Time (seconds)            10801       10801      0%
> >   Executed (transactions)   41472       43663      +5%
> >   workingset_nodes          109070      120244     +10%
> >   workingset_refault_anon   5019627     7281831    +45%
> >   workingset_refault_file   1294678786  554855564  -57%
> >   workingset_refault_total  1299698413  562137395  -57%
> >
> > [1] https://lore.kernel.org/20230920190244.16839-1-ryncsn@xxxxxxxxx/
> >
> > Reported-by: Kairui Song <kasong@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/CAOUHufahuWcKf5f1Sg3emnqX+cODuR=2TQo7T4Gr-QYLujn4RA@xxxxxxxxxxxxxx/
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Tested-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > ---
> >  include/linux/mm_inline.h |  94 +++++++++++++------------
> >  include/linux/mmzone.h    |  82 +++++++++++++---------
> >  mm/swap.c                 |  23 +++---
> >  mm/vmscan.c               | 142 +++++++++++++++++++++++---------------
> >  mm/workingset.c           |  29 ++++----
> >  5 files changed, 209 insertions(+), 161 deletions(-)
>
> Some outlier results from LULESH (Livermore Unstructured Lagrangian
> Explicit Shock Hydrodynamics) [1] caught my eye. The following fix
> made the benchmark a lot happier (128GB DRAM + Optane swap):
>                             Before    After    Change
>   Average (z/s)             6894      7574     +10%
>   Deviation (10 samples)    12.96%    1.76%    -86%
>
> [1] https://asc.llnl.gov/codes/proxy-apps/lulesh
>
> Andrew, can you please fold it in? Thanks!

Never mind. syzbot found another warning. So let me fix that and post v3.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 90bbc2b3be8b..5e03a61c894f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -916,8 +916,7 @@ static enum folio_references folio_check_references(struct folio *folio,
>                 if (!referenced_ptes)
>                         return FOLIOREF_RECLAIM;
>
> -               lru_gen_set_refs(folio);
> -               return FOLIOREF_ACTIVATE;
> +               return lru_gen_set_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP;
>         }
>
>         referenced_folio = folio_test_clear_referenced(folio);
> @@ -4173,11 +4172,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
>                         old_gen = folio_update_gen(folio, new_gen);
>                         if (old_gen >= 0 && old_gen != new_gen)
>                                 update_batch_size(walk, folio, old_gen, new_gen);
> -
> -                       continue;
> -               }
> -
> -               if (lru_gen_set_refs(folio)) {
> +               } else if (lru_gen_set_refs(folio)) {
>                         old_gen = folio_lru_gen(folio);
>                         if (old_gen >= 0 && old_gen != new_gen)
>                                 folio_activate(folio);
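
For readers following the quoted commit message, here is a rough user-space sketch of the two reference-tracking paths it describes. It is a toy model, not the kernel code: the bit layout, the TOY_* names, the toy_* helpers and the choice of a 2-bit width are all assumptions made for illustration, and the LRU_REFS_WIDTH bits are treated as a small saturating counter. toy_inc_refs() models the file descriptor path (PG_referenced first, then the refs bits, then PG_workingset), toy_promote() models clearing LRU_REFS_FLAGS on promotion, and toy_set_refs() models the page table path, whose boolean result is what the fix above maps to FOLIOREF_ACTIVATE or FOLIOREF_KEEP.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy bit layout (assumption); NOT the real folio->flags layout. */
#define TOY_REFS_WIDTH  2u
#define TOY_REFERENCED  (1u << 0)
#define TOY_REFS_SHIFT  1u
#define TOY_REFS_MASK   (((1u << TOY_REFS_WIDTH) - 1u) << TOY_REFS_SHIFT)
#define TOY_REFS_FLAGS  (TOY_REFERENCED | TOY_REFS_MASK)
#define TOY_WORKINGSET  (1u << (TOY_REFS_SHIFT + TOY_REFS_WIDTH))

/* File descriptor path: one more access recorded in the flags. */
static unsigned int toy_inc_refs(unsigned int flags)
{
	/* First access: only the referenced bit is set. */
	if (!(flags & TOY_REFERENCED))
		return flags | TOY_REFERENCED;

	/* Later accesses: bump the refs counter until it saturates. */
	if ((flags & TOY_REFS_MASK) != TOY_REFS_MASK)
		return flags + (1u << TOY_REFS_SHIFT);

	/* Counter is full: mark the folio as part of the workingset. */
	return flags | TOY_WORKINGSET;
}

/* True once every tracked bit is set, i.e. promotion is due. */
static bool toy_should_promote(unsigned int flags)
{
	unsigned int all = TOY_REFS_FLAGS | TOY_WORKINGSET;

	return (flags & all) == all;
}

/* Promotion clears the refs flags so counting can start over. */
static unsigned int toy_promote(unsigned int flags)
{
	return flags & ~TOY_REFS_FLAGS;
}

/*
 * Page table path: the first call only sets the referenced bit and
 * returns false; later calls set the workingset bit and return true.
 */
static bool toy_set_refs(unsigned int *flags)
{
	if (!(*flags & (TOY_REFERENCED | TOY_WORKINGSET))) {
		*flags |= TOY_REFERENCED;
		return false;
	}

	*flags |= TOY_WORKINGSET;
	return true;
}
```

With this toy width of 2, a folio needs five calls to toy_inc_refs() before toy_should_promote() becomes true, and after toy_promote() only the workingset bit remains set. On the page table path, the first toy_set_refs() call corresponds to the "keep" outcome and the second to "activate".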




