On Fri, Dec 6, 2024 at 9:44 PM Yu Zhao <yuzhao@xxxxxxxxxx> wrote:
>
> On Thu, Dec 05, 2024 at 05:31:26PM -0700, Yu Zhao wrote:
> > With the aging feedback no longer considering the distribution of
> > folios in each generation, rework workingset protection to better
> > distribute folios across MAX_NR_GENS. This is achieved by reusing
> > PG_workingset and PG_referenced/LRU_REFS_FLAGS in a slightly different
> > way.
> >
> > For folios accessed multiple times through file descriptors, make
> > lru_gen_inc_refs() set additional bits of LRU_REFS_WIDTH in
> > folio->flags after PG_referenced, then PG_workingset after
> > LRU_REFS_WIDTH. After all its bits are set, i.e.,
> > LRU_REFS_FLAGS|BIT(PG_workingset), a folio is lazily promoted into the
> > second oldest generation in the eviction path. And when
> > folio_inc_gen() does that, it clears LRU_REFS_FLAGS so that
> > lru_gen_inc_refs() can start over. For this case, LRU_REFS_MASK is
> > only valid when PG_referenced is set.
> >
> > For folios accessed multiple times through page tables,
> > folio_update_gen() from a page table walk or lru_gen_set_refs() from a
> > rmap walk sets PG_referenced after the accessed bit is cleared for the
> > first time. Thereafter, those two paths set PG_workingset and promote
> > folios to the youngest generation. Like folio_inc_gen(), when
> > folio_update_gen() does that, it also clears PG_referenced. For this
> > case, LRU_REFS_MASK is not used.
> >
> > For both of the cases, after PG_workingset is set on a folio, it
> > remains until this folio is either reclaimed, or "deactivated" by
> > lru_gen_clear_refs(). It can be set again if lru_gen_test_recent()
> > returns true upon a refault.
> >
> > When adding folios to the LRU lists, lru_gen_distance() distributes
> > them as follows:
> > +---------------------------------+---------------------------------+
> > |    Accessed thru page tables    |  Accessed thru file descriptors |
> > +---------------------------------+---------------------------------+
> > | PG_active (set while isolated)  |                                 |
> > +----------------+----------------+----------------+----------------+
> > |  PG_workingset | PG_referenced  |  PG_workingset | LRU_REFS_FLAGS |
> > +---------------------------------+---------------------------------+
> > |<--------- MIN_NR_GENS --------->|                                 |
> > |<-------------------------- MAX_NR_GENS -------------------------->|
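
As an aside, to make the file-descriptor path described above easier to
follow, here is a tiny userspace model of it. This is only a sketch of
the documented behavior; the 2-bit width, the struct and the toy_* names
are illustrative and are not the actual kernel code.

/* toy model of the tiering described in the quoted changelog */
#include <stdbool.h>
#include <stdio.h>

#define TOY_REFS_WIDTH  2                       /* stands in for LRU_REFS_WIDTH */
#define TOY_REFS_MAX    ((1 << TOY_REFS_WIDTH) - 1)

struct toy_folio {
        bool referenced;        /* stands in for PG_referenced */
        bool workingset;        /* stands in for PG_workingset */
        unsigned int refs;      /* stands in for the LRU_REFS_MASK counter */
};

/* Another access through a file descriptor: advance one step at a time. */
static void toy_inc_refs(struct toy_folio *folio)
{
        if (!folio->referenced)
                folio->referenced = true;       /* first: PG_referenced */
        else if (folio->refs < TOY_REFS_MAX)
                folio->refs++;                  /* then: count into the refs field */
        else
                folio->workingset = true;       /* last: PG_workingset */
}

/*
 * Eviction path: once every bit is set, promote the folio and clear
 * everything except workingset so counting can start over, mirroring
 * what the changelog says folio_inc_gen() does with LRU_REFS_FLAGS.
 */
static bool toy_should_promote(struct toy_folio *folio)
{
        if (!folio->workingset || folio->refs < TOY_REFS_MAX)
                return false;

        folio->referenced = false;
        folio->refs = 0;
        return true;
}

int main(void)
{
        struct toy_folio folio = { 0 };

        for (int i = 1; i <= 5; i++) {
                toy_inc_refs(&folio);
                bool promote = toy_should_promote(&folio);

                printf("access %d: referenced=%d refs=%u workingset=%d promote=%d\n",
                       i, folio.referenced, folio.refs, folio.workingset, promote);
        }
        return 0;
}

With a 2-bit refs field this promotes on the fifth access, which is the
"lazily promoted once all bits are set" behavior described above.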
> >
> > After this patch, some typical client and server workloads showed
> > improvements under heavy memory pressure. For example, Python TPC-C,
> > which was used to benchmark a different approach [1] to better detect
> > refault distances, showed a significant decrease in total refaults:
> >                              Before        After        Change
> >   Time (seconds)             10801         10801        0%
> >   Executed (transactions)    41472         43663        +5%
> >   workingset_nodes           109070        120244       +10%
> >   workingset_refault_anon    5019627       7281831      +45%
> >   workingset_refault_file    1294678786    554855564    -57%
> >   workingset_refault_total   1299698413    562137395    -57%
> >
> > [1] https://lore.kernel.org/20230920190244.16839-1-ryncsn@xxxxxxxxx/
> >
> > Reported-by: Kairui Song <kasong@xxxxxxxxxxx>
> > Closes: https://lore.kernel.org/CAOUHufahuWcKf5f1Sg3emnqX+cODuR=2TQo7T4Gr-QYLujn4RA@xxxxxxxxxxxxxx/
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > Tested-by: Kalesh Singh <kaleshsingh@xxxxxxxxxx>
> > ---
> >  include/linux/mm_inline.h |  94 +++++++++++++------------
> >  include/linux/mmzone.h    |  82 +++++++++++++---------
> >  mm/swap.c                 |  23 +++---
> >  mm/vmscan.c               | 142 +++++++++++++++++++++++---------------
> >  mm/workingset.c           |  29 ++++----
> >  5 files changed, 209 insertions(+), 161 deletions(-)
>
> Some outlier results from LULESH (Livermore Unstructured Lagrangian
> Explicit Shock Hydrodynamics) [1] caught my eye. The following fix
> made the benchmark a lot happier (128GB DRAM + Optane swap):
>                            Before      After       Change
>   Average (z/s)            6894        7574        +10%
>   Deviation (10 samples)   12.96%      1.76%       -86%
>
> [1] https://asc.llnl.gov/codes/proxy-apps/lulesh
>
> Andrew, can you please fold it in? Thanks!

Never mind. syzbot found another warning. So let me fix that and post v3.

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 90bbc2b3be8b..5e03a61c894f 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -916,8 +916,7 @@ static enum folio_references folio_check_references(struct folio *folio,
>                  if (!referenced_ptes)
>                          return FOLIOREF_RECLAIM;
>
> -                lru_gen_set_refs(folio);
> -                return FOLIOREF_ACTIVATE;
> +                return lru_gen_set_refs(folio) ? FOLIOREF_ACTIVATE : FOLIOREF_KEEP;
>          }
>
>          referenced_folio = folio_test_clear_referenced(folio);
> @@ -4173,11 +4172,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
>                          old_gen = folio_update_gen(folio, new_gen);
>                          if (old_gen >= 0 && old_gen != new_gen)
>                                  update_batch_size(walk, folio, old_gen, new_gen);
> -
> -                        continue;
> -                }
> -
> -                if (lru_gen_set_refs(folio)) {
> +                } else if (lru_gen_set_refs(folio)) {
>                          old_gen = folio_lru_gen(folio);
>                          if (old_gen >= 0 && old_gen != new_gen)
>                                  folio_activate(folio);
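
To spell out the behavioral part of the fix: the folio_check_references()
hunk maps a false return from lru_gen_set_refs() to FOLIOREF_KEEP instead
of always returning FOLIOREF_ACTIVATE, while the lru_gen_look_around()
hunk folds the rmap-walk case into an else branch. Assuming, from the
changelog wording, that lru_gen_set_refs() returns false when it only
sets PG_referenced and true once it sets PG_workingset, the decision can
be modeled like this (toy_set_refs() below is a stand-in written from
that description, not the kernel function):

#include <stdbool.h>
#include <stdio.h>

enum toy_ref { TOY_KEEP, TOY_ACTIVATE };

/*
 * Stand-in for the described semantics: the first time the accessed
 * bit is cleared, only mark the folio referenced and report false;
 * from then on, mark it workingset and report true.
 */
static bool toy_set_refs(bool *referenced, bool *workingset)
{
        if (!*referenced) {
                *referenced = true;
                return false;
        }
        *workingset = true;
        return true;
}

/* v2 behavior: activate unconditionally once young PTEs are found */
static enum toy_ref toy_check_before(bool *referenced, bool *workingset)
{
        toy_set_refs(referenced, workingset);
        return TOY_ACTIVATE;
}

/* fixed behavior: activate only once the folio has proven itself hot */
static enum toy_ref toy_check_after(bool *referenced, bool *workingset)
{
        return toy_set_refs(referenced, workingset) ? TOY_ACTIVATE : TOY_KEEP;
}

int main(void)
{
        bool ref_old = false, ws_old = false;   /* folio seen by v2 */
        bool ref_new = false, ws_new = false;   /* folio seen by the fix */

        for (int i = 1; i <= 2; i++)
                printf("access %d: v2=%s fixed=%s\n", i,
                       toy_check_before(&ref_old, &ws_old) == TOY_ACTIVATE ? "activate" : "keep",
                       toy_check_after(&ref_new, &ws_new) == TOY_ACTIVATE ? "activate" : "keep");
        return 0;
}

In this toy model the first access is merely kept after the fix and only
the second access activates, which is consistent with the drop in
spurious activations that the LULESH numbers above suggest.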